Consider using end pattern as bail out for embedded languages #207

Open
alexr00 opened this issue May 2, 2023 · 12 comments
@alexr00
Member

alexr00 commented May 2, 2023

This started as a discussion in jlelong/vscode-latex-basics#58.

The problem is that languages that are embedded within other languages have a high potential to not work as expected. When there's a problem in the embedded grammar it can easily escape into the primary grammar.

Key comment from @jlelong copied below:

@jlelong would this be something that you would define in your textmate grammar at the point that you embed the grammar?

@alexr00 yes exactly. For instance, embedding the Python language typically looks like

{
	"begin": "some_begin_pattern",
	"end": "some_end_pattern",
	"contentName": "source.python",
	"patterns": [
		{
			"include": "source.python"
		}
	]
}

It would be nice if the top level grammar could escape from the Python one when the end pattern is reached. I originally thought a new pattern was needed to trigger the bail out but I believe it will always be the same as the end pattern.

Currently, we provide a modified C++ grammar which includes the bail out pattern. With this approach, that modified grammar would no longer be needed, and embedding a grammar would be much more robust.

@jeff-hykin

jeff-hykin commented May 16, 2023

To add some context, matter123 and I made an npm package, textmate-bailout, several years ago that does this kind of thing, albeit in a very expensive way (which is currently the only way). I still use it in multiple syntax highlighters. We quite literally duplicate entire grammars, just to add our end pattern in front of all the embedded grammars' end patterns to make sure they bail out.

It would be an improvement in quality to use the npm package for markdown, html, latex, etc. They would become the largest syntax files by far though as they'd literally contain copies of every language, likely multiple copies.

If instead there was a built in feature for this bailout step, it would cut the C++ grammar file down by more than 50%. It would be a non-standard textmate feature though.
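
To make the duplication approach concrete, here is a minimal sketch of the rewrite step described above. It is not the actual textmate-bailout implementation: the `Rule` shape and the `addBailout` helper are hypothetical and simplified, and JavaScript `RegExp` syntax stands in for Oniguruma. It copies a grammar and puts a zero-width lookahead for the outer end pattern in front of every `end` pattern.

```typescript
// Hypothetical, simplified rule shape; real TextMate grammars have more fields.
interface Rule {
  match?: string;
  begin?: string;
  end?: string;
  include?: string;
  patterns?: Rule[];
  repository?: { [name: string]: Rule };
}

// Return a copy of `rule` in which every `end` pattern also succeeds,
// with a zero-width match, as soon as `bailout` is next in the input.
function addBailout(rule: Rule, bailout: string): Rule {
  const copy: Rule = { ...rule };
  if (rule.end !== undefined) {
    copy.end = `(?=${bailout})|(?:${rule.end})`;
  }
  if (rule.patterns !== undefined) {
    copy.patterns = rule.patterns.map((child) => addBailout(child, bailout));
  }
  const repository = rule.repository;
  if (repository !== undefined) {
    const patched: { [name: string]: Rule } = {};
    for (const name of Object.keys(repository)) {
      patched[name] = addBailout(repository[name], bailout);
    }
    copy.repository = patched;
  }
  return copy;
}

// Example: a toy "Python" grammar with a triple-quoted string rule,
// embedded in a host whose end pattern matches \end{pycode}.
const toyPython: Rule = {
  patterns: [{ begin: '"""', end: '"""', patterns: [{ match: "\\\\." }] }],
};
const patchedPython = addBailout(toyPython, "\\\\end\\{pycode\\}");
console.log(patchedPython.patterns?.[0].end);
```

Because the inserted alternative is a lookahead, the inner rule closes without consuming any text, so the host's own `end` pattern can still match at the same position. This sketch does not handle single-line `match` rules that consume the bail out text, and every embedding site needs its own patched copy, which is where the file size cost mentioned above comes from.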

@alexdima
Member

I share the sentiment, I find that this is one of the fundamental flaws of TextMate. When including another grammar, there should be a straight-forward way to specify a bail out end pattern.

The fundamental problem here is that we want to be good open source citizens and we should not deviate from TM's implementation to avoid fracturing the TM grammar community. However, if we do decide to add this, we should try to add it in a minimally invasive way, perhaps as an optional property when including another grammar.

cc @hediet

@jeff-hykin

jeff-hykin commented May 25, 2023

I support your thoughts exactly, this is a "between a rock and a hard place" problem.

I think there is a way forward, but I could use some help.

As far as I know, Textmate isn't a specification. There's no central comprehensive documentation, there's a nearly complete lack of standards when it comes to textmate-scoping names, and there's no working group for resolving design flaws or specifying "correct" behavior. I documented some of the obscure features on my own, but lack anywhere meaningful to put it. I've also had many conversations with grammar maintainers and old Atom contributors about standardizing textmate scopes, but we also lacked a central place to have a conversation or publish any agreed-upon result.

A central spec-repo (similar to WebAssembly's) could convert a relatively small feature like bailouts from bad-fracturing-change to a helpful-new-addition that other textmate engines could agree upon and take advantage of.

I can do a lot of the work towards that, bringing the feature documentation, formalizing existing pseudo-standards, contacting contributors/maintainers of major Textmate implementations, but I would need the backing/support of vscode-textmate to add/start with some form of legitimacy.

@hediet
Member

hediet commented May 26, 2023

@jeff-hykin I think such effort would be beneficial. Please note that we don't have resources at the moment to assist with that.

Also note that there are subtle bugs in our implementation.

I'd rather try to bring tree sitter tokenization support to VS Code than spending a significant amount of energy on improving textmate support.

@jeff-hykin

I'd rather try to bring tree sitter tokenization support to VS Code

Woah! As per microsoft/vscode#50140 I thought tree-sitter support was a non-starter. If VS Code is even considering adding support, that's huge news to me. I'd be happy for all effort to be spent towards tree sitter support.

@RedCMD

RedCMD commented May 27, 2023

I thought TextMate was just a spec
but seems to also be a program?
idk if this is the official documentation
v1: https://macromates.com/manual/en/
v2: https://macromates.com/textmate/manual/

it has its opinions about scope names. which vscode dark seems to follow exactly?
https://macromates.com/manual/en/language_grammars#naming-conventions

I documented some of the obscure features on my own

@jeff-hykin may I see that please?

begin - while rules are more strict than begin - end
but also much more limited
prob not useable
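
A toy model may help show both halves of that trade-off. The sketch below assumes, as I understand vscode-textmate's behavior, that a `while` pattern is tested at the start of every line after the `begin` line and that the rule (with anything nested in it) closes as soon as the test fails. The `linesInsideWhileRule` helper is hypothetical, and JavaScript `RegExp` stands in for Oniguruma.

```typescript
// Toy model of a begin/while rule: returns the indices of the lines that
// belong to the rule. Pass regexes without the `g` flag, since `test` is
// stateful on global regexes.
function linesInsideWhileRule(
  lines: string[],
  begin: RegExp,
  whilePattern: RegExp
): number[] {
  const inside: number[] = [];
  let open = false;
  for (let i = 0; i < lines.length; i++) {
    if (open) {
      if (whilePattern.test(lines[i])) {
        inside.push(i);
        continue;
      }
      // The while test failed: the rule closes here no matter what the
      // embedded grammar had open (for example an unterminated string).
      open = false;
    }
    if (begin.test(lines[i])) {
      open = true;
      inside.push(i);
    }
  }
  return inside;
}

// A quote-style block where every line must start with ">".
const sample = ["> ```py", "> s = '''unterminated", "plain text", "> again"];
console.log(linesInsideWhileRule(sample, /^>/, /^>/)); // → [0, 1, 3]
```

The unterminated string on line 1 cannot leak into line 2, which is the strictness. The limitation is that the block can only end at a line boundary and every line has to carry a recognizable prefix, so a construct closed by an end marker in the middle of a line does not fit this shape.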

I don't think injections can be used to solve this problem either
as they are only good for expanding rather than halting

@jeff-hykin

jeff-hykin commented May 27, 2023

Yeah I don't consider https://macromates.com/ "comprehensive" by any means. I don't think it even mentions while or applyEndPatternLast. The convention guide fails for all but the most basic/stereotypical scopes, and like you say it's closer to a personal opinion than a standard. The only textmate spec I'm aware of is behavior comparisons against the original implementation.

@jeff-hykin may I see that please?

Sure! This could use updating/enhancing, but here's an example of some documentation for while: https://github.com/jeff-hykin/better-cpp-syntax/blob/fe873cdfacd1df7072e7b8c95df3df369c1ffcaa/documentation/library/textmate_while.md
I think I have one somewhere for applyEndPatternLast but can't find it at the moment. I've got a few unfinished unreleased "How to use Textmate (without wanting to jump off a bridge)" drafts that would-be/are comprehensive. Without anywhere to really publish them, and with such a limited audience I haven't worked on the drafts in over a year though.

Some of the other missing stuff is $base vs $self, injections, referencing "begin" capture groups inside the "end" pattern, using "repository" in a nested pattern, then referencing the outer repository vs inner repository, \G behavior, scope matching (like the negative operator that VS Code doesn't support), and probably some other stuff I'm forgetting.

@zm-cttae

I believe these were new features meant to be documented in Textmate 2 but Apple didn't dedicate any resources to that yet.

This is all we have and will have for a long time:

https://macromates.com/manual/en/language_grammars#example_grammar

@zm-cttae

zm-cttae commented May 28, 2023

Why isn't this being done by a hoisted variable that stores the end pattern when embedding begins?

@RedCMD

RedCMD commented May 28, 2023

@zm-cttae it does already
the problem is that the inner embedded language can push the end pattern outwards
which is a feature, not a bug
tho not being able to disable it is the issue
but then if that setting did exist, it could cause just as many problems as it fixes
as it can be quite hard to deduce if the end pattern is inside a string etc without knowing the context around it

@zm-cttae

zm-cttae commented May 28, 2023

It should be possible to filter the setter for that end pattern hoisted variable.
You'd limit the condition to a begin for embedded languages only.

@zm-cttae

zm-cttae commented May 28, 2023

Another relevant technique: push a "stack" array of patterns for end conditions.
Include whether they end a language embed in the stack data.
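
One way to read this suggestion is sketched below. It is a toy model, not how vscode-textmate is implemented: the `Frame` shape, the `isEmbed` flag and the `framesToPop` helper are all hypothetical, JavaScript `RegExp` stands in for Oniguruma, and the competition between end patterns and child patterns is ignored. Each open rule pushes a frame holding its end pattern, and frames that wrap an embedded language are marked so their end pattern can be checked even when they are not on top of the stack.

```typescript
interface Frame {
  end: RegExp; // end pattern of the rule that opened this frame, anchored with ^
  isEmbed: boolean; // true when the frame wraps an embedded language
}

// Decide how many frames to pop at the current position, where `rest` is the
// remaining text of the line. Without bail out, only the innermost frame's
// end pattern is considered. With bail out, the outermost embed frame whose
// end pattern matches closes itself and everything stacked above it.
function framesToPop(stack: Frame[], rest: string, bailOut: boolean): number {
  if (bailOut) {
    for (let i = 0; i < stack.length; i++) {
      if (stack[i].isEmbed && stack[i].end.test(rest)) {
        return stack.length - i;
      }
    }
  }
  const top = stack[stack.length - 1];
  return top !== undefined && top.end.test(rest) ? 1 : 0;
}

// Host rule embedding Python, with an unterminated triple-quoted string on top.
const openFrames: Frame[] = [
  { end: /^\\end\{pycode\}/, isEmbed: true },
  { end: /^"""/, isEmbed: false },
];
console.log(framesToPop(openFrames, "\\end{pycode}", false)); // → 0
console.log(framesToPop(openFrames, "\\end{pycode}", true)); // → 2
```

With `bailOut` off, the string frame keeps the host's end pattern from ever being seen, which is the escape described earlier in the thread. With it on, the caveat raised above still applies: the tokenizer cannot tell whether the end text sits legitimately inside a string of the embedded language.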

6 participants