Consider using end pattern as bail out for embedded languages #207
Comments
To add some context, matter123 and I made an npm package, textmate-bailout, several years ago that does this kind of thing, albeit in a very expensive way (which is currently the only way). I still use it in multiple syntax highlighters. We quite literally duplicate entire grammars just to add our end pattern in front of all the embedded grammars' end patterns, to make sure they bail out. It would be an improvement in quality to use the npm package for markdown, html, latex, etc. They would become the largest syntax files by far, though, as they'd literally contain copies of every language, likely multiple copies. If instead there were a built-in feature for this bailout step, it would cut the C++ grammar file down by more than 50%. It would be a non-standard TextMate feature, though.
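To make the duplication approach concrete, here is a minimal sketch of the idea, not the actual textmate-bailout implementation. The `Rule` shape, the `addBailout` helper, and the `pycode` bail-out pattern are all hypothetical names chosen for illustration; real grammars also carry a `repository`, captures, and `while` rules that this ignores.

```typescript
// Hypothetical, simplified rule shape; real TextMate grammars have more fields.
interface Rule {
  include?: string;
  match?: string;
  begin?: string;
  end?: string;
  patterns?: Rule[];
}

// Returns a copy of `rule` in which every `end` pattern also succeeds, with a
// zero-width match, as soon as the host's bail-out pattern comes next.
function addBailout(rule: Rule, bailout: string): Rule {
  const copy: Rule = { ...rule };
  if (copy.end !== undefined) {
    copy.end = `(?=${bailout})|(?:${copy.end})`;
  }
  if (copy.patterns !== undefined) {
    copy.patterns = copy.patterns.map((p) => addBailout(p, bailout));
  }
  return copy;
}

// Example: a triple-quoted string rule from an embedded grammar (assumed shape).
const pythonString: Rule = {
  begin: '"""',
  end: '"""',
  patterns: [{ match: "\\\\." }],
};

// The bail-out pattern here is an assumed host end pattern, \end{pycode}.
const patched = addBailout(pythonString, "\\\\end\\{pycode\\}");
```

Because the rewrite has to be applied to every rule of the embedded grammar, the host ends up shipping a full modified copy of that grammar, which is where the file-size cost comes from.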
I share the sentiment; I find that this is one of the fundamental flaws of TextMate. When including another grammar, there should be a straightforward way to specify a bail-out end pattern. The fundamental problem here is that we want to be good open source citizens, and we should not deviate from TM's implementation, to avoid fracturing the TM grammar community. However, if we do decide to add this, we should try to add it in a minimally invasive way, perhaps as an optional property when including another grammar. cc @hediet
I support your thoughts exactly; this is a "between a rock and a hard place" problem. I think there is a way forward, but I could use some help. As far as I know, TextMate isn't a specification. There's no central comprehensive documentation, there's a nearly complete lack of standards when it comes to TextMate scope names, and there's no working group for resolving design flaws or specifying "correct" behavior. I documented some of the obscure features on my own, but lack anywhere meaningful to put it. I've also had many conversations with grammar maintainers and old Atom contributors about standardizing TextMate scopes, but we also lacked a central place to have a conversation or publish any agreed-upon result. A central spec repo (similar to WebAssembly's) could turn a relatively small feature like bailouts from a bad, fracturing change into a helpful new addition that other TextMate engines could agree upon and take advantage of. I can do a lot of the work towards that: bringing the feature documentation, formalizing existing pseudo-standards, and contacting contributors/maintainers of major TextMate implementations. But I would need the backing/support of vscode-textmate to start with some form of legitimacy.
@jeff-hykin I think such an effort would be beneficial. Please note that we don't have resources at the moment to assist with that. Also note that there are subtle bugs in our implementation. I'd rather try to bring tree-sitter tokenization support to VS Code than spend a significant amount of energy on improving TextMate support.
Woah! As per microsoft/vscode#50140 I thought tree-sitter support was a non-starter. If VS Code is even considering adding support, that's huge news to me. I'd be happy for all effort to be spent towards tree-sitter support.
I thought TextMate was just a spec. It has its opinions about scope names, which VS Code's dark theme seems to follow exactly?
@jeff-hykin may I see that please?
I don't think injections can be used to solve this problem either.
Yeah I don't consider https://macromates.com/ "comprehensive" by any means. I don't think it even mentions
Sure! This could use updating/enhancing, but here's an example of some documentation for Some of the other missing stuff is $base vs $self, injections, referencing "begin" capture groups inside the "end" pattern, using "repository" in a nested pattern and then referencing the outer repository vs the inner repository, \G behavior, scope matching (like the negative operator that VS Code doesn't support), and probably some other stuff I'm forgetting.
I believe these were new features meant to be documented in Textmate 2 but Apple didn't dedicate any resources to that yet. This is all we have and will have for a long time: https://macromates.com/manual/en/language_grammars#example_grammar
Why isn't this being done by a hoisted variable that stores the end pattern when embedding begins?
@zm-cttae it does already
It should be possible to filter the setter for that end pattern hoisted variable.
Another relevant technique: push a "stack" array of patterns for end conditions.
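A rough sketch of the stack idea, under heavy simplification: each open begin/end rule pushes a frame holding its end pattern, and every line is checked against the open end patterns from the outside in. The `Frame` type, the `unwind` helper, and the scope names are hypothetical and not part of vscode-textmate; a real engine would also compare match positions within the line instead of testing whole lines.

```typescript
// Hypothetical frame type: one entry per open begin/end rule, outermost first.
interface Frame {
  scope: string;
  end: RegExp;
}

// Returns the stack that remains after `line` is checked against every open
// end pattern. Outer frames are tried first, so a host's end pattern closes
// all frames nested inside it, even if an inner rule never saw its own end.
function unwind(stack: readonly Frame[], line: string): Frame[] {
  const closing = stack.findIndex((frame) => frame.end.test(line));
  return closing === -1 ? [...stack] : stack.slice(0, closing);
}

const stack: Frame[] = [
  { scope: "meta.embedded.block.python", end: /\\end\{pycode\}/ },
  { scope: "string.quoted.docstring.python", end: /"""/ },
];

// The host's end pattern closes both frames, including the unterminated string.
const afterHostEnd = unwind(stack, "\\end{pycode}");
// The string's own end pattern closes only the innermost frame.
const afterStringEnd = unwind(stack, 'text"""');
// A line matching no end pattern leaves the stack unchanged.
const afterPlainLine = unwind(stack, "x = 1");
```

Giving outer frames priority is the design choice that produces the bail-out behavior; plain TextMate semantics only ever consult the innermost frame's end pattern.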
This started as a discussion in jlelong/vscode-latex-basics#58.
The problem is that languages that are embedded within other languages have a high potential to not work as expected. When there's a problem in the embedded grammar it can easily escape into the primary grammar.
Key comment from @jlelong copied below:
@alexr00 yes exactly. For instance, embedding the Python language typically looks like
It would be nice if the top level grammar could escape from the Python one when the end pattern is reached. I originally thought a new pattern was needed to trigger the bail out but I believe it will always be the same as the end pattern.
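As a rough illustration of the kind of rule being discussed, an embedding rule might be shaped like the sketch below. The `pycode` environment name, the `contentName` scope, and the simplified `Rule` type are assumptions made for the example and are not taken from the linked grammar.

```typescript
// Hypothetical, simplified rule shape for illustration only.
interface Rule {
  include?: string;
  begin?: string;
  end?: string;
  contentName?: string;
  patterns?: Rule[];
}

// Host-grammar rule that hands everything between begin and end to Python.
const embeddedPython: Rule = {
  begin: "\\\\begin\\{pycode\\}",
  end: "\\\\end\\{pycode\\}",
  contentName: "meta.embedded.block.python",
  patterns: [{ include: "source.python" }],
};

// While a Python rule is open (say, an unterminated triple-quoted string),
// the engine only looks for that inner rule's own end pattern, so the host's
// `end` above is not tried and the Python scopes leak past \end{pycode}.
const hostEnd = new RegExp(embeddedPython.end ?? "");
const closesBlock = hostEnd.test("\\end{pycode}");
```

The request in this issue amounts to having the host's `end` pattern checked even while one of the included grammar's rules is still open.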
Currently, we provide a modified C++ grammar which includes the bail-out pattern. The approach proposed here would make that modified grammar unnecessary, and embedding a grammar would be much more robust.