Behaviour of begin/end and while patterns do not match TextMate #241

Open
DanTup opened this issue Oct 3, 2024 · 3 comments

Comments

DanTup commented Oct 3, 2024

I originally raised this as microsoft/vscode#189940 but it seems like it should be moved here.

The original report is as follows:


This was reported at dart-lang/dart-syntax-highlight#11 (comment). Dart highlighting on GitHub doesn't handle unterminated triple-backticks as expected. VS Code does handle it as expected.

However, while debugging this, I've become less certain that GitHub is wrong, and feel like VS Code might be.

Here's a trimmed down version of the grammar that shows the problem. It defines triple-slash comments, and supports triple-backtick code blocks inside:

It renders like this:

image

If the triple backticks are unclosed, it looks reasonable:

image

However, it's not clear why the variable.other.source.dart scope was exited, because the "end" condition was never found. On GitHub, this does not happen and the rest of the document is consumed (note the first void here is red, but the second one is not because the variable context eats the rest of the document):

image

I can't find anything in the spec for TextMate grammars that explains VS Code's behaviour. The most information I've found on it is here:

https://macromates.com/manual/en/language_grammars

> The other type of match is the one used by the second rule (lines 9-17). Here two regular expressions are given using the begin and end keys. [...] If there is no match for the end pattern, the end of the document is used.

https://www.apeth.com/nonblog/stories/textmatebundle.html

> With begin/end, if the end pattern is not found, the overall match does not fail: rather, once the begin pattern is matched, the overall match runs to the end pattern or to the end of the document, whichever comes first.
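To make the quoted rule concrete, here is a minimal sketch of "run to the end pattern or the end of the document, whichever comes first". It is a toy model, not code from TextMate, VS Code, or GitHub. The function name `begin_end_region` and the sample lines are hypothetical, and it assumes the end pattern is only looked for on lines after the one where begin matched.

```python
import re

# Three backticks, built this way so the fence never appears literally.
TICKS = "`" * 3


def begin_end_region(lines, begin, end):
    """Return (first, last) line indices covered by a begin/end rule, or None.

    Toy model of the documented rule: if the end pattern is never found,
    the region runs to the last line of the document.
    """
    begin_re, end_re = re.compile(begin), re.compile(end)
    first = next((i for i, line in enumerate(lines) if begin_re.search(line)), None)
    if first is None:
        return None  # begin never matched, so there is no region
    for i in range(first + 1, len(lines)):
        if end_re.search(lines[i]):
            return (first, i)  # closed by the end pattern
    return (first, len(lines) - 1)  # unclosed: runs to end of document


closed = ["/// " + TICKS, "/// code", "/// " + TICKS, "void f() {}"]
unclosed = ["/// " + TICKS, "/// code", "void f() {}", "void g() {}"]

print(begin_end_region(closed, TICKS, TICKS))    # (0, 2)
print(begin_end_region(unclosed, TICKS, TICKS))  # (0, 3)
```

Under this reading, the unclosed block swallows both `void` lines, which is what the GitHub rendering described above shows.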

While VS Code's behaviour is convenient for me (because I'm not sure how to handle these unclosed triple-backticks if it behaved like GitHub), it doesn't seem correct, and it's more inconvenient if VS Code and GitHub disagree on what the behaviour should be because it makes it more difficult to author a grammar.

DanTup commented Oct 3, 2024

There was some back and forth about whether VS Code or GitHub was correct here, and I filed github-linguist/linguist#7015 thinking it was a GitHub issue. However, @RedCMD did some more digging, tested with TextMate, and confirmed that it behaves the same as GitHub, and therefore that VS Code's behaviour is incorrect:

microsoft/vscode#189940 (comment)

RedCMD commented Oct 5, 2024

There are two differences between VSCode TextMate and TextMate2.0:

1. While the while condition is being checked, a \G anchor is placed at the beginning of the next line. VSCode does not do this currently ❌
   This should be an easy fix and would probably fix a few bug reports as well.

2. VSCode's while is very strict in that it doesn't let begin/end escape, which in my opinion is very good for embedded languages.
   However, TextMate2.0 allows while to be pushed out, which I think is terrible: as in the example above, the middle /// would need to be handled by the embedded rules instead of the parent grammar.
   I'm not sure if it should be fixed, as Markdown heavily relies on VSCode's behaviour.
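The second difference can be sketched with a toy line-by-line model. This only illustrates the two behaviours as they are described in this thread; it is not taken from either engine's source. The function `in_code_block`, the `strict_while` flag, and the sample document are all hypothetical. The model assumes a parent rule that begins and continues on lines starting with `///`, and a nested begin/end rule delimited by triple backticks.

```python
import re

COMMENT = re.compile(r"^\s*///")   # parent rule: begin and while condition
FENCE = re.compile("`" * 3)        # nested rule: begin and end pattern


def in_code_block(lines, strict_while):
    """Return, per line, whether the nested begin/end rule is active on it.

    strict_while=True  : the while check runs first on every line; if it
                         fails, the parent and anything nested are closed.
    strict_while=False : an open nested begin/end rule keeps running until
                         its end pattern matches or the document ends.
    """
    result = []
    in_comment = False
    in_code = False
    for line in lines:
        continues = bool(COMMENT.search(line))
        if strict_while:
            in_comment = continues
            if not continues:
                in_code = False  # failing while also closes the nested rule
        elif not in_code:
            in_comment = continues  # while only consulted when nothing is nested
        if in_code:
            result.append(True)
            if FENCE.search(line):
                in_code = False  # end pattern found
        elif in_comment and FENCE.search(line):
            in_code = True  # begin pattern found inside the comment
            result.append(True)
        else:
            result.append(False)
    return result


doc = ["/// " + "`" * 3, "/// code", "void f() {}", "/// done", "void g() {}"]

print(in_code_block(doc, strict_while=True))   # [True, True, False, False, False]
print(in_code_block(doc, strict_while=False))  # [True, True, True, True, True]
```

With the strict check, the unclosed block ends as soon as a line stops matching `///`. Without it, the block runs to the end of the document, so a grammar author would have to handle the `///` prefix inside the embedded rules.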

DanTup commented Oct 5, 2024

> I'm not sure if it should be fixed, as Markdown heavily relies on VSCode's behaviour

While I agree that VS Code's behaviour seems better, I don't think diverging from TextMate and claiming to be a TextMate grammar is great for extension authors or users. It'll either result in bugs and inconsistencies between editors, or require grammar writers to spend time testing grammars against each editor.

But if VS Code does choose to knowingly diverge, these differences should be clearly documented IMO so that grammar authors trying to go in either direction (use a grammar written against VS Code elsewhere, or bring a grammar from elsewhere to VS Code) have some reference of the things to look out for.
