Treat private-use characters like non-printable characters for escaping #4015

RoelN · 2025-01-08T13:42:35Z

Specification
Tests
Dart Sass
Website

The following code:

$one: \31;
$pua: \e000;

div::before {
  content: unquote("\"#{$one}\"");
}

div::after {
  content: unquote("\"#{$pua}\"");
}

results in the following CSS:

div::before {
  content: "\31 ";
}

div::after {
  content: "\e000";
}

Note the space added after \31. It is added when using the Unicode escape for the digit 1, but not when using the Unicode escape for a character in the PUA (Private Use Area). The latter is correct; there shouldn't be a trailing space.

ntkme · 2025-01-08T18:19:24Z

It seems to be a parser bug instead of a serializer bug: https://sass-lang.com/playground/#eJwzNHQoLU5VUCpOLC62Ki4pysxLV7LmUkm0UogxAAJjQ2suh5TUpNJ0BYikXk5qXnpJhoZKoiZMpLA0vyQVJADUlwTWh1tXErquJJCuZKAu3HqS0fUkg/SkAPWkAl2IS1cKuq4UTWsAkQNLNQ==?s=L1C1-L9C43

\31 is parsed into an unquoted string of length 4: \31 with an extra space at the end.

The proper behavior here should be that, during the parse phase, \31 is decoded to the single-character string 1, and that it does not get escaped at all during serialization.

ntkme · 2025-01-08T18:56:23Z

The likely root cause is this line: https://github.com/sass/dart-sass/blob/9e6e3bfbd28fa07bd0df63cfcb85d2db9ef9b6c2/lib/src/parse/parser.dart#L472

There is special logic there: when a string is parsed as an identifier, an already-escaped ASCII digit 0-9 (\30 - \39) is explicitly kept in escaped form, as \30 - \39 followed by a space.

@nex3 I wonder why this special treatment is applied during the parsing phase. Maybe because an escaped 0-9 indicates that the string must always be an identifier? Shouldn't the special escaping of digits at the beginning of an identifier be applied at serialization, based on whether an identifier is being output or not?

nex3 · 2025-01-09T00:06:08Z

The issue here isn't that the space is being added after \31, it's that it's not being added after \e000. See Consuming an Escaped Code Point:

Otherwise, if codepoint is a non-printable code point, U+0009 CHARACTER TABULATION, U+000A LINE FEED, U+000D CARRIAGE RETURN, or U+000C FORM FEED; or if codepoint is a digit and the start flag is set:

Let code be the lowercase hexadecimal representation of codepoint, with no leading 0s.

Return "\" + code + " ".

The space itself is part of the CSS syntax for escape codes (see § 4.3.7. Consume an escaped code point). We want to include this consistently in the canonicalized format of parsed identifiers so that equivalent identifiers are always equal, while also ensuring that there isn't weird behavior like the identifier form of 1a being one character longer than the identifier form of 1x.

Edit: Sorry, I was wrong: the Sass spec does not mandate a space after \e000, because U+E000 is not considered a "non-printable code point". Technically, according to the spec, the canonical form of \e000 should be the literal U+E000 PRIVATE USE AREA code point itself. I think that's not a desirable behavior, though; we should define private-use characters to be considered "non-printable" for this purpose. I'm going to move this to the spec repo accordingly.
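A minimal Python sketch of the clause quoted above may make the difference concrete. It models only that one branch, not the full algorithm. The function names and the `private_use_as_non_printable` flag are hypothetical: the flag stands in for the change proposed in this issue, not for any shipped behavior. The non-printable ranges follow the CSS Syntax definition, and the private-use ranges are the three Unicode Private Use areas.

```python
from typing import Optional


def is_non_printable(cp: int) -> bool:
    """CSS Syntax 'non-printable code point'."""
    return cp <= 0x08 or cp == 0x0B or 0x0E <= cp <= 0x1F or cp == 0x7F


def is_private_use(cp: int) -> bool:
    """Unicode Private Use areas (BMP block plus planes 15 and 16)."""
    return (
        0xE000 <= cp <= 0xF8FF
        or 0xF0000 <= cp <= 0xFFFFD
        or 0x100000 <= cp <= 0x10FFFD
    )


def hex_escape_if_required(
    cp: int,
    start: bool,
    private_use_as_non_printable: bool = False,  # hypothetical flag (assumption)
) -> Optional[str]:
    """Return '\\' + hex + ' ' when the quoted clause applies, else None.

    None means "handled by a branch of the algorithm not modelled here".
    """
    is_digit = 0x30 <= cp <= 0x39
    if (
        is_non_printable(cp)
        or cp in (0x09, 0x0A, 0x0D, 0x0C)
        or (is_digit and start)
        or (private_use_as_non_printable and is_private_use(cp))
    ):
        # Lowercase hex with no leading zeros, then the trailing space.
        return "\\" + format(cp, "x") + " "
    return None
```

Under these assumptions, `hex_escape_if_required(0x31, start=True)` yields the four-character string reported in this issue, while U+E000 only gets the same treatment once the hypothetical flag is switched on.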

ntkme · 2025-01-09T00:19:38Z

As far as I know, the space is optional and only required if the next character is a space character or a hex character?

nex3 · 2025-01-09T00:26:12Z

That's right, but the way it's canonicalized is observable (the SassScript value of the identifier \31 is a four-character unquoted string containing [\, 3, 1, space]), so we want to make the canonical form as consistent as possible.
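To illustrate the trade-off, here is a small Python sketch comparing a "space only when needed" escape with an "always add the space" escape. Both function names are hypothetical and neither is taken from Dart Sass; the sketch only assumes the CSS rule that whitespace after a hex escape is needed when the next character is a hex digit or whitespace.

```python
HEX_DIGITS = set("0123456789abcdefABCDEF")
WHITESPACE = set(" \t\n")


def escape_minimal(cp: int, rest: str) -> str:
    """Hex-escape cp, adding a space only if `rest` would otherwise be misread."""
    escape = "\\" + format(cp, "x")
    if rest and (rest[0] in HEX_DIGITS or rest[0] in WHITESPACE):
        escape += " "
    return escape + rest


def escape_canonical(cp: int, rest: str) -> str:
    """Hex-escape cp and always add the trailing space."""
    return "\\" + format(cp, "x") + " " + rest


# "1a" vs "1x": the minimal form gives strings of different lengths,
# the always-space form gives the same length for both.
minimal_lengths = (len(escape_minimal(0x31, "a")), len(escape_minimal(0x31, "x")))
canonical_lengths = (len(escape_canonical(0x31, "a")), len(escape_canonical(0x31, "x")))
```

With the minimal rule, the escaped form of 1a is one character longer than that of 1x, which is the kind of observable inconsistency the always-space form avoids.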

This is a downstream effect of the way we use the "unquoted string" datatype to represent not just identifiers but any CSS value we don't have a dedicated type for, including things like plain-CSS functions and so on. Where quoted strings just store their semantic values, unquoted strings store their syntactic values. This means that identifiers are stored escaped, so we need to be consistent about how they're escaped so we don't have weird issues where, for example, \@x, \40 x, and \40x are all treated as different values despite being semantically identical. This is what the Identifier Escapes proposal was all about.
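As a rough illustration of "semantically identical", the following Python sketch decodes CSS escapes and shows several spellings collapsing to the same text. The decoder and its name are hypothetical and deliberately simplified (for example, it does not handle a backslash followed by a newline). It uses U+0040, the code point of `@`, for the hex spellings.

```python
import re

# A backslash followed by 1-6 hex digits and one optional whitespace
# character, or a backslash followed by any other single character.
_ESCAPE = re.compile(r"\\(?:([0-9a-fA-F]{1,6})[ \t\n]?|(.))")


def decode_css_escapes(text: str) -> str:
    """Return the semantic value of `text` with CSS escapes decoded."""

    def replace(match: "re.Match[str]") -> str:
        hex_digits, literal = match.group(1), match.group(2)
        if hex_digits is None:
            return literal  # e.g. "\@" decodes to "@"
        cp = int(hex_digits, 16)
        # CSS maps NULL, surrogates, and out-of-range values to U+FFFD.
        if cp == 0 or 0xD800 <= cp <= 0xDFFF or cp > 0x10FFFF:
            return "\ufffd"
        return chr(cp)

    return _ESCAPE.sub(replace, text)


spellings = [r"\@x", r"\40 x", r"\40x"]
decoded = {decode_css_escapes(s) for s in spellings}
```

All three spellings decode to the same two-character text, yet as raw unquoted strings they differ, which is why a single canonical escaped form matters.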

ntkme · 2025-01-09T03:43:04Z

It seems to me that the limitation is that we cannot clearly tell an unquoted identifier string from an unquoted non-identifier string, and that’s why we parse it without decoding the escape sequence, to force it to be output as is.

My question is: why can't we always decode the escape sequence during the parsing stage (the example in question is not a Sass value from a JS script but a value directly in the Sass source input)? In other words, parse \31 as the unquoted string 1 during the parse stage and later, during output serialization, print either 1, \31, or \31 followed by a space, depending on the context in which we are writing the CSS output?

RoelN · 2025-01-09T14:27:32Z

Thanks for looking into this. Please note that the space seems to be removed in CSS when concatenated with text: https://codepen.io/RoelN/pen/zxORvxV

But then again I'd expect the output of

$one: \31;

div::before {
  content: unquote("\"#{$one}#{$one}#{$one}\"");
}

to be

div::before {
  content: "\31\31\31 ";
}

(with or without the trailing space)

and not

div::before {
  content: "\31 \31 \31 ";
}

nex3 added the bug Something isn't working label Jan 9, 2025

nex3 self-assigned this Jan 9, 2025

nex3 removed the bug Something isn't working label Jan 9, 2025

nex3 transferred this issue from sass/dart-sass Jan 9, 2025

nex3 changed the title ~~Trailing space added for Unicode values for numbers~~ Treat private-use characters like non-printable characters for escaping Jan 9, 2025

This was referenced Jan 9, 2025

Treat private-use characters like non-printable characters for escaping sass/dart-sass#2481

Open

Add tests for treating private-use characters like non-printable characters for escaping sass/sass-spec#2043

Open

nex3 mentioned this issue Jan 9, 2025

Document private-use characters like non-printable characters for escaping sass/sass-site#1295

Open
