ZWJ support #114

Open
NetOpWibby opened this issue Jul 27, 2021 · 9 comments

Comments

@NetOpWibby

It would be nice if this library supported ZWJ emoji. Currently, it displays such emoji as several instead of just one.

@pretended

ZWJ emojis are still considered as single emojis, so that would be really great.

@mathiasbynens
Owner

mathiasbynens commented Aug 2, 2021

Can you please clarify this bug report? The Punycode encoding is unrelated to “displaying” or rendering characters. What makes you say ZWJ emoji are unsupported? Please provide an example.

@pretended

pretended commented Aug 2, 2021

[image]

New emojis are displayed as a combination of emojis, just as the photo shows. It would be great if we could display the new (combination) emojis as single emojis rather than as a combination (or ZWJ sequence) of emojis.

Not really a bug report, but an enhancement.

However, maybe punycode is unrelated to displaying ZWJ emojis as single.

@NetOpWibby
Author

With this library, xn--qq8hq8f is supposed to return 👨‍🦰 (man with red hair). Instead, it outputs 👨🦰 (man, red hair).
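The two strings in this comment differ only by one invisible code point, U+200D (zero width joiner). A minimal sketch for inspecting that difference, using only standard JavaScript string methods (the helper name `toHex` is illustrative, not part of any library):

```javascript
// List the code points of a string as U+XXXX strings.
function toHex(str) {
  return Array.from(str, (ch) =>
    'U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0')
  );
}

const ZWJ = '\u200D';
const manRedHair = '\u{1F468}' + ZWJ + '\u{1F9B0}'; // renders as one glyph where supported
const manThenHair = '\u{1F468}\u{1F9B0}';           // renders as two separate emoji

console.log(toHex(manRedHair));  // [ 'U+1F468', 'U+200D', 'U+1F9B0' ]
console.log(toHex(manThenHair)); // [ 'U+1F468', 'U+1F9B0' ]

// Removing the joiner turns the first string into the second.
console.log(manRedHair.split(ZWJ).join('') === manThenHair); // true
```

Whether the joined sequence is drawn as a single glyph is up to the font and renderer; the string contents are what an encoder sees.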

@mathiasbynens
Owner

> With this library, xn--qq8hq8f is supposed to return 👨‍🦰 (man with red hair).

What makes you say that?

The inverse: 👨‍🦰.com encodes to xn--1ugz855p6kd.com per https://mothereff.in/punycode#%F0%9F%91%A8%E2%80%8D%F0%9F%A6%B0.com, which seems to roundtrip correctly.
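The round trip described above can be checked with a short script. This is a sketch that assumes Node's built-in `punycode` module (deprecated, but exposing the same `toASCII`, `toUnicode`, and `ucs2.decode` functions as punycode.js; with the npm package, `require('punycode/')` would be used instead). The helpers `codePoints` and `roundTrip` are illustrative names.

```javascript
// Assumption: Node's built-in (deprecated) `punycode` module is available.
const punycode = require('punycode');

// List the code points of a string as U+XXXX strings.
function codePoints(str) {
  return punycode.ucs2.decode(str).map(
    (cp) => 'U+' + cp.toString(16).toUpperCase().padStart(4, '0')
  );
}

// Encode a domain to its ASCII form and decode it back.
function roundTrip(domain) {
  return punycode.toUnicode(punycode.toASCII(domain));
}

const zwjDomain = '\u{1F468}\u200D\u{1F9B0}.com'; // man, ZWJ, red hair
console.log(punycode.toASCII(zwjDomain));
console.log(roundTrip(zwjDomain) === zwjDomain); // true

// Decode both labels from the thread and compare their code points.
for (const label of ['xn--1ugz855p6kd', 'xn--qq8hq8f']) {
  console.log(label, codePoints(punycode.toUnicode(label)));
}
```

Decoding `xn--qq8hq8f` yields U+1F468 followed by U+1F9B0 with no joiner between them, so the joiner is already absent from that label before the decoder ever runs.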

@NetOpWibby
Author

xn--1ugz855p6kd is invalid Punycode; it follows the IDNA2008 standard. It should be xn--qq8hq8f, following the IDNA2003 standard.

Emoji input from your phone produces the IDNA2003 form.

@NetOpWibby
Author

Via idna-uts46-hx:

Unfortunately, the situation of internationalized domain names is rather complicated by the existence of multiple incompatible standards (IDNA2003 and IDNA2008, predominantly). While UTS#46 tries to bridge the incompatibility, there are four characters which cannot be so bridged: ß (the German sharp s), ς (Greek final sigma), and the ZWJ and ZWNJ characters. These are handled differently depending on the mode; in transitional mode, these strings are mapped to different ones, preserving compatibility with IDNA2003; in nontransitional mode, these strings are mapped to themselves, in accordance with IDNA2008.

(transitional mode is) compatible with all known browser implementations at this point.
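The difference between the two modes for the four characters named in the quote can be sketched as a pre-processing step applied before Punycode encoding. This is only an illustration of the deviation handling, not the full UTS #46 mapping (no case folding, normalization, or validity checks), and `mapDeviations` is a hypothetical helper, not an API of idna-uts46-hx or punycode.js.

```javascript
// The four UTS #46 deviation characters and their transitional mappings.
const DEVIATIONS = new Map([
  ['\u00DF', 'ss'],     // ß (sharp s) -> "ss"
  ['\u03C2', '\u03C3'], // ς (final sigma) -> σ
  ['\u200C', ''],       // ZWNJ -> removed
  ['\u200D', ''],       // ZWJ -> removed
]);

// Hypothetical helper: apply only the deviation mappings to a label.
function mapDeviations(label, { transitional }) {
  if (!transitional) return label; // nontransitional: characters map to themselves
  return Array.from(label, (ch) => DEVIATIONS.get(ch) ?? ch).join('');
}

const emoji = '\u{1F468}\u200D\u{1F9B0}'; // man, ZWJ, red hair

// Transitional mode drops the joiner, leaving two separate emoji.
console.log(mapDeviations(emoji, { transitional: true }) === '\u{1F468}\u{1F9B0}'); // true
// Nontransitional mode keeps the sequence intact.
console.log(mapDeviations(emoji, { transitional: false }) === emoji); // true
console.log(mapDeviations('stra\u00DFe', { transitional: true })); // strasse
```

Under this reading, the two Punycode labels discussed in the thread would correspond to the same input processed in the two different modes.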

@GuillaumeBlanchet

IDNA2003 is deprecated nonetheless. cURL uses IDNA2008 like many other things on your computer. Unfortunately, many browsers are not up to date...

@NetOpWibby
Author

Deprecation or not, there are clear reasons why it's still being used.

The JS Punycode converter library is a great tool for handling Unicode domain names, but it only implements the Punycode encoding of domain labels, not the full IDNA algorithm. In simple cases, a mere conversion to lowercase text before input would seem sufficient, but the real mapping for strings is far more complex.
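A small sketch of why lowercasing alone falls short, using only built-in string methods. The function names are illustrative, and NFKC plus lowercasing is itself only an approximation of the real IDNA mapping table, shown here to make the gap visible.

```javascript
// Naive preparation: lowercase only.
function naivePrepare(label) {
  return label.toLowerCase();
}

// Closer approximation (still NOT the full IDNA mapping): apply Unicode
// compatibility normalization (NFKC) and then lowercase.
function closerPrepare(label) {
  return label.normalize('NFKC').toLowerCase();
}

// "EXAMPLE" written with fullwidth Latin letters (U+FF21..U+FF3A range).
const fullwidth = '\uFF25\uFF38\uFF21\uFF2D\uFF30\uFF2C\uFF25';

console.log(naivePrepare(fullwidth) === 'example');  // false: still fullwidth letters
console.log(closerPrepare(fullwidth) === 'example'); // true
```

Characters such as ß or the joiners are untouched by both functions, which is one reason a dedicated IDNA implementation is needed on top of a Punycode encoder.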

4 participants