Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dedupe location names against script-root #5966

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

robertbastian
Copy link
Member

@robertbastian robertbastian commented Jan 9, 2025

Fixes #5901

I'm not introducing script fallback (because time zone names are not meant to fall back by script), and I'm not introducing und-Xxxx locales, as those would increase data size (due to the other fields in the struct). Instead, I store a dedupe_locale in each data struct, which contains the locale that was deduplicated against. So for sr, this would be ru, because sr -> und-Cyrl -> ru, and for ru it'd be ru as well, and ru is complete. This way, we also directly know which dedupe locale to load; if we didn't store it we'd have to load a fallbacker and step through it until we hit und (and then use the previous value).

Depends on #5967

Copy link

dpulls bot commented Jan 9, 2025

🎉 All dependencies have been resolved !

@robertbastian robertbastian force-pushed the scriptfb branch 2 times, most recently from 8aa3ffd to 1e46e4f Compare January 9, 2025 15:25
Copy link

dpulls bot commented Jan 9, 2025

🎉 All dependencies have been resolved !

@Manishearth Manishearth removed their request for review January 9, 2025 17:16
@robertbastian robertbastian removed the request for review from zbraniecki January 9, 2025 18:28
@sffc
Copy link
Member

sffc commented Jan 10, 2025

Fixes #5901

Do we not do similar deduplication in the other time zone keys as well?

@sffc
Copy link
Member

sffc commented Jan 10, 2025

So the problem with dedupe_locale is that the fallback engine has no knowledge that it needs to keep both payloads around. If you pass LocaleFamily("sr") into datagen, datagen is free to remove ru, but if we redefine the key to script fallback, it will know that it needs to retain und-Cyrl. (well, unless no-ancestor mode is used.)

@sffc
Copy link
Member

sffc commented Jan 10, 2025

Hmm. An approach that would work with our framework would be to have two keys:

  1. location_base
  2. location_override

Every locale has an entry in each key, but location_base has one unique payload per script, and location_override contains smaller payloads, which are empty if the data exactly equals the base.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Consider using script fallback for time zone names and maybe others
2 participants