Consider using script fallback for time zone names and maybe others #5901

Open
sffc opened this issue Dec 14, 2024 · 6 comments · May be fixed by #5966
Labels
2.0-breaking Changes that are breaking API changes A-data Area: Data coverage or quality C-time-zone Component: Time Zones discuss Discuss at a future ICU4X-SC meeting S-small Size: One afternoon (small bug fix or enhancement)

Comments

@sffc
Member

sffc commented Dec 14, 2024

The data size optimization that landed in #5751 reduces data size in English and other Latin-script languages, but it has minimal impact on other languages.

To reduce this bias, should we consider having an und-Script locale for time zone names? It could contain the names from the language that likely subtags select for that script. This may help languages that share a script, such as sr/uk/ru or zh/zh-Hant/yue. Spot-checking, there are definitely display name overlaps in those languages.

@robertbastian

@sffc sffc added A-data Area: Data coverage or quality C-time-zone Component: Time Zones labels Dec 14, 2024
@sffc sffc added this to the ICU4X 2.0 Stretch ⟨P2⟩ milestone Dec 14, 2024
@sffc sffc added the discuss Discuss at a future ICU4X-SC meeting label Dec 14, 2024
@robertbastian
Member

There might be political issues around region display names that need the region to be resolved (and time zone names are often region names).

@sffc sffc removed the discuss Discuss at a future ICU4X-SC meeting label Jan 7, 2025
@Manishearth Manishearth added discuss Discuss at a future ICU4X-SC meeting S-small Size: One afternoon (small bug fix or enhancement) labels Jan 7, 2025
@sffc sffc added S-medium Size: Less than a week (larger bug fix or enhancement) and removed discuss Discuss at a future ICU4X-SC meeting S-small Size: One afternoon (small bug fix or enhancement) labels Jan 7, 2025
@Manishearth Manishearth added discuss Discuss at a future ICU4X-SC meeting S-small Size: One afternoon (small bug fix or enhancement) 2.0-breaking Changes that are breaking API changes and removed S-medium Size: Less than a week (larger bug fix or enhancement) labels Jan 7, 2025
@robertbastian
Member

It looks like no script-sharing language overlaps on all names.

@robertbastian
Member

Ah, this makes sense of course: the payloads would already have been deduplicated before. I'm not sure I understand your proposition. We do fall back from non-location names to location names in the same language. What fallback exactly are you suggesting?

@robertbastian robertbastian assigned sffc and unassigned robertbastian Jan 8, 2025
@sffc
Member Author

sffc commented Jan 8, 2025

Right now we load two payloads, one for the language and one for und. I was suggesting that we instead load two payloads, one for the language and one for the script: und-Latn for en, fr, vi, ..., und-Hani for zh, yue, zh-Hant, ..., und-Cyrl for sr, ru, uk, ...
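
To make the proposed lookup order concrete, here is a minimal sketch in plain Rust. It does not use the ICU4X data provider API: `Payload`, `script_locale`, and `lookup` are hypothetical names, and the hard-coded `match` stands in for real likely-subtags data, using only the groupings listed above.

```rust
use std::collections::HashMap;

/// Display names keyed by time zone identifier for one data locale.
/// Hypothetical stand-in for a real data payload.
type Payload = HashMap<&'static str, &'static str>;

/// Hypothetical, hard-coded stand-in for likely-subtags data: maps a
/// locale to the script-level locale it would fall back to.
fn script_locale(locale: &str) -> Option<&'static str> {
    match locale {
        "en" | "fr" | "vi" => Some("und-Latn"),
        "zh" | "yue" | "zh-Hant" => Some("und-Hani"),
        "sr" | "ru" | "uk" => Some("und-Cyrl"),
        _ => None,
    }
}

/// Looks a zone up in the language payload first, then in the payload
/// of the script-level locale. At most two payloads are consulted.
fn lookup(
    data: &HashMap<&'static str, Payload>,
    locale: &str,
    zone: &str,
) -> Option<&'static str> {
    let from_language = data.get(locale).and_then(|p| p.get(zone));
    let from_script = || {
        script_locale(locale)
            .and_then(|s| data.get(s))
            .and_then(|p| p.get(zone))
    };
    from_language.or_else(from_script).copied()
}
```

The number of payloads loaded stays at two, as today; only the second one changes from und to the script-level locale.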

@sffc sffc assigned robertbastian and unassigned sffc Jan 8, 2025
@sffc
Member Author

sffc commented Jan 8, 2025

PR #5960 is not quite what I had in mind. What I had in mind was that, for example:

  1. und-Cyrl contains everything from ru, chosen because ru is the likely subtag language for und-Cyrl
  2. ru is empty and falls back to und-Cyrl
  3. uk gets smaller and contains only those names that differ from und-Cyrl == ru
  4. sr gets smaller and contains only those names that differ from und-Cyrl == ru

This is just like how we made und contain the English data, so that the en variants got smaller.
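
The four steps above can be sketched as a delta computation. This is an illustration under assumptions, not the datagen implementation: `Names`, `delta_against`, and `resolve` are hypothetical names, and the maps stand in for per-locale time zone name data.

```rust
use std::collections::BTreeMap;

/// Time zone identifier -> display name, for one locale (hypothetical type).
type Names = BTreeMap<String, String>;

/// Keeps only the entries of `child` that are missing from `base` or
/// differ from it. With `base` = und-Cyrl (a copy of ru), the result is
/// empty for ru and a smaller map for uk or sr.
fn delta_against(base: &Names, child: &Names) -> Names {
    child
        .iter()
        .filter(|(zone, name)| base.get(*zone) != Some(*name))
        .map(|(zone, name)| (zone.clone(), name.clone()))
        .collect()
}

/// Resolves a name at runtime: the locale's own delta first, then the
/// script-level base.
fn resolve<'a>(base: &'a Names, delta: &'a Names, zone: &str) -> Option<&'a str> {
    delta.get(zone).or_else(|| base.get(zone)).map(String::as_str)
}
```

One caveat of this sketch: a zone that the child locale does not name at all would inherit the base name through `resolve`, so a real implementation would have to decide whether that inheritance is acceptable.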

@robertbastian
Member

Ah you mean the deduplication from #5759, not #5751

@robertbastian robertbastian linked a pull request Jan 9, 2025 that will close this issue