ICU-22986 GL takes CM #3296
Hooray! The files in the branch are the same across the force-push. 😃 ~ Your Friendly Jira-GitHub PR Checker Bot
Notice: the branch changed across the force-push! ~ Your Friendly Jira-GitHub PR Checker Bot
changes lgtm (=as advertised)
The changes look good, but they also need to be propagated into the various line rule tailorings, line_cj.txt, etc.
Thanks for pointing that out @aheninger. Done.
The Exhaustive Tests for ICU #22 was broken between e025466 and 2e57f07 in TestMonkey (https://github.com/unicode-org/icu/actions/runs/12209763924), likely because of #3287. Is this PR intended to fix that?
Hooray! The files in the branch are the same across the force-push. 😃 ~ Your Friendly Jira-GitHub PR Checker Bot
This fixes a bug in the production implementation that was not caught in ICU 76, both because it is duplicated in the new monkeys (whose rules are similar to the production rules in the ways that matter here) and because of an equivalent bug in the old old monkeys: we should have looked through CM* for GL in icu/icu4c/source/test/intltest/rbbitst.cpp, lines 3254 to 3266 in e000c5c.
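To make "looking through CM* for GL" concrete, here is a minimal sketch in Python. It is not the code from rbbitst.cpp or from the production rules. It assumes a simplified model in which each character is represented only by its UAX #14 line break class name, and the function names are hypothetical. It contrasts a check that only inspects the immediately preceding class with one that first skips back over combining marks in the spirit of LB9 before applying LB12 (GL ×).

```python
# Illustrative model only: characters are reduced to UAX #14 class names.
MARKS = ("CM", "ZWJ")
# Classes that combining marks do not attach to (the LB9 exceptions).
NO_ATTACH = {"BK", "CR", "LF", "NL", "SP", "ZW"}


def effective_class_before(classes, pos):
    """Class governing the boundary at `pos` (pos >= 1), looking back through (CM|ZWJ)*."""
    i = pos - 1
    if classes[i] not in MARKS:
        return classes[i]
    # Skip back over the run of marks to find their base.
    while i >= 0 and classes[i] in MARKS:
        i -= 1
    if i < 0 or classes[i] in NO_ATTACH:
        return "AL"  # LB10: marks without a usable base behave like AL
    return classes[i]  # LB9: X (CM|ZWJ)* behaves like X


def glue_forbids_break_naive(classes, pos):
    """Buggy variant (hypothetical): only inspects the immediately preceding class."""
    return classes[pos - 1] == "GL"


def glue_forbids_break(classes, pos):
    """LB12 (GL x) applied after looking through CM*."""
    return effective_class_before(classes, pos) == "GL"


seq = ["GL", "CM", "ID"]
print(glue_forbids_break_naive(seq, 2), glue_forbids_break(seq, 2))
```

For the sequence GL CM ID, the naive check sees CM before the boundary and does not apply LB12, while the look-through version sees GL and prohibits the break.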
The new old monkeys (after ICU-22984=#3287) are much more principled about remap rules, actually modifying the working buffer, and so caught that.
Also fix an issue with insufficiently greedy regular expressions in the rules (new rules generated from unicode-org/unicodetools#988) and with remap rules creating characters out of surrogates (as in <lead, combining mark, trail>).
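The surrogate problem can be illustrated with a small sketch. It assumes the working buffer holds UTF-16 code units and that a remap rule deletes a combining mark. The helper names are hypothetical, and the code-point-based variant is just one way to avoid the problem, not necessarily what this PR does. Deleting the mark from <lead, combining mark, trail> at the code unit level leaves the two surrogates adjacent, so they decode as a single supplementary character that was never in the original text.

```python
COMBINING_ACUTE = 0x0301  # a combining mark (line break class CM)


def decode_utf16_units(units):
    """Decode UTF-16 code units to code points, keeping lone surrogates as they are."""
    out = []
    i = 0
    while i < len(units):
        u = units[i]
        if (0xD800 <= u <= 0xDBFF and i + 1 < len(units)
                and 0xDC00 <= units[i + 1] <= 0xDFFF):
            # A lead followed by a trail forms one supplementary code point.
            out.append(0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00))
            i += 2
        else:
            out.append(u)
            i += 1
    return out


def remap_on_code_units(units):
    """Naive remap (hypothetical): drop the mark directly from the code unit buffer."""
    return [u for u in units if u != COMBINING_ACUTE]


def remap_on_code_points(units):
    """Safer remap (hypothetical): decode first, then drop the mark per code point."""
    return [cp for cp in decode_utf16_units(units) if cp != COMBINING_ACUTE]


original = [0xD83D, COMBINING_ACUTE, 0xDE00]  # <lead, combining mark, trail>
naive = decode_utf16_units(remap_on_code_units(original))
safe = remap_on_code_points(original)
print([hex(cp) for cp in naive], [hex(cp) for cp in safe])
```

The naive remap yields the single code point U+1F600, whereas the code-point-based remap keeps the two lone surrogates as separate code points.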