-
-
Notifications
You must be signed in to change notification settings - Fork 758
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ICU-22773 Migrate the CLDR conversion tool to Maven #3290
Conversation
ade45b5
to
a4a3f02
Compare
Notice: the branch changed across the force-push!
~ Your Friendly Jira-GitHub PR Checker Bot |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
This looks very good and plausible, but I didn't try to understand all of the Task replacements and configs.
Some minor feedback, nothing intelligent.
@@ -40,65 +64,132 @@ relies on a pre-processing step, and the CLDR data must come from the separate | |||
"staging" repository (i.e. https://github.com/unicode-org/cldr-staging) or be | |||
pre-processed locally into a different directory. | |||
|
|||
:point_right: **Note**: the 3 folders can also be overridden: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
optional: do we need three ways to do this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately some is legacy.
We have a lot of documentation using 3 folders.
Pointing to the 3 paths where the 3 repos are cloned (icu, cldr, and cldr-staging).
There is also code relaying on that.
The cldr-code expects CLDR_DIR to be set. And not set in environment, but as Java property.
The ant did not allow one to set them from command line, but from the ant script itself, and (in some cases) from environment.
And then it used that set value(s) to set both environment AND the Java property.
Somewhere in the code you will see that I explicityl do a System.setProperty to keep thinks the way ant set them up.
So unfortunately it was tangled pretty badly.
I only kept things as they were.
There is quite a bit of refactoring that I would like to do.
And this can be one of them.
But has to also be coordinated with cldr-code, and a lot of doc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My question is not about the three folders, but about the three ways of defining them:
- as a shell environment variable
- as a JVM property with -Dname=value
- as a command-line option
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a blocker, but maybe worth thinking about?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The existing code and doc assumed both shell environment variable and JVM property
The docs say "set this environment variable" everywhere.
And the defined values are used in a lot of copy / paste commands.
Then the code (in several places, including cldr-code, which is not even icu) assumed that some are set as Java properties.
And failed if not.
Ant was "magically" taking some of the environment variables and making them Java properties
For example:
<condition property="cldrDataDir" value="${env.CLDR_DATA_DIR}">
<isset property="env.CLDR_DATA_DIR"/>
</condition>
And some were "fixed" properties in the ant script:
<property name="outDir" value="${basedir}/../../../icu4c/source/data/"/>
Then the Ant convert
invoked Ant again as "convert-impl" by defining CLDR_DIR in environment, and passing other things as Java properties :-(
<exec executable="ant" searchpath="true" failonerror="true">
<env key="CLDR_DIR" value="${cldrDataDir}" />
...
<arg value="-DoutDir=${outDir}"/>
...
And finally the real "convert" task (in the "convert-impl" target) took some of them and invoked setters in the associated Java class (extending Task):
<convert cldrDir="${cldrDir}" outputDir="${outDir}" ... />
But the point is: some of these were both Java properties and environment variables at the same time.
I only preserved the existing behavior.
AND it is only for the folders.
So the Java code never looked at environment vars, only Ant did.
Java looked at (a few) properties + setters.
I had a chat with Elango (and a whiteboard) on what to do about it.
The main question is really between JVM properties vs command line options.
I was split between them, each have pros and cons.
In the end CLI won. The rest needs to be unified.
But not a hight priority, and not in this CL.
It will need to (among others) change the cldr-code library in the cldr repo.
Note that only the 3 folders have this "override behavior", not the rest.
Because they had it already.
Grep the old code (before my CL):
$ grep -r System.getProperty src/
src//test/java/org/unicode/icu/tool/cldrtoicu/SupplementalDataTest.java: Path cldrRoot = Paths.get(System.getProperty("CLDR_DIR"));
src//main/java/org/unicode/icu/tool/cldrtoicu/IcuConverterConfig.java: Optional.ofNullable(System.getProperty("ICU_DIR", null))
src//main/java/org/unicode/icu/tool/cldrtoicu/DebugWriter.java: String cldrPath = System.getProperty("CLDR_DIR", System.getenv("CLDR_DIR"));
And note this in DebugWriter.java
:
String cldrPath = System.getProperty("CLDR_DIR", System.getenv("CLDR_DIR"));
This assumption was there.
And it was too much surgery to fix in one step.
That's why I was forced to do a System.setProperty("CLDR_DIR", cldrDir);
(in Cldr2IcuCliOptions.java
)
TLDR: I am not arguing that it is good, or that I like it.
What I am trying to say is that was unclean before, it is the same now.
Deserves a cleanup, but it is not mavenization, and is not as simple as removing this override that I implemented here.
For example can't remove it from cldr-code, because CLDR itself also depends on it.
So the processes / scripts / docs in CLDR would need to be updated BEFORE removing it from this code.
...r/cldr-to-icu/src/main/java/org/unicode/icu/tool/cldrtoicu/ant/CleanOutputDirectoryTask.java
Outdated
Show resolved
Hide resolved
...r/cldr-to-icu/src/main/java/org/unicode/icu/tool/cldrtoicu/ant/CleanOutputDirectoryTask.java
Outdated
Show resolved
Hide resolved
tools/cldr/cldr-to-icu/src/main/java/org/unicode/icu/tool/cldrtoicu/ant/ConvertIcuDataTask.java
Outdated
Show resolved
Hide resolved
Thank you. Yes, it also took me quite some time to figure it out, and understand what Ant was doing. On the way I tried to replace the config with json. What I ended up doing (short version) Basically load the "task classes", load the xml fragment corresponding to each task, call The config file is 100% sub-tree(s) that existed in the Ant build file. |
tools/cldr/cldr-to-icu/src/main/java/org/unicode/icu/tool/cldrtoicu/Cldr2Icu.java
Outdated
Show resolved
Hide resolved
...r/cldr-to-icu/src/main/java/org/unicode/icu/tool/cldrtoicu/ant/CleanOutputDirectoryTask.java
Show resolved
Hide resolved
tools/cldr/cldr-to-icu/src/main/java/org/unicode/icu/tool/cldrtoicu/ant/Task.java
Show resolved
Hide resolved
sgtm -- unless Elango objects, I can offer a rubber stamp... |
I also managed to fix the tests (6f4fea1) They were cause by real changes in the data (for example the likely subtags refactoring). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
6f4fea1
to
e23801f
Compare
Hooray! The files in the branch are the same across the force-push. 😃 ~ Your Friendly Jira-GitHub PR Checker Bot |
docs/processes/cldr-icu.md
Outdated
@@ -472,12 +472,15 @@ mvn install -pl main/common_tests -Dtest=MeasureUnitTest#TestGreek | |||
13b. Optionally run the tests in exhaustive mode | |||
|
|||
Optionally run before committing changes, or run to diagnose failures from | |||
running exhaustive CI tests in the PR using `/azp run CI-Exhaustive`: | |||
running exhaustive CI tests in the PR using exhaustive localy: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
running exhaustive CI tests locally:
docs/processes/cldr-icu.md
Outdated
Exhaustive tests in CL can ve triggered by running the "Exhaustive Tests for ICU" | ||
action from the GitHub web UI. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo fix & adding link:
Exhaustive tests in CL can be triggered by running the "Exhaustive Tests for ICU"
action from the GitHub web UI:
https://unicode-org.github.io/icu/userguide/dev/ci.html#exhaustive-tests
except nicer if we can link to the .md page with a relative link, and GitHub Pages turns that into the link to the .html page
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reminder / nice to have: add the link like this or nicer for GH Pages source
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a link, hope it works.
I used CI (not CL) intentionally. Because that is a CI test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know, I had missed the "CL" typo
docs/processes/cldr-icu.md
Outdated
@@ -471,14 +471,13 @@ mvn install -pl main/common_tests -Dtest=MeasureUnitTest#TestGreek | |||
|
|||
13b. Optionally run the tests in exhaustive mode | |||
|
|||
Optionally run before committing changes, or run to diagnose failures from | |||
running exhaustive CI tests in the PR using exhaustive localy: | |||
Optionally run exhaustive tests localy before committing changes: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: localy --> locally
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reminder
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DONE
8d24a58
to
3b04fe6
Compare
Hooray! The files in the branch are the same across the force-push. 😃 ~ Your Friendly Jira-GitHub PR Checker Bot |
Thank you all! |
This also touches / fixes (at least partially?) other issues:
I will search and close / comment on these bugs and others once this lands.
Summary
Task
classes to regular classes, and invoked them from a Main class.cldr-code
) is now built and installed in the local Maven repo, not in a folder, and the extralib
artifact is gone (ICU-21748)ToDo / wish list
README.md
and need to be executed by hand, with copy/pasteBut it not
tools/cldr/cldr-to-icu
. Maybe move to Python?I tried to keep the changes at minimum.
So I preserved the old files with a "fake"
Task
class, withinit()
andexecute()
methods, as if it is an Ant Task.Also, kept the files in the
.../ant
package.This should (hopefully) help with the review.
I can do a cleaning after this with cleanup, refactoring (renaming and so on).
I did quite a bit of testing, but mostly manual, running the conversion tool on Linux / MacOS / Windows (which didn't work before, but works now), with all kind of options.
Checklist