-
Notifications
You must be signed in to change notification settings - Fork 6.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cmake: support generator expression in yaml
module
#80007
base: main
Are you sure you want to change the base?
Conversation
8375cfd
to
d539e5a
Compare
d539e5a
to
f2aeed6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for doing some prototyping of this, for sure this can be beneficial.
Haven't been into all details, but tried to read enough details of the code to understand the flow.
Some overall comments.
I think we should avoid both yaml_save()
and yaml_generate()
.
Users should not care when writing the file whether it contains genex'es or not.
And it should be possible to do multiple writes, for example like this:
yaml_set(...)
yaml_set(...)
yaml_save(...)
yaml_set(... GENEX ...)
yaml_save(...)
yaml_set(...)
yaml_save(...)
I believe we can accomplish that by having each save writing the yaml file with only clean content, and then writing the json with the content and the genex'es.
If there are any genex'es, then the file(GENERATE ...)
will be used to transform the internal genex json to the file with evaluated expressions.
The pre-build custom command is then used to convert the final json to final yaml file.
As I don't think file(GENERATE ...)
allows to generate the same file multiple times, then we probably need to increment, like <name>_<n>.json
.
But this is a detail in this context.
Also remember proper DEPENDS
on the custom command.
But I like the idea and direction 👍
cmake/modules/yaml.cmake
Outdated
message(FATAL_ERROR "YAML context '${ARG_YAML_NAME}' uses generator expressions." | ||
"Use 'yaml_generate()' to write its contents to a file." | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this really what we want ?
I could see cases where we would like to write parts of a yaml file (such as build_info) during configure time even though we don't have all knowledge yet.
A partial yaml file may be useful in cases where CMake generator fails.
Secondly, the caller of yaml_save()
may not know if other parts of the build system has decided to use genex'es, for example a genex could suddenly be used by a hal, Zephyr module, etc and thus only be active when said feature is enabled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer not to write intermediate, "broken" output files for the various reasons in the review comment. However, I think it would be fine to write to a different file (.partial_N.yaml
?) when genexes are detected. This would still provide debugging info, but avoid most of the above breakages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer not to write intermediate, "broken" output files for the various reasons in the review comment.
I said nothing about writing broken files. I said partial files.
For example like this:
build_info(zephyr version VALUE 1.2.3)
build_info(foo lib files VALUE $<TARGET_PROPERTY:foo,SOURCES>)
yaml_save(NAME build_info) # Out file with known content is written immediately
# genex'es are evaluated (and written) at generator time.
... more code ...
build_info(foo bar VALUE baz)
yaml_save(NAME build_info)
So during CMake configure and generator time the following happens:
- First yaml_save() (configure time)
zephyr: version: 1.2.3
- Second yaml_save() (configure time)
zephyr: version: 1.2.3 foo: bar: baz
- Generator time,
genex from the property evaluationzephyr: version: 1.2.3 foo: bar: baz lib: - 'file1.c' - 'file2.c'
cmake/modules/yaml.cmake
Outdated
add_custom_target(${ARG_YAML_NAME}_yaml_filter ALL DEPENDS ${yaml_file}.tmp ${yaml_target}) | ||
add_custom_command(TARGET ${yaml_target} PRE_BUILD |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this looks like something which can break in case two parts of the build system decides to call yaml_generate()
on the same target independent of each other.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed I only expected one "save" operation, as it is a target-generating step. What is a possible use case for multiple yaml_generate
calls on the same scope?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that you could want to save known data as soon as you have them.
For example, sysbuild could want to save as much info as possible before CMake config step of individual images are executed (and potentially breaks the configure step).
This would provide knowledge to users / IDEs / other tools about configuration files, included images, etc. even if one of those fails.
cmake/yaml-filter.cmake
Outdated
# YAML needs to happen after cmake has completed processing. | ||
# | ||
# This scripts expects as input: | ||
# - TMP_FILE: the name of the input file, in JSON format, that contains |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we want / need a temp file, then let's at least keep the format clear in this script.
That is JSON_FILE
instead of TMP_FILE
.
Then it's up to the caller of this script to decide if the json file is just temporary or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, will do. The rationale for TMP here was that the JSON still contains peculiar stuff like lists-as-strings, and so it is not a "final" JSON format.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the early review @tejlmand! 👍
I think we should avoid [having] both
yaml_save()
andyaml_generate()
.
Users should not care when writing the file whether it contains genex'es or not.
I tend to disagree, or more specifically, I think that users must know if they want to support genexes or not. This choice results in very different flows:
- when no genexes are expected, the YAML may be written immediately and its available even during config time; there is no target associated to the file.
- when genexes may be present, an intermediate non-YAML file must be written after config is done, and then post-processed during build. If the results are required for further evaluation, targets must depend on the final file.
Config time (yaml_save ) |
Build time (yaml_generate ) |
Target / Deps | |
---|---|---|---|
No genex | full contents | (same full contents) | meaningless |
With genex | partial, broken | full contents | required |
Unless we drop the immediate file write in yaml_save
, IMO merging them has confusing results:
-
Most importantly, the resulting "common save" would have unspecified semantics - the file would be available immediately or as a build step depending on other code. Having two separate functions makes it explicit what the expected flow is. Note that it is harmless to expect genexes but not actually have them.
-
Secondly, generation requires an associated
TARGET
and this must be provided to the "common save" [1] when genexes are expected. I could write it anyway and skip the generation code if no target was given, but should I warn the user (noisy in your example!) or not (a "silent" bug)? -
Finally, the build info YAML is re-loaded from disk to support sysbuild; in this case, a partially written file may be unparsable or introduce unexpected dynamics at best. What if, for example, I read in a genex from another CMake instance?
Let me know your thoughts on these issues.
Note [1]: To avoid that I also tried adding a scoped property early via yaml_create(TARGET ...)
, but this would require doing the same to yaml_load()
, and a yaml_load(TARGET ...)
is confusing at best.
cmake/modules/yaml.cmake
Outdated
message(FATAL_ERROR "YAML context '${ARG_YAML_NAME}' uses generator expressions." | ||
"Use 'yaml_generate()' to write its contents to a file." | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer not to write intermediate, "broken" output files for the various reasons in the review comment. However, I think it would be fine to write to a different file (.partial_N.yaml
?) when genexes are detected. This would still provide debugging info, but avoid most of the above breakages.
cmake/modules/yaml.cmake
Outdated
add_custom_target(${ARG_YAML_NAME}_yaml_filter ALL DEPENDS ${yaml_file}.tmp ${yaml_target}) | ||
add_custom_command(TARGET ${yaml_target} PRE_BUILD |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed I only expected one "save" operation, as it is a target-generating step. What is a possible use case for multiple yaml_generate
calls on the same scope?
cmake/yaml-filter.cmake
Outdated
# YAML needs to happen after cmake has completed processing. | ||
# | ||
# This scripts expects as input: | ||
# - TMP_FILE: the name of the input file, in JSON format, that contains |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, will do. The rationale for TMP here was that the JSON still contains peculiar stuff like lists-as-strings, and so it is not a "final" JSON format.
f2aeed6
to
99f5cea
Compare
How can the Zephyr top-level CMakeLists.txt here: Line 2287 in 0709478
know if some arbitrary HAL decides to call build_info() that includes a genex ?
And the same go for any other yaml save usage. |
f929631
to
8978910
Compare
This introductory commit refactors the `yaml_set` function separating the bodies of list operations into internal functions while preserving their original behavior. The conditions that cause the value to be interpreted as a list are also verified only once, avoiding multiple checks. Signed-off-by: Luca Burelli <[email protected]>
This introductory commit removes the ignored 'actual' variable from the 'yaml_set()' calls in the tests to make the code cleaner. Also, a badly indented block is fixed in the 'message()' macro. No functional change is introduced by this commit. Signed-off-by: Luca Burelli <[email protected]>
This test verifies that the APPEND LIST operation in yaml_set() works as expected. Signed-off-by: Luca Burelli <[email protected]>
To verify proper functionality, the YAML module test suite is now run twice: once in the regular CMake project mode and once in the CMake script mode. Signed-off-by: Luca Burelli <[email protected]>
This commit adds support for generator expressions in values and lists to the yaml module. Generator expressions can only be expanded by CMake after all configuration code has been executed and the final values of the project properties are defined. This means that contexts that contain generator expressions are written twice: - the first time, during yaml_save(), a comment with the raw unexpanded string is saved instead of the key that uses generator expressions in the YAML file; - the second time, as a pre-build step of a target, the generator expressions are expanded, and the final YAML file is written. This two-step process also allows to overcome the issue of lists that are extracted from generator expressions, whose elements would be expanded into a single string if written directly to the YAML file. Instead, the lists are stored in their CMake string format with a special marker, expanded by CMake into a temporary JSON file, and the conversion to a proper list is performed during the pre-build step. Note that when generator expressions are used in the context: - the GENEX keyword must be provided to yaml_set(), to mark the context as containing generator expressions; this is necessary to avoid storing the genexes as raw strings in the YAML. - the YAML file must be written providing the TARGET parameter to yaml_save(). If it is not provided, a warning will be issued and only the partial file will be written. Signed-off-by: Luca Burelli <[email protected]>
Add tests to the YAML suite for generator expressions. The test checks that generator expressions are handled as expected at different stages: - immediately after yaml_save(), the file must contain only non-genex entries; - at a later time, after the target has been built, the file must contain all the expanded genex values. Signed-off-by: Luca Burelli <[email protected]>
@tejlmand Here's v2, review ready:
This code is used in #83705 to rewrite the EDK to read the same data from |
This PR adds support for generator expressions (genexes) in values and lists to the
yaml
module, along with associated tests. As discussed in #78727, this is a prerequisite for cleaning up and expanding the LLEXT EDK information.Since the
yaml
module will also be used fromllext-edk.cmake
in script mode, this PR expands the currentYAML
test suite so that it is executed both in CMake project and script modes.This new functionality can be seen in operation in PR #83705, which rewrites the EDK to use the YAML import.
Implementation details
Generator expression semantics
Values that contain genexes must be stored in the YAML context passing the new
GENEX
flag toyaml_set()
to enable genex expansion on the whole context.Generator expressions can only be expanded by CMake after all configuration code has been executed and the final values of the project properties are defined. This is a challenge for current
yaml_save()
, which is expected to write out the YAML file immediately.To achieve both goals, on contexts that use generator expressions the YAML file is written twice:
yaml_save()
, excluding all values that contain a generator expression;To enable this two-step build, the
TARGET
parameter must be passed to theyaml_save()
call; otherwise, only the partial file is written and a warning is issued if the file contains genexes.Lists expanded from a single generator expression
By the time the generator expressions are expanded by CMake, the structure of the YAML file would be already defined, and all items inside a list would get expanded into a single element - definitely not the desired behavior.
To overcome this limitation, in YAML contexts that use generator expressions, lists are not directly expanded into proper JSON lists but stored in their CMake string format with a special marker.
When the
yaml_save()
function is called, the internal JSON representation is written usingfile(GENERATE ...)
to a temporary file to obtain a seamlessly expanded JSON object. The temporary file is then processed (as a pre-build step of the specified target) by a custom command that reads the intermediate JSON, replaces the marked lists-as-string with the correct list format, and writes the final YAML file.Review notes
The first 2 commits are preparatory refactors that do not change the functionality of the existing code:
2a60f3b
cmake: yaml: refactor yaml_set() list operations7012371
tests: yaml: refactor to remove unused variablesCommits 3 and 4 expand the
yaml
module tests:fe2901f
tests: yaml: add test for APPEND LIST operation0e693f9
tests: yaml: test the API in CMake script mode as wellCommits 5 and 6 introduce generator expression logic and tests, and may be best viewed ignoring whitespace due to indentation changes for large blocks of existing code:
b45b169
cmake: yaml: add support for generator expressions0600b1c
tests: yaml: add test for generator expressions