Optimized codesize benchmarks do not clearly show the power of individual optimizations #5945
Comments
Many of these are already documented on sites such as https://github.com/johnthagen/min-sized-rust?tab=readme-ov-file and I think we should reference those documents when possible. We can mention those, but it is probably most useful for us to focus on optimizations unique to libraries and especially cross-language libraries. The Rust-specific optimizations in the above list include:
If the tradeoff is "may increase compile times", I did not include it above, because almost all of these will impact compile times. If "Special Requirements" says Rust Nightly, I also did not include it as a Tradeoff since it is already listed in the table. There are two new things in that list since the last time I looked at it which I haven't evaluated:
I think Now for the options that are unique to ICU4X and generally cross-language Rust libraries:
The first 3 rows are basically an enumeration. You shouldn't need Now, the first time I had to comprehend all these flags, I was a bit like, ":angry: why doesn't Does this make sense? Did I miss anything?
Yes, this makes sense, but I don't think it fully answers the question I had here, which was "which axes should our ideal benchmark be testing?" Do you think we should be testing each of these? And yeah, the "use everything and disable the things you don't need" path was why I suggested having a "release build with all but one of these optimizations applied" test for each. Though I think some intermediate numbers are also valuable, since people will quite often be in the position of wanting to get the easy wins without dealing with toolchain complexities (nightlies, etc).
For now I'm going to stick with the benchmarks in my PR (with the tweaks I was planning on doing), and we can leave this issue open to properly fix them as needed. The PR is a good start for the principled approach proposed here but is trying not to do everything just yet since we don't quite need that for the current set of discussions.
Progress on #5945. This does not fully enact the vision in #5945, but it does start testing various combinations of tactics. I didn't want to test *all* combinations and I didn't want to spend time figuring out which combinations we really want, so I went ahead and tested the ones most relevant to the current investigation (#5935). This shows the effects of LTO, linker-plugin-lto, and stripping+gc-sections on panic-abort (with panic-immediate-abort std) release Rust builds. Benchmarks with just panic=abort but no panic-immediate-abort/panic-abort std would probably be useful to clients but I haven't added them now. It may be worth using makefile magic to truly build a matrix of targets.
Came up in #5935
ICU4X has test-c-tiny and test-js-tiny to show how far codesize can be optimized.
These are incremental, applying optimization on top of optimization to slowly reduce codesize. This shows a nice progression, but it is not helpful when understanding what the effect of each optimization is in isolation.
I think this is an important function of such a benchmark: many of these techniques are not uniformly available and impose additional constraints upon the build: some require nightly, some require paired Rust/Clang versions, some force build-std, some require a particular C compiler, some reduce debuggability, and so on.
Furthermore, a lot of these benchmarks build on top of each other: using a release build will of course help LTO be more effective (percent-wise).
Providing numbers for every combination is going to be a lot of work and likely an overwhelming amount of data. However, I think what we could do is identify a list of optimizations that are potentially relevant but not necessarily always possible, and then provide numbers for:
-Clinker-plugin-lto
needs LTO, apply its dependencies too
This would both give us an idea of the immediate wins of individual optimizations, and how they cumulatively work together.
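A minimal sketch of how such a set of configurations could be enumerated, written in Rust. Everything here is hypothetical: the `Optimization` type, the function names, and the label scheme are illustrative and not part of ICU4X or its Makefiles. It assumes four kinds of rows: a plain release baseline, each optimization alone plus whatever it depends on, all optimizations but one (also dropping anything that depends on the removed one), and everything together. The baseline and "all" rows are an assumption on top of what is described above. The example list contains only the optimizations named in this issue.

```rust
use std::collections::BTreeSet;

/// One optional optimization plus the optimizations it cannot be used without.
/// (Hypothetical type, for illustration only.)
struct Optimization {
    name: &'static str,
    requires: &'static [&'static str],
}

/// A build configuration: a label and the set of optimizations it enables.
type Config = (String, BTreeSet<&'static str>);

/// Returns `name` together with everything it transitively requires.
fn with_dependencies(name: &'static str, all: &[Optimization]) -> BTreeSet<&'static str> {
    let mut enabled = BTreeSet::new();
    let mut pending = vec![name];
    while let Some(current) = pending.pop() {
        if enabled.insert(current) {
            if let Some(opt) = all.iter().find(|o| o.name == current) {
                pending.extend(opt.requires.iter().copied());
            }
        }
    }
    enabled
}

/// Returns every optimization except `name` and anything that requires it.
fn without(name: &'static str, all: &[Optimization]) -> BTreeSet<&'static str> {
    all.iter()
        .map(|o| o.name)
        .filter(|&candidate| !with_dependencies(candidate, all).contains(name))
        .collect()
}

/// Builds the rows: baseline, each alone (with dependencies), all-but-one, all.
fn benchmark_matrix(all: &[Optimization]) -> Vec<Config> {
    let mut configs: Vec<Config> = vec![("baseline".to_string(), BTreeSet::new())];
    for opt in all {
        configs.push((format!("only-{}", opt.name), with_dependencies(opt.name, all)));
    }
    for opt in all {
        configs.push((format!("all-but-{}", opt.name), without(opt.name, all)));
    }
    configs.push(("all".to_string(), all.iter().map(|o| o.name).collect()));
    configs
}

/// Illustrative subset: only the optimizations named in this issue.
fn example_optimizations() -> Vec<Optimization> {
    vec![
        Optimization { name: "lto", requires: &[] },
        Optimization { name: "linker-plugin-lto", requires: &["lto"] },
        Optimization { name: "gc-sections", requires: &[] },
        Optimization { name: "strip-all", requires: &[] },
    ]
}

fn main() {
    for (label, enabled) in benchmark_matrix(&example_optimizations()) {
        let names: Vec<&str> = enabled.into_iter().collect();
        println!("{label}: [{}]", names.join(", "));
    }
}
```

With N optimizations this produces 2N + 2 rows instead of 2^N, which keeps the amount of data manageable while still showing both the individual and the cumulative effect.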
The optimizations I can identify are:
-Clinker-plugin-lto
--gc-sections
--strip-all
This list might be larger than necessary, so we could merge some entries if desired. I might also be missing something. I didn't include Rust debug vs release here because I don't think debug build codesize numbers really mean much, and I can't think of a use case for caring about those numbers.
Thoughts? @sffc