
[Investigation] Reported performance regression when updating wasmtime from 19 to 24. #2058

Closed
Stebalien opened this issue Oct 15, 2024 · 20 comments · Fixed by #2059

@Stebalien
Member

We've seen a reported performance regression that appears to be associated with the wasmtime update that happened in FVM 4.4 (wasmtime 19 -> 24). This was an interesting performance regression because:

  1. It shows up even when updating FVM 2.7 to 2.9, even when FVM 2.9 isn't actually being used, just included. This indicates that the issue is in some common wasmtime dependency.
  2. None of the gas benchmarks in FVM v4 are picking up the regression, but those benchmarks mostly focus on non-wasm things. Importantly, I'm reasonably sure that:
    1. It's not related to hashing.
    2. It's not related to signature verification.
    3. It's not related to calling into wasm contracts (see Re-Price Method Invocation #2057).
    4. It's not related to trapping and running out of gas (we have an explicit benchmark for this because it occasionally rears its head and turns into an $N^2$ problem).
    5. It shouldn't be cranelift. That tends to be version-locked with a specific wasmtime version.

Things I haven't checked:

  1. The cost of wasm instructions in general. We priced wasm instructions long ago and it was a bit of a manual process.
  2. The cost of jumping in and out of wasm. Wasmtime has made some recent changes here; however, I'd expect any issues there to be specific to the wasmtime version in use (observation (1) above appears to conflict with this).

All I can think of now is... rustix? Or something like that?
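For reference, a couple of cargo invocations can help narrow down which dependencies the two wasmtime trees share (the exact version string has to match what's in Cargo.lock):

# List every crate that appears in the build tree at more than one version.
cargo tree -d

# Show all paths that pull in the new wasmtime.
cargo tree -i wasmtime@24.0.0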

@Stebalien
Member Author

My next step is to try to update FVM v2 in forest to see what deps get updated along with it.

@Stebalien
Member Author

We've confirmed that having wasmtime v24 anywhere in the build tree, even if unused, causes this slowdown.
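A minimal sketch of what that looks like in a manifest (hypothetical Cargo.toml fragment, not forest's actual one): the new wasmtime is merely listed, never referenced from code, yet its feature choices still participate in cargo's feature unification for any crates the two trees happen to share.

[dependencies]
# The FVM version actually exercised by the benchmark.
fvm = "2.7"
# Unused in code, but present in the build tree.
wasmtime = "24"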

@Stebalien
Member Author

  • It's not the write_core feature in object as far as I can tell.
  • I'm also seeing some changes in gimli, but v24 uses a different version of that so it shouldn't matter.

Interesting, I'm seeing smallvec now depending on serde. I can't imagine how that might be relevant, but I'll need to check.
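One way to check that kind of feature drift (the package name here is just the example from above) is to ask cargo which features each dependent enables:

# Inverted feature view: every crate that depends on smallvec, plus the
# smallvec features it turns on (e.g. serde).
cargo tree -i smallvec -e features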

@Stebalien
Member Author

I've tracked it down to wasmtime's cranelift feature (does not appear to be feature unification but I may be missing something). I'm now trying with just wasmtime-cranelift.
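Roughly, that narrowing amounts to toggling features on a direct dependency; something along these lines (version and feature list illustrative):

# wasmtime with only the compiler feature enabled; dropping "cranelift" here
# (and relying on pre-compiled modules) keeps cranelift-codegen out of the tree.
wasmtime = { version = "24", default-features = false, features = ["cranelift"] }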

@Stebalien
Member Author

  • wasmtime-environ doesn't reproduce.
  • cranelift-wasm does.

@Stebalien
Member Author

cranelift-codegen reproduces, trying cranelift-control now

@Stebalien
Member Author

cranelift-control does not. Trying cranelift-codegen-meta, cranelift-codegen-shared, and cranelift-isle

@Stebalien
Member Author

Ok, it is cranelift-codegen specifically. Now I'm trying to see if it's because we're not supporting some architecture, by building with CARGO_FEATURE_ALL_ARCH... we'll see if that even works.

@Stebalien
Member Author

Er, trying with reduced features first. Only "std" and "unwind" because those are required.
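In Cargo.toml terms that's something like the following (sketch; the version number is illustrative):

# Direct dependency used purely to reproduce the slowdown, with only the
# features cranelift-codegen can't build without.
cranelift-codegen = { version = "0.109", default-features = false, features = ["std", "unwind"] }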

@Stebalien
Member Author

Ok, still broken with those features. Now I'm trying with "all arch" to see if it's some kind of ISLE issue (I think it is, but I'm not sure if that's the fix).

@Stebalien
Member Author

Enabling all architectures doesn't help, and I can't skip the ISLE build (that option only exists if it's pre-built). Now I'm bisecting cranelift-codegen, starting with 0.109.
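Each bisection step is just a pinned, exact version requirement plus a re-run of the benchmark (hypothetical manifest line):

# Pin one exact cranelift-codegen release for this bisection step.
cranelift-codegen = "=0.109.0"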

@Stebalien
Member Author

Ok, it's cranelift-codegen 0.107.0 exactly. Now I'm testing a build with all the FVM crates fully updated to get us all on a single wasmtime version just in case it's an issue with multiple wasmtime versions.

@Stebalien
Member Author

Ok, it is feature unification: specifically, the trace-log feature getting enabled on regalloc2.
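For confirmation, cargo can show which crate in the tree is turning that feature on (output format varies a bit by cargo version):

# Lists every dependent of regalloc2 together with the regalloc2 features it
# enables; trace-log should show up on the wasmtime 24 / cranelift path.
cargo tree -i regalloc2 -e features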

@Stebalien
Member Author

My hypothesis is that this is actually just a compilation slowdown, not an execution slowdown. We're only running through ~24 epochs here and will have to pause to lazily (IIRC?) compile actors every time we load a new one. This is my guess because I'm noticing that epochs 1 & 4 are taking a while.

Additionally, if we're speeding through multiple network upgrades (especially if we're switching FVM versions), we'll have to re-compile the actors per network version.

@LesnyRumcajs
Contributor

@Stebalien This could be verified by running a longer benchmark without network upgrades. The .env file would need to be modified along the lines of:

LOTUS_IMAGE=ghcr.io/chainsafe/lotus-devnet:2024-10-10-600728e
FOREST_DATA_DIR=/forest_data
LOTUS_DATA_DIR=/lotus_data
FIL_PROOFS_PARAMETER_CACHE=/var/tmp/filecoin-proof-parameters
MINER_ACTOR_ADDRESS=f01000
LOTUS_RPC_PORT=1234
LOTUS_P2P_PORT=1235
MINER_RPC_PORT=2345
FOREST_RPC_PORT=3456
FOREST_OFFLINE_RPC_PORT=3457
F3_RPC_PORT=23456
F3_FINALITY=100000
GENESIS_NETWORK_VERSION=24
SHARK_HEIGHT=-10
HYGGE_HEIGHT=-9
LIGHTNING_HEIGHT=-8
THUNDER_HEIGHT=-7
WATERMELON_HEIGHT=-6
DRAGON_HEIGHT=-5
WAFFLE_HEIGHT=-4
TUKTUK_HEIGHT=-3
TARGET_HEIGHT=200

Note that the timeout would also need to be extended. Then, we compare the timings before and after the FVM upgrade.

That way, we can confirm whether the slowdown comes from re-compiling actors (which would not be a big issue).

@LesnyRumcajs
Contributor

@Stebalien I believe your hypothesis holds. I tried the config above locally (on a machine that previously reported a 50% slowdown).

FVM 4.3.1
________________________________________________________
Executed in  781.81 secs    fish           external
   usr time    5.92 secs    0.00 micros    5.92 secs
   sys time    5.95 secs  696.00 micros    5.95 secs

FVM 4.4
________________________________________________________
Executed in  782.58 secs    fish           external
   usr time    5.81 secs    0.00 micros    5.81 secs
   sys time    6.01 secs  811.00 micros    6.01 secs

So I guess it's "okay-ish" in the sense that the slowdown only occurs under somewhat synthetic conditions. That said, it might make sense to report it to the cranelift maintainers.

@Stebalien
Member Author

Yep, I plan on reporting it upstream. I'm also looking into possibly disabling the offending feature (requires upstream help, but I'm guessing it's only needed for GC which we don't use).

Also, for forest, you can modify the "quick" compilation profile to optimize cranelift-codegen (or maybe just regalloc2). That didn't get it to be quite fast enough to pass the test in the 5m timeout, but it definitely improved the situation.
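That profile tweak would look roughly like this (sketch; "quick" is assumed to be forest's existing custom profile, so only the per-package overrides are shown):

# Optimize just the hot compilation crates, even in an otherwise fast-to-build profile.
[profile.quick.package.cranelift-codegen]
opt-level = 3

[profile.quick.package.regalloc2]
opt-level = 3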

I'm also going to write a quick "load/compile the actors" benchmark.
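A minimal sketch of such a benchmark, assuming a single actor's wasm is available as a plain file on disk (the path, the anyhow dependency, and the direct wasmtime API usage are illustrative, not the FVM's actual loading path):

// Times a full cranelift compile of one module; repeating this per actor in a
// bundle approximates the per-network-version compilation cost.
use std::time::Instant;
use wasmtime::{Engine, Module};

fn main() -> anyhow::Result<()> {
    let wasm = std::fs::read("actor.wasm")?; // hypothetical path
    let engine = Engine::default();

    let start = Instant::now();
    let _module = Module::new(&engine, &wasm)?; // compilation happens here
    println!("compiled in {:?}", start.elapsed());
    Ok(())
}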

@Stebalien
Member Author

Oh, lol, no. This trace-log is literally just for debugging. I don't think this was supposed to get shipped to production.

@Stebalien
Member Author

Stebalien commented Oct 18, 2024

> I don't think this was supposed to get shipped to production.

Nvm:

# Note that this just enables `trace-log` for `clif-util` and doesn't turn it on
# for all of Cranelift, which would be bad.

It looks like they thought this wouldn't have a performance impact.

@Stebalien
Member Author

Ah, that comment is saying that trace-log is only enabled for clif-util.

And... already fixed but not in the version we're using: bytecodealliance/wasmtime#9128. Time to update to v25, I guess.

Stebalien added a commit that referenced this issue Oct 18, 2024
Importantly, this reverts a previous wasmtime change that enabled
trace-logging in regalloc, massively slowing down compilation across all
FVM versions.

fixes #2058