
Embed public and private shares into DkgPrivateBegin and DkgEndBegin #98

Open

xoloki wants to merge 1 commit into main from embed-public-private-shares

Conversation

Collaborator

@xoloki xoloki commented Nov 22, 2024

This change addresses out-of-order messages that can arise when WSTS is used in a p2p network. It adds a config option to the Coordinator and Signer state machines that, when enabled, embeds the DkgPublicShares and DkgPrivateShares messages into DkgPrivateBegin and DkgEndBegin.
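
A rough sketch of how the option is meant to be used; the `embed_public_private_shares` field name matches the diff further down, but the config construction and coordinator constructor shown here are hypothetical stand-ins, not the crate's exact API:

```rust
// Illustrative only: the flag name comes from this PR's diff; the helper and
// the constructor call are stand-ins.
let mut config = make_config(); // hypothetical helper returning a WSTS Config
config.embed_public_private_shares = true;

// With the flag set, the coordinator copies the DkgPublicShares it gathered
// into DkgPrivateBegin and the DkgPrivateShares into DkgEndBegin, so a signer
// that missed the original broadcasts can still complete DKG.
let mut coordinator = fire::Coordinator::new(config); // hypothetical constructor
```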

@xoloki xoloki force-pushed the embed-public-private-shares branch from c90232f to 38cf8ee on December 2, 2024 16:06
@xoloki xoloki changed the title embed public and private shares into DkgPrivateBegin and DkgEndBegin Embed public and private shares into DkgPrivateBegin and DkgEndBegin Dec 2, 2024
@xoloki xoloki requested a review from djordon December 2, 2024 18:24
@xoloki xoloki marked this pull request as ready for review December 2, 2024 18:56
@xoloki xoloki force-pushed the embed-public-private-shares branch 2 times, most recently from e78101f to a4c33be on December 6, 2024 16:12
Contributor

@djordon djordon left a comment

Not done yet. I started this review a while ago and figured I'd post what I had written so far rather than wait until I finish the review.

src/state_machine/coordinator/fire.rs (resolved)
src/state_machine/coordinator/fire.rs (resolved)
@@ -891,7 +921,9 @@ impl<SignerType: SignerTrait> StateMachine<State, Error> for Signer<SignerType>
}
#[cfg(test)]
/// Test module for signer functionality
pub mod test {
mod test {
use hashbrown::HashMap;
Contributor

Can we use the standard library version of HashMap, generally speaking?

Collaborator Author

Originally, there was a goal to provide no_std support, since this is a relatively common use case (see #60) in Rust code generally, and in crypto code specifically. Hence the use of hashbrown for HashMap and HashSet, rand_core instead of rand, core for fmt, etc.

Now that the fire::Coordinator needs to do timeouts, though, this is no longer possible for the crate as a whole: there is no replacement for std::time::Instant in core or alloc. We could, however, still make the non-state-machine code no_std.

Contributor

Oh right.

We use Vec as well. I would think that the simplest path forward to support no_std is to go through the alloc crate, which has "replacements" for Vec but not for HashMap (I say replacements, but I think std re-exports the alloc types). alloc does provide BTreeMap, which might be a good move too. We've moved to using BTreeMap during serialization in the sbtc signers so that everything is deterministic; that might be worthwhile here as well.

But yeah, if the no_std version of things is something we want to do, we shouldn't use the std version of HashMap here; maybe switching to BTreeMap is the way to go. Thoughts?
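
A standalone illustration (not WSTS code) of why BTreeMap gives deterministic serialization where HashMap doesn't:

```rust
use std::collections::{BTreeMap, HashMap};

fn main() {
    let pairs = [(3u32, "c"), (1, "a"), (2, "b")];

    // BTreeMap iterates in sorted key order, so any serialization that walks
    // the map produces the same bytes on every run and every machine.
    let ordered: BTreeMap<_, _> = pairs.iter().copied().collect();
    assert_eq!(ordered.keys().copied().collect::<Vec<_>>(), vec![1, 2, 3]);

    // HashMap's iteration order is unspecified (and randomized per process
    // with the default hasher), so naive serialization is not reproducible.
    let unordered: HashMap<_, _> = pairs.iter().copied().collect();
    println!("{:?}", unordered.keys().collect::<Vec<_>>());
}
```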

Collaborator Author

@xoloki xoloki Dec 8, 2024

I'm definitely on board with switching to BTreeMap where it makes sense. It's more space efficient anyway. Where are you currently running into this? The state machines are like half HashMap half BTreeMap, but the rest of the code is pure HashMap.

Thinking more about no_std, I looked over p256k1 and realized that it's chock full of std, over 1k LOC. So until we can move to k256, I think we should stop worrying about no_std and just do what's right.

Contributor

I'm definitely on board with switching to BTreeMap where it makes sense. It's more space efficient anyway. Where are you currently running into this?

It's not an issue that we're running into with WSTS. But we serialize our messages as protobufs before broadcasting them on the network. The protobuf spec doesn't have a canonical serialization, so we've picked one serialization implementation so that things are deterministic and so on, and converting maps to BTreeMaps before serialization is part of that process. Well, we haven't proved that our specific implementation is deterministic yet (stacks-network/sbtc#1001), but that's the goal.

Thinking more about no_std, I looked over p256k1 and realized that it's chock full of std, over 1k LOC. So until we can move to k256, I think we should stop worrying about no_std and just do what's right.

Yeah this makes sense to me.

I'm not sure where I'd land on the BTreeMap vs std HashMaps here. My guess is that users would usually have a small number of signers (maybe hundreds or thousands) but perhaps a large number of keys (thousands or maybe tens of thousands). With just that I'd lean BTreeMap but I suppose either works. Not sure if it's worth the effort to benchmark though.


No need for any of these changes in this PR, I was just asking. I think I had asked before and forgot what the resolution was.


Is moving to k256 a goal?

Collaborator Author

Yeah the BTreeMap discussion is def orthogonal to this PR.

k256 is pure Rust, so there are definitely some advantages to moving towards it; p256k1 has bindgen and FFI that make upgrading the underlying secp256k1 library difficult. But the performance benefits of p256k1 are significant, especially with the multiexponentiation support.

Contributor

Oh, is the one added in RustCrypto/elliptic-curves#955 just super slow compared to libsecp256k1?

Collaborator Author

It's 3x slower. Which is not a deal breaker for sBTC, and maybe not even for Nakamoto (when they get their coordinator issues sorted out). It was definitely a dealbreaker when DKG was taking 90 minutes, but those days are long gone.

Collaborator Author

@xoloki xoloki Dec 9, 2024

AFAICT k256 is just using precomputed lookup tables for lincomb, whereas secp256k1 is doing a full Pippenger algo which gives a logarithmic speedup.

https://github.com/RustCrypto/elliptic-curves/blob/master/k256/src/arithmetic/mul.rs#L294

https://github.com/Trust-Machines/p256k1/blob/master/p256k1/_secp256k1/src/ecmult_impl.h#L498

Which makes sense, because p256k1 ecmult (which uses precomputed tables) is basically the same performance as k256 lincomb.
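
For context, the standard asymptotics behind that "logarithmic speedup" (textbook multi-scalar-multiplication costs, not measurements from either codebase):

```latex
\underbrace{O(n \cdot b)}_{\text{per-point double-and-add / tables}}
\quad\text{vs.}\quad
\underbrace{O\!\left(\frac{n \cdot b}{\log n}\right)}_{\text{Pippenger bucket method}}
\qquad\text{for } \sum_{i=1}^{n} k_i P_i \text{ with } b\text{-bit scalars } k_i
```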

@xoloki xoloki force-pushed the embed-public-private-shares branch from 3a6289d to 95fe283 on December 19, 2024 13:48
@xoloki xoloki self-assigned this Dec 19, 2024
Contributor

@djordon djordon left a comment

Honestly, I'm not entirely sure that this is the way forward to solve the underlying issue. And since we don't have a pressing need for an immediate solution, I'd like us to think through what the "right" solution really is. It might be this but I'm not so sure since I haven't taken any time to think it through.

BTW, it's a good idea for the sbtc signers to cap the message size, and I now think that they do so by default, whereas before I thought they didn't.

src/net.rs (resolved)
TryFromInt,
}

impl From<AesGcmError> for Error {
Contributor

I think you can replace this with AesGcm(#[from] AesGcmError) above, no?

Contributor

Yes, we should.

Collaborator Author

No, we can't:

83 |     AesGcm(#[from] AesGcmError),
   |              ^^^^ method cannot be called on `&Error` due to unsatisfied trait bounds
   |
  ::: /Users/axoloki/.cargo/registry/src/index.crates.io-6f17d22bba15001f/aead-0.5.2/src/lib.rs:79:1
   |
79 | pub struct Error;
   | ---------------- doesn't satisfy `aes_gcm::Error: AsDynError<'_>` or `aes_gcm::Error: StdError`
   |
   = note: the following trait bounds were not satisfied:
           `aes_gcm::Error: StdError`
           which is required by `aes_gcm::Error: AsDynError<'_>`
           `&aes_gcm::Error: StdError`
           which is required by `&aes_gcm::Error: AsDynError<'_>`

thiserror is awesome, but sadly we can't do much about upstream errors that don't satisfy the trait bounds.

Collaborator Author

Fortunately, the AES DOS fix required me to define an EncryptionError type, which takes care of wrapping aes_gcm::Error. So we only have to do the manual chaining once.
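
For reference, a minimal sketch of the manual chaining (names mirror the diff; the PR's actual fix routes this through its new EncryptionError type rather than exactly this code):

```rust
use thiserror::Error;

// `aes_gcm::Error` comes from the `aead` crate (see the compiler output above)
// and does not satisfy the `std::error::Error` bound here, so `#[from]` /
// `#[source]` can't be used on it.
use aes_gcm::Error as AesGcmError;

#[derive(Debug, Error)]
pub enum Error {
    /// Plain wrapper variant: Display is built from the inner error's Debug output.
    #[error("AES-GCM error: {0:?}")]
    AesGcm(AesGcmError),
}

// The conversion thiserror would normally derive has to be written by hand.
impl From<AesGcmError> for Error {
    fn from(e: AesGcmError) -> Self {
        Error::AesGcm(e)
    }
}
```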

if !self.embed_public_private_shares {
    self.dkg_public_share(dkg_public_shares)
} else {
    Ok(vec![])
Contributor

@jferrant jferrant Dec 20, 2024

Would this warrant a warning if we are getting a DkgPublicShares message? EDIT: Oh, we still expect them to send the DkgPublicShares and DkgPrivateShares, but we just wait for DkgEndBegin to actually do anything with them? Just making sure I understand this right.

Collaborator Author

Yeah, the signers will still see the public and private shares coming from other signers; they just ignore them if configured to wait for the coordinator messages that embed them.

Contributor

@jferrant jferrant left a comment

Just a question for my own understanding, but looks good.

Collaborator Author

xoloki commented Dec 21, 2024

I'd like us to think through what the "right" solution really is. It might be this but I'm not so sure since I haven't taken any time to think it through.

Not sure when this comment was made relative to the last time we discussed it in standup. Are you still on the fence about the proper way to do this?

Given sBTC networking characteristics, I don't see any other real option. The current band aid of sprinkling sleeps throughout the code is extremely suboptimal, and will invariably cause problems.

BTW, it's a good idea for the sbtc signers to cap the message size, and I now think that they do so by default, whereas before I thought they didn't.

Even with 100 signers/key shares, the size needed will be

    100 * 100 * 32 ~ 320k

That will comfortably fit under a 1MB cap, much less the 2MB which was last mentioned.
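
Spelling that estimate out (assuming one 32-byte secp256k1 scalar per private share):

```latex
100 \text{ signers} \times 100 \text{ key shares} \times 32 \text{ bytes/share}
= 320{,}000 \text{ bytes} \approx 0.32\ \text{MB}
```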

@xoloki xoloki force-pushed the embed-public-private-shares branch from 95fe283 to 252b2a0 on December 21, 2024 16:42
@@ -229,10 +229,18 @@ impl<Aggregator: AggregatorTrait> Coordinator<Aggregator> {
            dkg_id = %self.current_dkg_id,
            "Starting Private Share Distribution"
        );
        let dkg_public_shares = if self.config.embed_public_private_shares {
            (0..self.config.num_signers)
                .map(|id| (id, self.dkg_public_shares[&id].clone()))
Contributor

Let's refrain from direct array access.

Collaborator Author

The frost::Coordinator requires full participation at every stage of DKG and Sign operations, so the direct array access here is safe.

But let's be pedantic anyway :)

Contributor

Yeah, I think the issue is that we need to be 100% confident that there is no way for this code to panic. This might be true now, but when we modify some part of this code, that may no longer be the case. And if that happens I'd rather return an error than panic.
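
A minimal sketch of the non-panicking lookup (the error variant used here is illustrative; the commit below adds a MissingPrivateShares error for the analogous private-share case):

```rust
// Sketch only: replaces the panicking `self.dkg_public_shares[&id]` index
// with a lookup that surfaces a missing signer as an error instead.
let dkg_public_shares = (0..self.config.num_signers)
    .map(|id| {
        self.dkg_public_shares
            .get(&id)
            .cloned()
            .map(|shares| (id, shares))
            .ok_or(Error::MissingPublicShares(id)) // illustrative variant
    })
    .collect::<Result<Vec<_>, Error>>()?;
```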

Collaborator Author

Done.

@@ -251,10 +259,18 @@ impl<Aggregator: AggregatorTrait> Coordinator<Aggregator> {
            dkg_id = %self.current_dkg_id,
            "Starting DKG End Distribution"
        );
        let dkg_private_shares = if self.config.embed_public_private_shares {
            (0..self.config.num_signers)
                .map(|id| (id, self.dkg_private_shares[&id].clone()))
Contributor

Same here

Collaborator Author

Done.


Contributor

djordon commented Dec 21, 2024

Not sure when this comment was made relative to the last time we discussed it in standup. Are you still on the fence about the proper way to do this?

It was a day or so before we spoke about it at stand-up yesterday. I haven't given it enough thought though. I think the proposed approach works and fixes our issue, but I just need some more time to give it some thought.

Given sBTC networking characteristics, I don't see any other real option. The current band aid of sprinkling sleeps throughout the code is extremely suboptimal, and will invariably cause problems.

There are other options, but yeah, many of them are worse than the solution proposed here. I'll give it some more thought; I expect to arrive at the same place that you're at.

…messages

make embedding of public and private shares configurable

reenable start_private_shares test now that embedding is configurable

inc major semver for backwards incompatible API changes

test modules do not need to be public unless someone is using them externally

allow a static mut ref to an atomic variable used for test log initialization; allow many arguments to Signer::new

allow many arguments to setup_with_timeouts

fmt and code change for rng PR rebase

rebase fixes

add TryFromInt to signer and coordinator errors so we can remove more unwrap calls

remove unwrap in signer state machine

remove unnecessary AesGcm error since we now have EncryptionError; remove bracket access from frost coordinator

add MissingPrivateShares error; remove bracket access in frost Coordinator
@xoloki xoloki force-pushed the embed-public-private-shares branch from 3335f25 to 89a0811 on December 21, 2024 17:14