-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Embed public and private shares into DkgPrivateBegin and DkgEndBegin #98
base: main
Are you sure you want to change the base?
Conversation
c90232f
to
38cf8ee
Compare
e78101f
to
a4c33be
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not done yet. I had started this a review a while ago, figured that I'd post what I had written so far rather than finish the review.
@@ -891,7 +921,9 @@ impl<SignerType: SignerTrait> StateMachine<State, Error> for Signer<SignerType> | |||
} | |||
#[cfg(test)] | |||
/// Test module for signer functionality | |||
pub mod test { | |||
mod test { | |||
use hashbrown::HashMap; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use the standard library version of HashMap
, generally speaking?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Originally, there was a goal to provide no_std
support, since this is a relatively common use case (see #60) in Rust code generally, and crypto code specifically. Hence the usage of hashbrown
for HashMap
and HashSet
, rand_core
vs rand
, core
for fmt
, etc.
Now that the fire::Coordinator
needs to do timeouts, though, this is no longer possible for the crate as a whole. There is no replacement for std::time::Instant
in core
or alloc
. We could however still make the non-state machine code no_std
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh right.
And we also use Vec
as well. I would think that the simplest path forward to support no_std
is to go through the alloc
crate, which has "replacements" for Vec
but not for HashMap
(I say replacements but I think std
re-exports alloc
types). alloc
does have a replacement for BTreeMap
, which might be a good move too. We've moved to using the BTreeMap
during serialization in sbtc signers so that everything is deterministic, might be worthwhile here as well.
But yeah, if the no_std
version of things is something we want to do, we shouldn't use the std version of HashMap
here, but maybe switching to the BTreeMap
version is the way to go. Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm definitely on board with switching to BTreeMap where it makes sense. It's more space efficient anyway. Where are you currently running into this? The state machines are like half HashMap
half BTreeMap
, but the rest of the code is pure HashMap
.
Thinking more about no_std
, I looked over p256k1
and realized that it's chock full of std
, over 1k LOC. So until we can move to k256
, I think we should stop worrying about no_std
and just do what's right.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm definitely on board with switching to BTreeMap where it makes sense. It's more space efficient anyway. Where are you currently running into this?
It's not an issue that we're running into with WSTS. But we serialize our messages as protobufs before broadcasting them on the network. The protobuf spec doesn't have a canonical serialization, so we've picked one serialization implementation so that things are deterministic and so on, and converting maps to BTreeMap
s before serialization is part of that process. Well, we haven't proved that our specific implementation is deterministic yet (stacks-network/sbtc#1001), but that's the goal.
Thinking more about
no_std
, I looked overp256k1
and realized that it's chock full ofstd
, over 1k LOC. So until we can move tok256
, I think we should stop worrying aboutno_std
and just do what's right.
Yeah this makes sense to me.
I'm not sure where I'd land on the BTreeMap
vs std HashMap
s here. My guess is that users would usually have a small number of signers (maybe hundreds or thousands) but perhaps a large number of keys (thousands or maybe tens of thousands). With just that I'd lean BTreeMap
but I suppose either works. Not sure if it's worth the effort to benchmark though.
No need for any of these changes in this PR, I was just asking. I think I had asked before and forgot what the resolution was.
Is moving to k256
a goal?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah the BTreeMap discussion is def orthogonal to this PR.
k256
is pure Rust so there's definitely some advantages to moving towards it; p256k1
has bindgen and FFI that make upgrading the underlying secp256k1
library difficult. But the performance benefits of p256k1
are significant, especially with the multiexponentiation support.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, is the one added in RustCrypto/elliptic-curves#955, just super slow compared to libsecp256k1?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's 3x slower. Which is not a deal breaker for sBTC
, and maybe not even for Nakamoto (when they get their coordinator issues sorted out). It was definitely a dealbreaker when DKG was taking 90 minutes, but those days are long gone.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAICT k256
is just using precomputed lookup tables for lincomb
, whereas secp256k1
is doing a full Pippenger algo which gives a logarithmic speedup.
https://github.com/RustCrypto/elliptic-curves/blob/master/k256/src/arithmetic/mul.rs#L294
https://github.com/Trust-Machines/p256k1/blob/master/p256k1/_secp256k1/src/ecmult_impl.h#L498
Which makes sense, because p256k1
ecmult
(which uses precomputed tables) is basically the same performance as k256
lincomb
.
3a6289d
to
95fe283
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Honestly, I'm not entirely sure that this is the way forward to solve the underlying issue. And since we don't have a pressing need for an immediate solution, I'd like us to think through what the "right" solution really is. It might be this but I'm not so sure since I haven't taken any time to think it through.
BTW, it's a good idea for the sbtc signers to cap the message size, and I now think that they do so by default when before I didn't.
src/state_machine/signer/mod.rs
Outdated
TryFromInt, | ||
} | ||
|
||
impl From<AesGcmError> for Error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can replace this with AesGcm(#[from] AesGcmError) above no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we should.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, we can't:
83 | AesGcm(#[from] AesGcmError),
| ^^^^ method cannot be called on `&Error` due to unsatisfied trait bounds
|
::: /Users/axoloki/.cargo/registry/src/index.crates.io-6f17d22bba15001f/aead-0.5.2/src/lib.rs:79:1
|
79 | pub struct Error;
| ---------------- doesn't satisfy `aes_gcm::Error: AsDynError<'_>` or `aes_gcm::Error: StdError`
|
= note: the following trait bounds were not satisfied:
`aes_gcm::Error: StdError`
which is required by `aes_gcm::Error: AsDynError<'_>`
`&aes_gcm::Error: StdError`
which is required by `&aes_gcm::Error: AsDynError<'_>`
thiserror
is awesome, but sadly we can't do much about upstream that errors that don't satisfy the trait bounds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fortunately, the AES DOS fix required me to define an EncryptionError
type, which takes care of wrapping aes_gcm::Error
. So we only have to do the manual chaining once.
if !self.embed_public_private_shares { | ||
self.dkg_public_share(dkg_public_shares) | ||
} else { | ||
Ok(vec![]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would this warrant a warning if we are getting a dkgpubilcshares message? EDIT: OH we still expect them to send the DkgPublicShares and DKgPrivateShares but we just wait forDkgEndBegin to actually do anything with them? Just making sure I understand this right.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, the signers will still see the public and private shares coming from other signers, they just ignore them if configured to wait for the coordinator messages that embed them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a question for my own understanding, but looks good.
Not sure when this comment was made relative to the last time we discussed it in standup. Are you still on the fence about the proper way to do this? Given
Even with 100 signers/key shares, the size needed will be
That will comfortably fit under a 1MB cap, much less the 2MB which was last mentioned. |
95fe283
to
252b2a0
Compare
@@ -229,10 +229,18 @@ impl<Aggregator: AggregatorTrait> Coordinator<Aggregator> { | |||
dkg_id = %self.current_dkg_id, | |||
"Starting Private Share Distribution" | |||
); | |||
let dkg_public_shares = if self.config.embed_public_private_shares { | |||
(0..self.config.num_signers) | |||
.map(|id| (id, self.dkg_public_shares[&id].clone())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's refrain from direct array access.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The frost::Coordinator
requires full participation at every stage of DKG
and Sign
operations, so the direct array access here is safe.
But let's be pedantic anyway :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think the issue is that we need to 100% confident that there is no way for this code to panic. This might be true now, but when we modify some part of this code that may no longer be the case. And if that happens I'd rather return an error than panic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@@ -251,10 +259,18 @@ impl<Aggregator: AggregatorTrait> Coordinator<Aggregator> { | |||
dkg_id = %self.current_dkg_id, | |||
"Starting DKG End Distribution" | |||
); | |||
let dkg_private_shares = if self.config.embed_public_private_shares { | |||
(0..self.config.num_signers) | |||
.map(|id| (id, self.dkg_private_shares[&id].clone())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
src/state_machine/signer/mod.rs
Outdated
TryFromInt, | ||
} | ||
|
||
impl From<AesGcmError> for Error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we should.
It was a day or so before we spoke about it at stand-up yesterday. I haven't given it enough thought though. I think the proposed approach works and fixes our issue, but I just need some more time to give it some thought.
There are other options, but yeah many of them are worse than the solution proposed here. I'll give it a some thinking, I expect to arrive at the same place that you're at. |
…messages make embedding of public and private shares configurable reenable start_private_shares test now that embedding is configurable inc major semver for backwards incompatible API changes test modules do not need to be public unless someone is using them externally allow a static mut ref to an atomic variable used for test log initialization; allow many arguments to Signer::new allow many arguments to setup_with_timeouts fmt and code change for rng PR rebase rebase fixes add TryFromInt to signer and coordinator errors so we can remove more unwrap calls remove unwrap in signer state machine remove unnecessary AesGcm error since we now have EncryptionError; remove bracket access from frost coordinator add MissingPrivateShares error; remove bracket access in frost Coordinator
3335f25
to
89a0811
Compare
This change addresses issues arising from out-of-order messages that may arise when
WSTS
is used in ap2p
network. It adds a config option to theCoordinator
andSigner
state machines, that when enabled embeds theDkgPublicShares
andDkgPrivateShares
messages intoDkgPrivateBegin
andDkgPrivateEnd
.