Limit message caches and future epochs? #43
Comments
See also: amiller/HoneyBadgerBFT#57 ("Bounded Badger") and amiller/HoneyBadgerBFT#12 ("Asynchronous Ratchet")
Well, you asked for some ideas: isn't this the typical problem of an attacker being able to force a peer to allocate a resource at very little cost to themselves? This is not without precedent; it reminds me of Slowloris or SYN floods. Neither of these is a perfect match for our problem, but both are solved by forcing more work onto the attacker: in the examples above, by constraining the time window for sending data more strictly, or by moving the memory burden back onto the attacker.

What about limiting the underlying transmission channel, e.g. adding a "queue full, please retry" response? In that case, the attacker would have to buffer the message themselves and would be expected to retry transmitting it. A sender could simply use a priority queue that sends out messages with the lowest epochs first, so this would only be an issue if two honest nodes had vastly diverged in epochs. These are parameters that can be tweaked, though, and we are not going to have infinite resources either way.

Maybe even normal TCP backpressure is enough: if messages from honest nodes are rarely out of order, we could just stop reading from the TCP connection for a while. That in turn will cause sending to stop or slow down on the other peer, naturally causing them to buffer.

Finally, we should probably define how the network should behave if a single node is vastly inferior in computing or network capacity to the others. Imagine eight high-performance servers and a ninth node running on a slow machine behind Tor. The network can either throttle itself, dropping performance enough to include this node, or mark it as faulty and run at a much higher speed.
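For illustration, here is a minimal Rust sketch of the sender-side priority queue idea: outgoing messages are keyed by epoch, and the lowest epoch is always retransmitted first after a "queue full, please retry" response. `OutgoingMessage` and `EpochQueue` are made-up names for this sketch, not hbbft types.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// A hypothetical outgoing message, keyed by the epoch it belongs to.
struct OutgoingMessage {
    epoch: u64,
    payload: Vec<u8>,
}

/// Sender-side queue that always yields the message with the lowest epoch first,
/// so a "queue full, please retry" response only delays messages for epochs the
/// peer has not reached yet.
#[derive(Default)]
struct EpochQueue {
    heap: BinaryHeap<Reverse<(u64, Vec<u8>)>>,
}

impl EpochQueue {
    fn push(&mut self, msg: OutgoingMessage) {
        self.heap.push(Reverse((msg.epoch, msg.payload)));
    }

    /// Pops the lowest-epoch message, if any.
    fn pop_lowest(&mut self) -> Option<OutgoingMessage> {
        self.heap
            .pop()
            .map(|Reverse((epoch, payload))| OutgoingMessage { epoch, payload })
    }
}

fn main() {
    let mut queue = EpochQueue::default();
    queue.push(OutgoingMessage { epoch: 7, payload: b"late".to_vec() });
    queue.push(OutgoingMessage { epoch: 3, payload: b"early".to_vec() });
    // The epoch-3 message is retransmitted first after a "queue full" response.
    assert_eq!(queue.pop_lowest().unwrap().epoch, 3);
}
```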
I agree, I think there's no way around moving some memory burden to the sender. And to keep it simple, it may be best to move all of it: we already have the …

Then we can add (parts of) the Bounded Badger proposal. Essentially it probably comes down to threshold-signing each batch: once we have a signature, i.e. a proof, of batch e, we can drop all buffered outgoing messages of that epoch, because they can all be replaced by that signature and the batch itself. In addition, we could report a node that is lagging behind (by a configurable number of epochs) as faulty, so the application logic can decide whether to vote it out.

I'm less sure about the agreement epochs, and whether it's even worth implementing a similar kind of checkpointing for them, or whether to just limit them to a maximum of 120 epochs, like in Bounded Badger.
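A rough sketch of the "drop once proven" bookkeeping this would allow on the sender side; `SignedBatch` and `OutBuffer` are hypothetical placeholders, and the proof would of course be an actual threshold signature rather than raw bytes:

```rust
use std::collections::BTreeMap;

/// Placeholder for a batch together with its threshold signature (the "proof").
struct SignedBatch {
    epoch: u64,
    batch: Vec<u8>,
    threshold_sig: Vec<u8>,
}

/// Outgoing messages buffered per epoch, for peers that haven't reached that epoch yet.
#[derive(Default)]
struct OutBuffer {
    by_epoch: BTreeMap<u64, Vec<Vec<u8>>>,
}

impl OutBuffer {
    fn buffer(&mut self, epoch: u64, msg: Vec<u8>) {
        self.by_epoch.entry(epoch).or_default().push(msg);
    }

    /// Once we hold a proof for `proof.epoch`, every buffered message for that epoch
    /// and earlier ones is superseded by the signed batch itself and can be dropped.
    fn drop_superseded(&mut self, proof: &SignedBatch) {
        let keep = self.by_epoch.split_off(&(proof.epoch + 1));
        self.by_epoch = keep;
    }
}

fn main() {
    let mut buf = OutBuffer::default();
    buf.buffer(4, b"agreement msg".to_vec());
    buf.buffer(5, b"later msg".to_vec());
    let proof = SignedBatch { epoch: 4, batch: Vec::new(), threshold_sig: Vec::new() };
    buf.drop_superseded(&proof);
    assert!(!buf.by_epoch.contains_key(&4)); // epoch 4 is now covered by the signed batch
    assert!(buf.by_epoch.contains_key(&5));
}
```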
Whether the buffer is on the sender or on the recipient doesn't change much; it really only changes the point of view. If the buffer moves to the sender, then the following attack becomes possible, which is the dual of the attack of sending lots of messages: an attacking recipient can delay messages from a sender indefinitely, leading to the sender having to buffer the messages to the attacker indefinitely. The difference of placing the buffer on the sender instead of the recipient, as far as this attack is concerned, is that the sender would have to bound message resending to avoid a memory leak. So if there is a fix involving buffering on the sender, then there should be a dual fix that uses buffering on the recipient.

It's interesting to contemplate @afck's idea to sign batches rather than blocks. The Bounded Badger proposal referred to blocks, which sounds like it requires a blockchain. However, it is possible that "block" was used to mean "batch" there, in which case there is no difference. In principle, pairs of a batch and a signature might not even be ordered, so that a lagging node might learn a batch of epoch …

Do you think the batches should be learned in the order of their creation? Or would it be enough to learn them without a specific order, possibly requiring the lagging node to verify a chain of signed batches that it has missed as soon as the chain is complete?
I think it's not quite symmetric: the sender knows that its own messages are not faulty. The recipient can only verify that once it has progressed to the corresponding epoch.

I think the batches need to be sent in the correct order. With the above proposal, the recipient would reject a batch for any later epoch, so the sender would have to keep it in the queue anyway.

I wonder whether we should even avoid threshold signatures, and just use f + 1 individual signatures? That would make the message slightly larger, but that shouldn't be much overhead in practice (it contains a whole block/batch, after all), and threshold signatures are probably a lot slower than individual signatures.

Actually, that is another issue I have with the proposal: Honey Badger goes to great lengths (using erasure coding) to avoid very large messages; no message ever contains a whole batch, only individual contributions.
Even with Reed-Solomon, this is still wasteful if we do it as soon as we move to a new epoch: then the first node to finish would always send a redundant message of size B / (N - f) to everyone else. On the other hand, if we wait longer, we make it even harder for a lagging node to catch up.

Another potential waste of bandwidth with the above approach is that the first message in each epoch is often our own proposed contribution, which is large. We should avoid repeatedly sending that message just to have the recipient reject it. Instead, each node should tell everyone else whenever it moves to a new epoch, so no messages are ever sent too early. The drawback is that this introduces another round of messages. But for that we already have …
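Something like the following could implement the "never send too early" rule, assuming each node announces its current epoch to the others; `PeerTracker`, `Msg` and the announcement handling are illustrative only, not the actual hbbft implementation:

```rust
use std::collections::BTreeMap;

/// Stand-in peer identifier; hbbft's real node ID type differs.
type PeerId = usize;

struct Msg {
    epoch: u64,
    payload: Vec<u8>,
}

/// Tracks the last epoch each peer announced, so we never send a message "too early".
#[derive(Default)]
struct PeerTracker {
    /// Latest epoch announced by each peer.
    peer_epoch: BTreeMap<PeerId, u64>,
    /// Messages held back because the peer hasn't reached their epoch yet.
    held: BTreeMap<PeerId, Vec<Msg>>,
}

impl PeerTracker {
    /// Returns the message for immediate sending, or holds it back and returns `None`.
    fn try_send(&mut self, to: PeerId, msg: Msg) -> Option<Msg> {
        if self.peer_epoch.get(&to).copied().unwrap_or(0) >= msg.epoch {
            Some(msg)
        } else {
            self.held.entry(to).or_default().push(msg);
            None
        }
    }

    /// Called when `peer` announces it has reached `epoch`; returns the newly releasable messages.
    fn on_epoch_announcement(&mut self, peer: PeerId, epoch: u64) -> Vec<Msg> {
        self.peer_epoch.insert(peer, epoch);
        let held = self.held.entry(peer).or_default();
        let (send, keep): (Vec<_>, Vec<_>) = held.drain(..).partition(|m| m.epoch <= epoch);
        *held = keep;
        send
    }
}

fn main() {
    let mut tracker = PeerTracker::default();
    // Peer 1 hasn't announced epoch 3 yet, so the message is held back.
    assert!(tracker.try_send(1, Msg { epoch: 3, payload: Vec::new() }).is_none());
    // Once the announcement arrives, the held message is released.
    assert_eq!(tracker.on_epoch_announcement(1, 3).len(), 1);
}
```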
And in terms of implementation, I don't see a good way around some duplication: we'll have to implement this for Honey Badger and, maybe in a somewhat simpler form, for Dynamic Honey Badger.
The sender has to send all of its (correct) outgoing messages and the receiver has to receive all correct incoming messages. You are right to point out the asymmetry: the correctness check is implemented only on the receiver. However, we cannot remove that check, and it remains present in both cases, whether the buffering is done on the sender or on the receiver.
Do you mean growing a chain from 1 to f + 1 individual signatures by rebroadcasting the batch and appending one signature not already in the chain on every rebroadcast? I think that is less efficient, since the worst-case number of broadcasts even for a perfect network is …
The asymmetry is that the receiver has to cache all the messages it cannot handle yet, including the spam, because it doesn't know yet which of them are correct. The sender knows that all its outgoing messages are correct, and won't spam its own outgoing queue.
No, we wouldn't send a chain: just our own signature to the lagging node. The lagging node could then count the signatures it received, and once it has f + 1 of them, it could consider the batch valid and move on to the next epoch. (In the erasure code variant, it would actually have to wait for N - f messages before it could reconstruct the batch.)
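A small sketch of the counting logic on the lagging node, under the individual-signature variant; `BatchVotes` and `NodeId` are invented names, and signature verification itself is assumed to happen elsewhere:

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Stand-in validator identifier.
type NodeId = usize;

/// Per-epoch tally of which validators vouched for a batch with an individual signature.
struct BatchVotes {
    num_faulty: usize,                        // f
    signers: BTreeMap<u64, BTreeSet<NodeId>>, // epoch -> validators whose signature verified
}

impl BatchVotes {
    fn new(num_faulty: usize) -> Self {
        BatchVotes { num_faulty, signers: BTreeMap::new() }
    }

    /// Records a verified signature for `epoch` from `from`. Returns `true` once
    /// f + 1 distinct validators have signed, i.e. at least one of them is correct,
    /// so the batch can be accepted and the lagging node can move to the next epoch.
    fn add_signature(&mut self, epoch: u64, from: NodeId) -> bool {
        let set = self.signers.entry(epoch).or_default();
        set.insert(from);
        set.len() >= self.num_faulty + 1
    }
}

fn main() {
    let mut votes = BatchVotes::new(1); // f = 1, so N = 4
    assert!(!votes.add_signature(5, 0));
    assert!(votes.add_signature(5, 2)); // second distinct signer: f + 1 reached
}
```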
This is a fair point about keeping all the messages. You mentioned the message which a sender uses to announce an epoch increment. Where does a new validator start? Should it queue all messages to a node until it receives an …?

About …
Good point: I'm afraid the new validator would have to wait for everyone else's …

I agree that duplicating the batch f + 1 times would be wasteful. That's why I'd be in favour of using Reed-Solomon instead. And now that I think of it, a threshold of N - f wouldn't be good enough: a correct node could be lagging, and would only receive N - f - 1 batches. In fact, up to f correct nodes could lag. So we'd need a threshold of N - 2f - 1 (or just f).
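Just to make the arithmetic concrete, a tiny check of those thresholds under the usual assumption N = 3f + 1:

```rust
fn main() {
    let f = 1usize;    // tolerated faulty nodes; assuming N = 3f + 1 as usual
    let n = 3 * f + 1; // N = 4 validators

    // As argued above: a lagging correct node may only receive N - f - 1 shards at first,
    // and since up to f correct nodes can be lagging, only N - 2f - 1 shards are guaranteed.
    let guaranteed_shards = n - 2 * f - 1;
    assert_eq!(guaranteed_shards, f); // with N = 3f + 1, this is exactly f
    println!("N = {n}, f = {f}: reconstruction threshold must be at most {guaranteed_shards}");
}
```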
Definitely, we should try sending parts of the batch, especially if those parts can be uniquely assembled into the batch. That would solve all the problems.
Let's also compare with the current Bounded Badger plan and make sure we're not overlooking anything.
With #308, the larger part of this has been done and DHB/QHB/HB's incoming queues are gone. Meanwhile, the other half has become more complex:
We need to ensure that an attacker can't fill up a node's memory by sending it lots of messages that it stores forever, particularly in the algorithms that proceed in epochs (agreement, Honey Badger, …) and that, to be fully asynchronous, would in theory need to store any information for any future epoch until they can process it.
It's tempting to enforce some reasonable limits instead, and just assume that we'll never fall behind by more than 1000 epochs. But then an attacker could break consensus by sending a message that is exactly 1000 epochs ahead and will be handled by only some of the honest nodes.
Either way, we should try to limit the amount of memory per epoch: e.g. in agreement, the incoming_queue should be replaced by just the information about which of the other nodes sent which of the four possible messages: BVal(true), BVal(false), Aux(true), Aux(false).
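As a sketch of what that bounded bookkeeping could look like (one flag per node and message kind instead of an unbounded queue); `EpochState`, `ReceivedFlags` and `NodeId` are illustrative names, not the types used in hbbft:

```rust
use std::collections::BTreeMap;

/// Stand-in node identifier.
type NodeId = usize;

/// Which of the four possible binary agreement messages a peer has sent in one epoch.
/// Storing four booleans per peer bounds the memory per epoch, unlike an unbounded queue.
#[derive(Default, Clone, Copy)]
struct ReceivedFlags {
    bval_true: bool,
    bval_false: bool,
    aux_true: bool,
    aux_false: bool,
}

#[derive(Default)]
struct EpochState {
    received: BTreeMap<NodeId, ReceivedFlags>,
}

impl EpochState {
    /// Records a BVal(b) message from `from`; returns `false` if it was a duplicate.
    fn record_bval(&mut self, from: NodeId, b: bool) -> bool {
        let flags = self.received.entry(from).or_default();
        let slot = if b { &mut flags.bval_true } else { &mut flags.bval_false };
        let fresh = !*slot;
        *slot = true;
        fresh
    }

    /// Records an Aux(b) message from `from`; returns `false` if it was a duplicate.
    fn record_aux(&mut self, from: NodeId, b: bool) -> bool {
        let flags = self.received.entry(from).or_default();
        let slot = if b { &mut flags.aux_true } else { &mut flags.aux_false };
        let fresh = !*slot;
        *slot = true;
        fresh
    }
}

fn main() {
    let mut state = EpochState::default();
    assert!(state.record_bval(2, true));  // first BVal(true) from node 2
    assert!(!state.record_bval(2, true)); // duplicate detected without extra storage
    assert!(state.record_aux(2, false));
}
```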