fix(gossipsub): gracefully disable handler on stream errors #3625
How about adding an `assert!` or `debug_assert!` (preference for the former) here?
Taking a closer look, I don't think this invariant actually holds.
In `on_connection_event` we drop new inbound requests on `==`:
Thus `self.inbound_substreams` may never be `>` but it could be `==` and thus the invariant of `<` does not hold.
@thomaseizinger am I missing something?
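The boundary case raised above can be sketched as follows. This is a minimal illustration only: the cap `MAX_INBOUND`, its value, and the simplified `Handler` type are assumptions and are not taken from the PR. If new requests are dropped only once the counter equals the cap, the counter can reach the cap, so `<=` holds but `<` does not.

```rust
/// Hypothetical cap on inbound substreams; the name and value are
/// assumptions for illustration, not taken from the PR.
const MAX_INBOUND: usize = 5;

/// Simplified stand-in for the connection handler.
struct Handler {
    inbound_substreams: usize,
}

impl Handler {
    /// Accepts a new inbound substream unless the cap is already reached.
    /// Returns `true` if the substream was accepted.
    fn on_inbound(&mut self) -> bool {
        if self.inbound_substreams == MAX_INBOUND {
            // Drop the request: the counter stays at the cap.
            return false;
        }
        self.inbound_substreams += 1;
        true
    }
}

/// Feeds `n` inbound requests into a fresh handler and returns the counter.
fn count_after(n: usize) -> usize {
    let mut handler = Handler { inbound_substreams: 0 };
    for _ in 0..n {
        handler.on_inbound();
    }
    handler.inbound_substreams
}
```

With this shape, an assertion of the form `inbound_substreams <= MAX_INBOUND` would be safe to add, while a strict `<` would fire as soon as the cap is reached.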
The logs for errors on outbound substreams have all been downgraded to `debug` while the inbound ones are kept as `warn`. I would expect the `pub` part of gossipsub to be as important (if not more) as the `sub` part to keep these all as `warn` at least.
To me, a `warn` always means "Should I wake an ops person at 3am because of this?" Connections can die at any time because somebody e.g. closes their laptop. That is not a reason to wake an ops person IMO, hence I am gonna downgrade the inbound streams to `debug` instead of upgrading the outbound ones to `warn`.
Maybe this should be a warn, error or a crit. In principle this should never happen and we don't want the connection to just end silently.
The thinking was that `info!` is typically used for a state change (we are disabling the handler). I don't want to wake an ops person at 3am because we are emitting `warn` or `error` here :)
Happy to be convinced otherwise.
Yeah, I'm not all that concerned. I'd be curious to know when to use a warn then. I'd typically use a warn if something has not behaved as it should and is worth notifying a user.
In this case, we pretty much should never see this log, unless something is pretty broken. I find it handy to grep warn/error etc to find things that are broken. This fits into that classification imo.
But if other parts of libp2p are not doing it, happy to leave as an info.
Why would we never see this log? We would see this if we connect to a node that doesn't support the gossipsub protocol, right? Nothing is inherently broken if that happens so I am not sure a `warn` is appropriate.
Agreed that this should rather be a `warn` instead of an `info`.
By default `env-logger` does not log on `info` level. I would argue that users should see this by default, thus it should be a level higher than `info` which is `warn`.
Agreed.
As far as I can tell a `UpgradeError::Select(NegotiationError::Failed)` would lead to `self.protocol_unsupported = true` and thus a direct disabling (i.e. `self.keep_alive = KeepAlive::No`) in the next `ConnectionHandler::poll` invocation. Thus we would never request another outbound stream and thus the log line above is never printed. Am I missing something @thomaseizinger?
Just to avoid confusion, I am in favor of the above behavior. I don't think there is value in directly requesting another stream in case the remote signaled that it does not support the protocol on the previous stream. Objections?
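The flow described in this comment could look roughly like the sketch below. It uses simplified stand-in types (`KeepAlive`, `UpgradeError`, `Handler`) that only mirror the names mentioned in the thread; the field `outbound_requested` and the flattened error variants are assumptions, and the real libp2p types are richer than this.

```rust
/// Simplified stand-in for the keep-alive state named in the thread.
#[derive(Debug, PartialEq)]
enum KeepAlive {
    Yes,
    No,
}

/// Flattened, hypothetical error type; the real error nests more variants.
#[derive(Debug, PartialEq)]
enum UpgradeError {
    NegotiationFailed,
    Timeout,
}

struct Handler {
    protocol_unsupported: bool,
    keep_alive: KeepAlive,
    /// Hypothetical counter of outbound stream requests.
    outbound_requested: u32,
}

impl Handler {
    fn new() -> Self {
        Handler {
            protocol_unsupported: false,
            keep_alive: KeepAlive::Yes,
            outbound_requested: 0,
        }
    }

    /// Records a failed upgrade; only a failed negotiation marks the
    /// protocol as unsupported.
    fn on_upgrade_error(&mut self, error: UpgradeError) {
        if error == UpgradeError::NegotiationFailed {
            self.protocol_unsupported = true;
        }
    }

    /// Returns `true` if this poll requested a new outbound stream.
    fn poll(&mut self) -> bool {
        if self.protocol_unsupported {
            // Disable the handler instead of requesting another stream.
            self.keep_alive = KeepAlive::No;
            return false;
        }
        self.outbound_requested += 1;
        true
    }
}
```

In this sketch a failed negotiation short-circuits the next `poll`, so no further outbound stream is requested, while other errors such as a timeout leave the handler active.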
Makes sense!
We only do the negotiation once to create long-lived substreams. If the negotiation times out, I rarely find it useful to continually re-connect and retry the negotiations, which is why I previously opted to just close the connection.
If I understand these changes correctly, we are going to try to re-negotiate for every message we want to send?
We can also consider this a "hard error" for gossipsub and immediately disable the handler.
Currently, we'd try `MAX_SUBSTREAM_CREATION` times and then disable the handler.
Yep, that seems like reasonable logic.
If there were another behaviour that had a `KeepAlive::Yes`, it would continue to try and reconnect though, right?
I don't think this is correct. We increase `outbound_substream_established` only on successful substream negotiation (i.e. in `on_fully_negotiated_outbound`). The above failure does not increase the counter and thus the limit `MAX_SUBSTREAM_CREATION` is never hit.
You are totally right, not sure how I missed that. Do we agree that the logic I described is what we want to have?
Agreed in case it includes the re-enqueuing of the RPC of the failed outbound stream. See #3625 (comment).
Yep I think that makes sense.
Tracking the number of requested outbound streams instead of the number of successfully upgraded outbound streams is done with 0507493.
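For readers following the thread, counting requested rather than established streams could be sketched like this. It is not the code from the commit: the field name `outbound_substreams_requested`, the `disabled` flag, and the value of `MAX_SUBSTREAM_CREATION` are assumptions for illustration.

```rust
/// Limit named in the discussion; the value here is an assumption.
const MAX_SUBSTREAM_CREATION: usize = 5;

/// Simplified stand-in for the connection handler.
struct Handler {
    /// Outbound substreams requested so far (hypothetical field name).
    outbound_substreams_requested: usize,
    disabled: bool,
}

impl Handler {
    fn new() -> Self {
        Handler {
            outbound_substreams_requested: 0,
            disabled: false,
        }
    }

    /// Called when a message needs an outbound stream.
    /// Returns `true` if a new stream is requested.
    fn request_outbound(&mut self) -> bool {
        if self.disabled {
            return false;
        }
        if self.outbound_substreams_requested >= MAX_SUBSTREAM_CREATION {
            // The limit is hit regardless of whether earlier
            // negotiations succeeded, so the handler is disabled.
            self.disabled = true;
            return false;
        }
        // Count the attempt at request time, not on success.
        self.outbound_substreams_requested += 1;
        true
    }
}

/// Simulates `attempts` sends whose negotiations all fail and returns
/// how many stream requests were actually issued.
fn requests_issued(attempts: usize) -> usize {
    let mut handler = Handler::new();
    (0..attempts).filter(|_| handler.request_outbound()).count()
}
```

Because the counter moves when the stream is requested, repeated negotiation failures still exhaust the budget, which is the behaviour the reviewers agreed on above.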