High rate of dial failures / peers not connected for request/response on Versi #589
Comments
@vstakhov soft-assigned you, any thoughts? |
Dial failures are possibly related to this: #498 |
After testing master, there are some meaningful differences between peer counts, but the behavior is strange. In scenario 1: 100 async backing nodes (ab), 87 master nodes (cde):
In scenario 2: 187 master nodes (abcde):
No peer is connected to all the other validators. Being unable to connect to 30% of other nodes seems quite stable. While async backing nodes were running, it seems that master-based nodes have a hard time connecting to them, but not vice-versa. Not sure how that works. I'd guess we're dealing with two simultaneous issues:
|
Can it be related to an outdated runtime? I see quite a lot of errors like |
That could be. Let's see how it runs with an updated runtime (using Westend runtime is a bit odd, but as long as the network doesn't go down, no problem). At least worth ruling out any confounding factors. |
It seems that the network problems were resolved after the libp2p downgrade. |
It seems that this PR has resolved the issue with the upgraded libp2p: paritytech/substrate#14703 |
This is fairly high-priority - could get reviewed & merged today, hopefully? |
We are seeing high rates of request failures and discovery failures on recent master nodes. These appear in the logs as warnings. This may be related to the libp2p upgrade in 01fd49a7fafa01f133e2dec538a2ef7c697a26aa or to the logic changes to the discovery system in 1346281e1a12958bb08d5fcf55c7563750719388.
Rates of validator connectivity are also lower than expected.
This is possibly only an issue between master nodes and paritytech/substrate#5022 nodes, though that is yet to be confirmed. If this is a paritytech/substrate#5022 issue, then the most likely culprit is some misconfiguration in the peer-set: https://github.com/paritytech/polkadot/pull/6782/files#diff-01ac05045f5aef4678a1579846a54002dc6fd86cd6747d4232f0245e04d7ae5d