-
Notifications
You must be signed in to change notification settings - Fork 50
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sync-server: Fix infinite loop caused by accept error #271
base: master
Are you sure you want to change the base?
sync-server: Fix infinite loop caused by accept error #271
Conversation
src/sync/server.rs
Outdated
@@ -373,6 +374,9 @@ impl Server { | |||
} | |||
Err(e) => { | |||
error!("listener accept got {:?}", e); | |||
// Since poll is level-triggered, an uncorrected error can lead to an infinite loop, | |||
// so we sleep for a while and wait for the error to be corrected. | |||
thread::sleep(Duration::from_secs(10)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
10 seconds seem like a long time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, It's a long time for a program, but the errors of accept such as EMFILE and ENOMEM can't be recovered in a short time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I could add a method server.set_accept_retry_interval, so users can customize the sleep duration
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, @jsturtevant I have added the method set_accept_retry_interval.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for making it configurable.
If it wasn't resolved in 10s wouldn't the continue loop still get stuck? In either case, would EMFILE is eventually resolved, wouldn't the loop finish?
Is the sleep to resolve cpu cycles while those errors occur? Should this only be done for a subclass of errors? seems like if there was a transient error we would slow things down significantlly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated the code. Now the sleep will only affects those resource limit errors: libc::EMFILE, libc::ENFILE, libc::ENOBUFS, libc::ENOMEM.
a845820
to
47f828a
Compare
Also Add the method set_accept_retry_interval. Since poll is level-triggered, an uncorrected error can lead to an infinite loop, so we sleep for a while and wait for the error to be corrected. Fixes: containerd#270 Signed-off-by: Tim Zhang <[email protected]>
47f828a
to
c351f0b
Compare
Since poll is level-triggered, an uncorrected error can lead to an infinite loop, so we sleep for a while and wait for the error to be corrected.
Fixes: #270