Potential race condition in starting nut-server #277
Which distribution is this, and what version of the NUT package? You might also check if setting Reference: http://networkupstools.org/docs/man/ups.conf.html#_global_directives
My observations were made on an Ubuntu 14.04.4 LTS system with nut-server 2.7.1-1ubuntu1. By trial and error, I found that the following settings in my ups.conf appear to give relatively reliable results:
I set pollinterval to 10 to work around the known usbhid-ups issue ("Repetitive timeout and staleness") in which some UPS models appear to become unresponsive at the default polling frequency, producing errors like:
I am going to experiment a little more with maxretry values. Perhaps I can avoid adding "sleep 20" to the startup script by increasing maxretry instead.
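As the ups.conf man page describes it, maxretry sets how many times upsdrvctl attempts to start the driver(s) before giving up, with retrydelay seconds between attempts. The following is a simplified Python model of that behavior, not upsdrvctl's actual code; the function names are hypothetical, and whether upsdrvctl pauses after the final failure is an assumption of the model. It illustrates two points: a retry loop waits only as long as it needs to (unlike a fixed "sleep 20"), and it retries only the step it wraps, which here is the driver start, not upsd itself.

```python
import time
from typing import Callable


def start_with_retries(start_driver: Callable[[], bool],
                       maxretry: int,
                       retrydelay: float,
                       sleep: Callable[[float], None] = time.sleep) -> int:
    """Call start_driver() up to maxretry times, pausing retrydelay seconds
    between failed attempts. Returns the 1-based number of the attempt that
    succeeded; raises RuntimeError if every attempt fails."""
    if maxretry < 1:
        raise ValueError("maxretry must be at least 1")
    for attempt in range(1, maxretry + 1):
        if start_driver():
            return attempt
        if attempt < maxretry:
            # Model assumption: no pause after the last failure, just give up.
            sleep(retrydelay)
    raise RuntimeError(f"driver failed to start after {maxretry} attempts")


def worst_case_delay(maxretry: int, retrydelay: float) -> float:
    """Total time spent sleeping under this model if every attempt fails."""
    return max(maxretry - 1, 0) * retrydelay
```

Under this model, maxretry = 30 with retrydelay = 5 sleeps for at most worst_case_delay(30, 5) = 145 seconds before giving up. The sleep function is injectable so the loop can be exercised without actually waiting.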
I have been observing my system for a few weeks now with maxretry = 30. Startup from a cold boot succeeds in more than 50% of cases. In some cases, however, upsd does not start even with maxretry = 30. When upsd does not start from a cold boot, I see one of two situations:
The following syslog excerpt shows situation (1), in which upsd failed to start after a cold boot:
What's interesting here is that usbhid-ups reports a successful startup, but then upsd fails to start with "no listening interface available". A subsequent manual restart with "
Could it be that upsd fails to start with "no listening interface available" because the usbhid-ups driver needs some time to register a listening interface? Lines 293-314 in upsdrvctl.c suggest that if exec_error does not change after forking a new process, the driver command is assumed to have succeeded, and the startup sequence proceeds as if the driver were fully loaded. However, if the driver still needs more time to register its listening interfaces when upsdrvctl assumes it has been loaded (at line 305 in upsdrvctl.c), upsd may then find no listening interface available and fail to start. I wonder if this is what I am observing on my system.
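To make the distinction in this hypothesis concrete: a forked process is not necessarily a ready one. A readiness check instead asks whether the driver's local socket actually accepts connections. The Python sketch below assumes the driver exposes a Unix-domain stream socket; the helper name is hypothetical, and a path such as /var/run/nut/usbhid-ups-myups is only an illustration, since the real location depends on the build's state path and the UPS name. Note that the following comment argues upsd does not depend on the driver to start, so this only illustrates the "forked vs. ready" distinction.

```python
import socket
import time


def wait_for_unix_socket(path: str, timeout: float, interval: float = 0.5) -> bool:
    """Return True once a Unix-domain stream socket at `path` accepts a
    connection, or False if `timeout` seconds pass first."""
    deadline = time.monotonic() + timeout
    while True:
        s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        try:
            s.connect(path)
            return True
        except (FileNotFoundError, ConnectionRefusedError):
            pass  # socket file not created yet, or nothing listening on it
        finally:
            s.close()
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```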
Actually, architecturally speaking, the listening (TCP) interface and the driver are on opposite sides of upsd. The driver does not need to be running for upsd to start, so something else must be thwarting upsd. One of the (implied, I guess) preconditions for upsd is that the network interfaces for all LISTEN addresses have been fully brought up. Typically, this is handled by declaring a dependency on the network subsystem in the upsd startup script ("# Required-Start: $local_fs $syslog $network $remote_fs udev" in /etc/init.d/nut-server?). The logging for upsd socket creation seems to require debug level 3 or higher, and I haven't checked whether it goes to syslog as well as stderr (setting the debug flags prevents the daemon from backgrounding, which ripples through to other behaviors).
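One way to test this precondition directly: on Linux, binding a socket to an IP address that is not (yet) assigned to any local interface fails with EADDRNOTAVAIL, unless non-local binding has been enabled. That matches the symptom of LISTEN working on localhost but not on a static IP early in boot. The Python sketch below polls numeric LISTEN addresses until they become bindable; the helper names are hypothetical and this is not how upsd itself is implemented. Port 3493 is NUT's default port.

```python
import errno
import socket
import time
from typing import Iterable, List


def address_is_bindable(addr: str, port: int) -> bool:
    """Try to bind a TCP socket to (addr, port). Returns False only when the
    kernel reports the address is not assigned to a local interface
    (EADDRNOTAVAIL); any other error, e.g. port already in use, is raised."""
    family = socket.AF_INET6 if ":" in addr else socket.AF_INET
    s = socket.socket(family, socket.SOCK_STREAM)
    try:
        s.bind((addr, port))
        return True
    except OSError as exc:
        if exc.errno == errno.EADDRNOTAVAIL:
            return False
        raise
    finally:
        s.close()


def wait_for_listen_addresses(addrs: Iterable[str], port: int,
                              timeout: float, interval: float = 1.0) -> List[str]:
    """Poll until every address is bindable or the timeout expires.
    Returns the addresses that are still unavailable (empty list on success)."""
    deadline = time.monotonic() + timeout
    pending = list(addrs)
    while True:
        pending = [a for a in pending if not address_is_bindable(a, port)]
        if not pending or time.monotonic() >= deadline:
            return pending
        time.sleep(interval)
```

For example, wait_for_listen_addresses(["127.0.0.1", "192.168.1.10"], 3493, timeout=60) would wait up to a minute for a (hypothetical) static address to appear before upsd is started.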
@clepple: my observations were made on upsd 2.7.1, which is the latest upsd version in the Ubuntu 14.04 repos. I am going to upgrade my Ubuntu system from 14.04 to 16.04 shortly and make more observations on what will likely be a newer upsd version. If the issue persists with the newer version, I am willing to clone this repo and compile upsd locally to run tests with debug logging. Lastly, here is an excerpt from today's syslog. Note that at 09:11:27, I executed "
Therefore, it does indeed seem that upsd sometimes misses a dependency at start time. In my particular case, upsd starts properly 6-8 times out of 10. When it does not, I can almost always start it manually by running "nut-server start". There was one instance when "nut-server start" did not help, but I have been unable to reproduce it. I will keep observing.
But you could try this PPA before building from source: https://code.launchpad.net/~dobey/+recipe/nut-daily
Hi val-kulkov, I have the same symptoms on Raspbian Jessie, on both the single-core and multicore versions of the RPi board. The "no listening interface available" error appears during autostart.
@matib12: I am not sure I understand exactly what you are trying to do. As I understand it, the problem is likely related to communication between the UPS device and the Linux kernel over USB. This entry from my May 13, 2016 syslog (see a few posts above for the full excerpt) suggests that the problem may originate in the kernel-device communication:
By the way, the problem completely disappeared once I upgraded my workstation to Ubuntu 16.04. What's more, Ubuntu 16.04 detects my UPS unit as a UPS device automatically and offers power management options for it. There must have been some changes to the kernel code or usbhid-ups or both that fixed the problem. My current kernel version is 4.4.0-45-generic.
@val-kulkov to be fair, your original posts did describe the A few timeouts ( While Ubuntu 16.04 did seem to run longer before the SMART1500LCDT disconnected, the problem has not gone away completely. I also rebuilt NUT with the libusb-1.0 branch and pushed packages to this PPA, and it is no worse: https://launchpad.net/~clepple/+archive/ubuntu/nut
FWIW, this issue is still present nearly three years later. My setup is a bit different (Raspberry Pi 3, Raspbian Buster, NUT 2.7.4, usbhid-ups driver), but the behavior is identical. On boot-up, the LISTEN directive works only on localhost, not on the static IP. Manually restarting (stop/start) the nut-server service fixes the problem.
Is #749 a different aspect of the same general issue? The two are not identical, since this issue occurs with init scripts and #749 with (earlier, faulty) systemd dependencies, but the underlying root cause seems similar.
It’s certainly possible, but I’ve long since decommissioned the environment where I encountered this issue and moved on. I have no easy way to verify whether the network was initialized when the NUT server attempted to start.
I was having the same issues with my Tripp Lite SMART1500LCDT as those described in PR #122. Unfortunately, there has been little progress on PR #122 since it was created almost two years ago, so I thought I should try a different strategy.
I replaced the generic USB cable that came with my Tripp Lite SMART1500LCDT with a StarTech Certified USB cable, and it worked: after replacing the cable and restarting nut-server, communication with my UPS was established. I have been watching the connection between my UPS and nut-server for a few weeks now, and it works reasonably well, losing the connection maybe once a week. Compared to what I had with the generic cable supplied by Tripp Lite, that's a huge improvement.
However, I came across another issue: nut-server fails to initialize properly at boot time in about 50% of cases. Once the computer has booted, manually restarting nut-server with "/etc/init.d/nut-server restart" fixes the problem.
It looks like there may be a race condition when nut-server is initialized at boot time. Until the appropriate access permissions are set for the USB device by /lib/udev/rules.d/52-nut-usbups.rules, nut-server cannot initialize properly and upsd won't start.
I looked into /etc/init.d/nut-server and found a number of FIXME comments. Since a proper fix would take more time than I can afford right now, I decided to simply describe the issue here, in the hope that someone else can take on fixing it.
In my case, adding "sleep 20" to the "start" section of the initialization script fixed the problem. It is very much a Band-Aid solution, of course, but it worked.
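A less arbitrary alternative to a fixed "sleep 20" is to poll, with an upper bound, for the condition the script is actually waiting on: here, the udev rules having granted access to the USB device node. The Python sketch below assumes a device path under /dev/bus/usb/ (the exact bus and device numbers vary, so any concrete path is hypothetical), and the helper name is also hypothetical. One caveat: os.access() checks the real user ID, so the check is only meaningful when run as the unprivileged account the driver uses (often nut); run as root, it succeeds regardless of the permissions udev has set.

```python
import os
import time


def wait_for_device_access(path: str, timeout: float, interval: float = 0.5) -> bool:
    """Return True as soon as the calling user can read and write `path`,
    or False if `timeout` seconds elapse first."""
    deadline = time.monotonic() + timeout
    while True:
        # R_OK | W_OK: the driver needs both to talk to the UPS.
        if os.access(path, os.R_OK | os.W_OK):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Unlike a fixed sleep, this returns immediately on boots where udev has already finished, and the timeout still bounds the wait on boots where it has not.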