Apache2 and NSCA not removing PID on reboot #139

Open
AaronAutomation opened this issue Aug 12, 2022 · 4 comments

Comments

@AaronAutomation

Web access to Nagios goes down after I reboot my server. The logs show:
httpd (pid 18) already running
nsca[20039]: There's already an NSCA server running (PID 17). Bailing out...
Removing those PID files manually inside the Docker container fixes it until the next reboot.

@AaronAutomation
Author

I solved this issue, and others I was having, by rolling back to v4.4.4.

@tronyx

tronyx commented Feb 27, 2023

I believe the issues you were seeing were resolved in 4.4.8, which this image currently uses.

@Innsai

Innsai commented May 25, 2023

4.4.8 still shows the problem:
nsca[182]: There's already an NSCA server running (PID 33). Bailing out...
There may be a clue in the syslog:
nsca[34]: Cannot remove pidfile '/var/run/nsca.pid' - check your privileges.
@tronyx
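
For context on the "check your privileges" message: on POSIX systems, deleting a file requires write and search permission on the directory that contains it, not on the file itself. The sketch below checks that for the pidfile path from the log. `can_remove_pidfile` is a hypothetical helper name, not something shipped in the image, and it would need to run as the same user NSCA runs as for the result to mean anything.

```shell
#!/bin/sh
# Hypothetical helper (not part of the image): report whether the current
# user could delete the given pidfile. Unlinking needs write (-w) and
# search (-x) permission on the parent directory.
can_remove_pidfile() {
    dir=$(dirname "$1")
    [ -d "$dir" ] && [ -w "$dir" ] && [ -x "$dir" ]
}

# Path taken from the syslog line above.
if can_remove_pidfile /var/run/nsca.pid; then
    echo "current user can remove /var/run/nsca.pid"
else
    echo "current user cannot remove /var/run/nsca.pid"
fi
```

Note that root passes these permission tests regardless of the directory mode, so running the check from a root shell would not reproduce what an unprivileged daemon sees.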

@gurubobnz

gurubobnz commented May 28, 2024

I see this issue from time to time.

nagios_1  | nsca[1727]: There's already an NSCA server running (PID 236).  Bailing out...
nagios_1  | nsca[1728]: There's already an NSCA server running (PID 236).  Bailing out...
nagios_1  | nsca[1729]: There's already an NSCA server running (PID 236).  Bailing out...

(repeated)

The Nagios web UI was up and running, and inside the container the /var/run/nsca.pid file was present and contained the PID of a process that was still running. I guess something keeps trying to launch another instance of NSCA and fails with that message. Here are the PID file contents and the currently running processes, including the /bin/bash root shell that I used to get into the container.

root@68b427b3ea3f:/var/run# ls -la 
total 40
drwxr-xr-x 1 root   root   4096 May 19 10:26 .
drwxr-xr-x 1 root   root   4096 Jan 30 23:17 ..
drwxr-xr-x 1 root   root   4096 May 19 10:26 apache2
drwxrwxrwt 1 root   root   4096 Jan  5 22:46 lock
drwxr-xr-x 2 root   root   4096 Dec 12 03:04 mount
-rw-r--r-- 1 nagios nagios    4 May 18 20:42 nsca.pid
root@68b427b3ea3f:/var/run# cat nsca.pid 
236
root@68b427b3ea3f:/var/run# ps aux
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.0   4356    40 ?        Ss   May19   0:00 /bin/bash /usr/local/bin/start_nagios
root       228  0.0  0.0   2804    28 ?        S    May19   0:10 runsvdir -P /etc/service
root       229  0.0  0.0   2652   320 ?        Ss   May19   0:00 runsv postfix
root       230  0.0  0.0   2652   308 ?        Ss   May19   0:00 runsv rsyslog
root       231  0.0  0.0   2652   328 ?        Ss   May19   0:00 runsv apache
root       232  0.0  0.0   2652   328 ?        Ss   May19   0:00 runsv nagios
root       233  0.0  0.0   2652   472 ?        Ss   May19  13:54 runsv nsca
root       234  0.0  0.0  41224   848 ?        S    May19   0:04 /usr/lib/postfix/sbin/master -d -c /etc/postfix
nagios     235  0.0  0.0  62680  2396 ?        S    May19   1:38 /opt/nagios/bin/nagios /opt/nagios/etc/nagios.cfg
root       236  0.0  0.0 206372   704 ?        Ss   May19   0:27 /usr/sbin/apache2 -D NO_DETACH
root       237  0.0  0.0 152428   844 ?        Sl   May19   2:28 rsyslogd -n -f /etc/rsyslog.conf
nagios     245  0.0  0.0  34540  1364 ?        S    May19   4:39 /opt/nagios/bin/nagios --worker /opt/nagios/var/rw/nagios.qh
nagios     246  0.0  0.0  34540  1352 ?        S    May19   5:11 /opt/nagios/bin/nagios --worker /opt/nagios/var/rw/nagios.qh
nagios     247  0.0  0.0  34540  1340 ?        S    May19   4:37 /opt/nagios/bin/nagios --worker /opt/nagios/var/rw/nagios.qh
nagios     248  0.0  0.0  34540  1352 ?        S    May19   5:12 /opt/nagios/bin/nagios --worker /opt/nagios/var/rw/nagios.qh
nagios     249  0.0  0.0  34540  1360 ?        S    May19   5:08 /opt/nagios/bin/nagios --worker /opt/nagios/var/rw/nagios.qh
nagios     250  0.0  0.0  34540  1276 ?        S    May19   4:34 /opt/nagios/bin/nagios --worker /opt/nagios/var/rw/nagios.qh
nagios     251  0.0  0.0  60936    40 ?        S    May19   0:42 /opt/nagios/bin/nagios /opt/nagios/etc/nagios.cfg
nagios     257  0.0  0.0 206580  3668 ?        S    May19   0:00 /usr/sbin/apache2 -D NO_DETACH
nagios     258  0.0  0.0 206596  3708 ?        S    May19   0:00 /usr/sbin/apache2 -D NO_DETACH
nagios     259  0.0  0.0 206580  3584 ?        S    May19   0:00 /usr/sbin/apache2 -D NO_DETACH
nagios     260  0.0  0.0 206580  3632 ?        S    May19   0:00 /usr/sbin/apache2 -D NO_DETACH
postfix    263  0.0  0.0  41364  1564 ?        S    May19   0:01 qmgr -l -t unix -d -u
nagios     643  0.0  0.1 206580  4208 ?        S    11:05   0:00 /usr/sbin/apache2 -D NO_DETACH
nagios     648  0.0  0.1 206580  4204 ?        S    11:05   0:00 /usr/sbin/apache2 -D NO_DETACH
nagios     650  0.0  0.1 206580  4104 ?        S    11:05   0:00 /usr/sbin/apache2 -D NO_DETACH
nagios     651  0.0  0.1 206604  4240 ?        S    11:05   0:00 /usr/sbin/apache2 -D NO_DETACH
nagios     685  0.0  0.1 206580  3928 ?        S    11:06   0:00 /usr/sbin/apache2 -D NO_DETACH
root      1752  0.0  0.0   4620  3836 pts/0    Ss   11:17   0:00 /bin/bash
root      1798  0.0  0.0   7056  1544 pts/0    R+   11:17   0:00 ps aux
nagios    4038  0.0  0.2 206580  8804 ?        S    May19   0:00 /usr/sbin/apache2 -D NO_DETACH
postfix  27212  0.0  0.1  41244  6440 ?        S    09:53   0:00 pickup -l -t unix -d -u -c

The date on the PID file (May 18) doesn't match what I assume is the start time of the processes (May 19). This might be a hint.

I removed the container and recreated it, and the problem went away. I thought it might have been triggered by restarting the container, but a restart worked fine. I wonder whether it is actually caused by an unclean shutdown of the container, which would leave the PID file behind, followed by a restart.

Version: latest, image hash 79a7fc3a2f88 (https://hub.docker.com/layers/jasonrivers/nagios/latest/images/sha256-a341182a89e6888c27cc283ca22e36b9f9ebd96deaa4b76063bdaeb8f025a16d?context=explore)
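
If the unclean-shutdown theory above is right, the leftover /var/run/nsca.pid would hold a PID that a different process can pick up after the restart. In the process listing above, PID 236 belongs to /usr/sbin/apache2, not to NSCA. A minimal sketch of a pre-start check follows. It assumes a Linux /proc filesystem and that the daemon's command name is `nsca`; the function name `clean_stale_pidfile` is hypothetical, and where to call it (for example, from the service's start script) depends on the image, which is not shown here.

```shell
#!/bin/sh
# Hypothetical helper: delete a pidfile unless the PID it names is alive
# and its command name matches the expected daemon.
clean_stale_pidfile() {
    pidfile=$1
    expected=$2

    # Nothing to do when there is no pidfile.
    [ -f "$pidfile" ] || return 0

    # Read the PID, dropping the trailing newline and any other whitespace.
    pid=$(tr -d '[:space:]' < "$pidfile" 2>/dev/null)

    # Empty or non-numeric contents: treat the file as stale.
    case $pid in
        ''|*[!0-9]*) rm -f "$pidfile"; return 0 ;;
    esac

    # On Linux, /proc/<pid>/comm holds the command name of a live process.
    if [ -r "/proc/$pid/comm" ] && [ "$(cat "/proc/$pid/comm")" = "$expected" ]; then
        return 0
    fi

    rm -f "$pidfile"
}

# Assumed usage, before the daemon is started:
clean_stale_pidfile /var/run/nsca.pid nsca
```

Comparing the command name as well as the PID matters here: checking only that the PID exists would keep the file in exactly the case shown in the listing, where the number has been reused by another service.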
