#1897 closed defect (wontfix)
Systemd is unable to read pidfile after nginx start-up due to a race-condition
Reported by: | Owned by: | ||
---|---|---|---|
Priority: | minor | Milestone: | |
Component: | other | Version: | 1.14.x |
Keywords: | Cc: | ||
uname -a: | Linux n1 4.19.42-204.el7fc.x86_64 #1 SMP Tue May 14 21:31:24 +10 2019 x86_64 x86_64 x86_64 GNU/Linux | ||
nginx -V: |
nginx version: nginx/1.14.1
built by gcc 7.3.1 20180303 (Red Hat 7.3.1-5) (GCC) built with OpenSSL 1.0.2k-fips 26 Jan 2017 TLS SNI support enabled configure arguments: --prefix=/opt/nginx-1.14.1 --pid-path=/var/run/nginx.pid --http-client-body-temp-path=/tmp/nginx-client_body_temp --http-proxy-temp-path=/tmp/nginx-proxy_temp --http-fastcgi-temp-path=/tmp/nginx-fastcgi_temp --http-uwsgi-temp-path=/tmp/nginx-uwsgi_temp --http-scgi-temp-path=/tmp/nginx-scgi_temp --with-pcre --with-pcre-jit --with-threads --with-http_ssl_module --with-http_v2_module --with-http_stub_status_module --with-http_gzip_static_module --with-http_gunzip_module --with-http_realip_module --with-stream --with-stream_ssl_module --with-http_geoip_module=dynamic |
Description
Systemd unit nginx.service (packaged in your rpms but that same setup is used almost everywere):
... [Service] Type=forking PIDFile=/var/run/nginx.pid ExecStart=/usr/sbin/nginx -c /etc/nginx/nginx.conf ExecReload=/bin/kill -s HUP $MAINPID ExecStop=/bin/kill -s TERM $MAINPID ...
The problem is that $MAINPID
sometimes is empty, so that a "reload" or a "stop" operation won't work.
$MAINPID
is populated from PIDFile (see man systemd.service
) and here we have a race condition:
- parent nginx process does fork and terminates, at this moment systemd tries to read PIDFile which is empty (because forked nginx child hasn't written pid file yet)
- forked nginx child does write a pid value to PIDFile but that happens too late
This bug is also present in Ubuntu bug-tracker (https://bugs.launchpad.net/ubuntu/+source/nginx/+bug/1581864).
There is a work-around - to insert a small delay:
ExecStartPost=/bin/sleep 0.1
it does alleviate the problem. The real solution suggested in Ubuntu bug-tracker is to write pidfile in parent nginx process after fork but before exiting - this closes the race condition.
Will you, please, port suggested solution to the mainline codebase or work something along the lines to solve this problem?
Change History (4)
comment:1 by , 5 years ago
comment:2 by , 5 years ago
I have to add, that we also have this problem but we only see it with an reload.
But it seems (not 100% sure) that we don't had this problem on RHEL 7.6 but only on RHEL 7.7. So it might be systemd related to:
- RHEL 7.6: 219-62.el7
- RHEL 7.7: 219.67.el7
We are gonna try to add the systemd sleep fix for now.
comment:3 by , 5 years ago
Resolution: | → wontfix |
---|---|
Status: | new → closed |
This was discussed in details in this thread (in Russian):
http://mailman.nginx.org/pipermail/nginx-ru/2017-November/060604.html
Notably (mostly summarised here):
- The message in question is harmless.
- The way systemd requires daemons to work with Type=forking is not compatible with how daemons are written historically, notably with using daemon(3) functions.
- Things should be fixed once systemd implements Type=pid-file, which is in todo.
Additional notes:
- The patch to synchronize things as recommended by systemd's daemon(7) manpage can be found in this message. It wasn't committed as the change is considered too complex for the problem it solves.
- There was an attempt to submit a sleep() patch, which was rejected, see here.
- The patch suggested in the Ubuntu bugtracker to write pidfile in the parent process looks wrong. In particular, it does so with the wrong umask, which is set in the child process only.
Hope this helps.
So... Does anyone care?