Opened 6 weeks ago

Closed 10 days ago

#1897 closed defect (wontfix)

Systemd is unable to read pidfile after nginx start-up due to a race-condition

Reported by: turchanov@… Owned by:
Priority: minor Milestone:
Component: other Version: 1.14.x
Keywords: Cc:
uname -a: Linux n1 4.19.42-204.el7fc.x86_64 #1 SMP Tue May 14 21:31:24 +10 2019 x86_64 x86_64 x86_64 GNU/Linux
nginx -V: nginx version: nginx/1.14.1
built by gcc 7.3.1 20180303 (Red Hat 7.3.1-5) (GCC)
built with OpenSSL 1.0.2k-fips 26 Jan 2017
TLS SNI support enabled
configure arguments: --prefix=/opt/nginx-1.14.1 --pid-path=/var/run/nginx.pid --http-client-body-temp-path=/tmp/nginx-client_body_temp --http-proxy-temp-path=/tmp/nginx-proxy_temp --http-fastcgi-temp-path=/tmp/nginx-fastcgi_temp --http-uwsgi-temp-path=/tmp/nginx-uwsgi_temp --http-scgi-temp-path=/tmp/nginx-scgi_temp --with-pcre --with-pcre-jit --with-threads --with-http_ssl_module --with-http_v2_module --with-http_stub_status_module --with-http_gzip_static_module --with-http_gunzip_module --with-http_realip_module --with-stream --with-stream_ssl_module --with-http_geoip_module=dynamic

Description

Systemd unit nginx.service (packaged in your rpms but that same setup is used almost everywere):

...

[Service]
Type=forking
PIDFile=/var/run/nginx.pid
ExecStart=/usr/sbin/nginx -c /etc/nginx/nginx.conf
ExecReload=/bin/kill -s HUP $MAINPID
ExecStop=/bin/kill -s TERM $MAINPID

...

The problem is that $MAINPID sometimes is empty, so that a "reload" or a "stop" operation won't work.
$MAINPID is populated from PIDFile (see man systemd.service) and here we have a race condition:

  1. parent nginx process does fork and terminates, at this moment systemd tries to read PIDFile which is empty (because forked nginx child hasn't written pid file yet)
  2. forked nginx child does write a pid value to PIDFile but that happens too late

This bug is also present in Ubuntu bug-tracker (https://bugs.launchpad.net/ubuntu/+source/nginx/+bug/1581864).
There is a work-around - to insert a small delay:

ExecStartPost=/bin/sleep 0.1

it does alleviate the problem. The real solution suggested in Ubuntu bug-tracker is to write pidfile in parent nginx process after fork but before exiting - this closes the race condition.

Will you, please, port suggested solution to the mainline codebase or work something along the lines to solve this problem?

Change History (3)

comment:1 by turchanov@…, 11 days ago

So... Does anyone care?

comment:2 by rodehoed@…, 11 days ago

I have to add, that we also have this problem but we only see it with an reload.

But it seems (not 100% sure) that we don't had this problem on RHEL 7.6 but only on RHEL 7.7. So it might be systemd related to:

  • RHEL 7.6: 219-62.el7
  • RHEL 7.7: 219.67.el7

We are gonna try to add the systemd sleep fix for now.

Last edited 11 days ago by rodehoed@… (previous) (diff)

comment:3 by Maxim Dounin, 10 days ago

Resolution: wontfix
Status: newclosed

This was discussed in details in this thread (in Russian):

http://mailman.nginx.org/pipermail/nginx-ru/2017-November/060604.html

Notably (mostly summarised here):

  • The message in question is harmless.
  • The way systemd requires daemons to work with Type=forking is not compatible with how daemons are written historically, notably with using daemon(3) functions.
  • Things should be fixed once systemd implements Type=pid-file, which is in todo.

Additional notes:

  • The patch to synchronize things as recommended by systemd's daemon(7) manpage can be found in this message. It wasn't committed as the change is considered too complex for the problem it solves.
  • There was an attempt to submit a sleep() patch, which was rejected, see here.
  • The patch suggested in the Ubuntu bugtracker to write pidfile in the parent process looks wrong. In particular, it does so with the wrong umask, which is set in the child process only.

Hope this helps.

Note: See TracTickets for help on using tickets.