Opened 11 years ago

Closed 11 years ago

Last modified 11 years ago

#424 closed defect (invalid)

PID file race condition

Reported by: www.google.com/accounts/o8/id?id=AItOawkACEKQxqlKlbuqmg6oluxprlLFjXHU2zg Owned by:
Priority: major Milestone:
Component: nginx-core Version: 1.2.x
Keywords: Cc:
uname -a: Linux 3.8.13 #26 SMP Mon Jun 24 16:08:53 EDT 2013 x86_64 GNU/Linux
nginx -V: nginx version: nginx/1.2.9
TLS SNI support enabled
configure arguments: --prefix=/usr/local --pid-path=/var/run --lock-path=/var/run --with-file-aio --with-ipv6 --with-rtsig_module --with-http_addition_module --with-http_xslt_module --with-http_image_filter_module --with-http_sub_module --with-http_dav_module --with-http_flv_module --with-http_stub_status_module --with-http_realip_module --with-http_secure_link_module --with-http_ssl_module --with-http_geoip_module --with-http_gzip_static_module --with-http_random_index_module --with-http_secure_link_module --with-http_degradation_module --with-mail --with-mail_ssl_module --with-pcre

Description

Hello,
I've been observing a small race condition in PID file handling that can cause a world of trouble sometimes.

When an Nginx restart is performed (ie. sending QUIT to the nginx master process) the old master process can remain active for some time until all the active connections on it close at which point it exits. But when it does so it deletes the PID file which no longer belongs to it, it belongs to the new nginx master process which was spawned and already wrote *its own PID* into the file.

This small problem leads to situations like:

  • failure to restart Nginx in the future if you try to send signals to PID written in a file which no longer exists
  • failure to send signals (USR1, HUP) for log rotation purposes, leading to file-descriptors remaining open on huge files which no longer exist

With Nginx being such an amazing enterprise software I expected to find a bug report already made, but I just couldn't locate any mentions of this anywhere.

Some relevant information about software last seen to have this problem:

  • nginx.conf

worker_processes 24;
pid /var/run/nginx.pid;

~# nginx -V
Pasted into the form below.

Thank you.

Change History (5)

comment:1 by Maxim Dounin, 11 years ago

Resolution: invalid
Status: newclosed

The problem is that you start new nginx instance before old nginx instance was terminated. It's not something allowed - you have to wait for an old master process to exit before starting a new one.

comment:2 by www.google.com/accounts/o8/id?id=AItOawkACEKQxqlKlbuqmg6oluxprlLFjXHU2zg, 11 years ago

Hello, but if that is done then let's say load-balancers will no longer be able to reach the old instance while it is closing down (and if large uploads or downloads are in progress on old Nginx master instance that can take a very long time), so they will proceed to fail out this server from the production pool.

It doesn't seem very efficient to wait, or very available when talking about high-availability and one second restarts before health checks have time to fail.

comment:3 by is, 11 years ago

For what purpose you want to stop nginx and then to start it again?

If you need to reload configuration, just send HUP.
If you need to rotate logs, just send USR1.
If you need to upgrade nginx binary, send USR2.

Details at http://nginx.org/en/docs/control.html

comment:4 by www.google.com/accounts/o8/id?id=AItOawkACEKQxqlKlbuqmg6oluxprlLFjXHU2zg, 11 years ago

Hello is, for adding new SSL certificates, for adding new proxy (or fastcgi) cache pools and for some other things often times I found only a restart will activate the new configuration, and it is quite unpredictable so that I could never make a definite map of what will force me to a full restart.

Now if you say _that_ is the true bug I hit, I fully understand. But I will still think smarter PID file handling is the simpler and more logical improvement for the software running most of the top sites.

comment:5 by is, 11 years ago

HUP is enough for adding SSL certificate and cache pool.

This is not the bug, but intended behaviour: if a master process is still running, PID file should contain its PID.

Note: See TracTickets for help on using tickets.