PID file race condition
Reported by: www.google.com/accounts/o8/id?id=AItOawkACEKQxqlKlbuqmg6oluxprlLFjXHU2zg
uname -a: Linux 3.8.13 #26 SMP Mon Jun 24 16:08:53 EDT 2013 x86_64 GNU/Linux
nginx version: nginx/1.2.9
TLS SNI support enabled
configure arguments: --prefix=/usr/local --pid-path=/var/run --lock-path=/var/run --with-file-aio --with-ipv6 --with-rtsig_module --with-http_addition_module --with-http_xslt_module --with-http_image_filter_module --with-http_sub_module --with-http_dav_module --with-http_flv_module --with-http_stub_status_module --with-http_realip_module --with-http_secure_link_module --with-http_ssl_module --with-http_geoip_module --with-http_gzip_static_module --with-http_random_index_module --with-http_secure_link_module --with-http_degradation_module --with-mail --with-mail_ssl_module --with-pcre
I've been observing a small race condition in PID file handling that can cause a world of trouble sometimes.
When an Nginx restart is performed (i.e. sending QUIT to the nginx master process), the old master process can remain active for some time until all of its active connections close, at which point it exits. But when it exits, it deletes the PID file, which no longer belongs to it: the file now belongs to the new nginx master process, which was spawned in the meantime and has already written *its own PID* into it.
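To illustrate the kind of check I would expect on exit, here is a minimal sketch in Python. It is not nginx's actual code; the function name `remove_pid_file_if_owned` and its parameters are made up for illustration. The idea is simply that an exiting master removes the PID file only if the file still records its own PID.

```python
import os


def remove_pid_file_if_owned(pid_path, own_pid=None):
    """Unlink pid_path only if it still records own_pid.

    Hypothetical helper, not part of nginx. Returns True if the file
    was removed, False if it was missing or belongs to another process.
    """
    if own_pid is None:
        own_pid = os.getpid()
    try:
        with open(pid_path) as f:
            recorded = f.read().strip()
    except FileNotFoundError:
        # Nothing to clean up.
        return False
    if recorded != str(own_pid):
        # A newer master has already written its own PID: leave it alone.
        return False
    os.unlink(pid_path)
    return True
```

Note that a small window remains between reading the file and unlinking it, so this narrows the race rather than eliminating it; closing it fully would probably need a lock held around the PID file.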
This small problem leads to situations like:
- failure to restart Nginx in the future, because the restart tries to send signals to a PID read from a file which no longer exists
- failure to send signals (USR1, HUP) for log rotation purposes, leaving file descriptors open on huge files which no longer exist
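Both failures come down to a script that trusts the PID file blindly. As a rough sketch of a more defensive log rotation helper (the names `read_pid` and `signal_master` are hypothetical, and the script is assumed to run on a Unix system with permission to signal the master), the missing file can at least be detected and reported instead of silently skipping the signal:

```python
import os
import signal


def read_pid(pid_path):
    """Return the PID stored in pid_path, or None if missing or invalid."""
    try:
        with open(pid_path) as f:
            text = f.read().strip()
    except FileNotFoundError:
        return None
    if not text.isdigit():
        return None
    pid = int(text)
    # Reject 0: os.kill(0, ...) would signal the whole process group.
    return pid if pid > 0 else None


def signal_master(pid_path, sig=signal.SIGUSR1):
    """Send sig to the PID recorded in pid_path.

    Returns True if the signal was delivered, False if the PID file is
    missing or invalid, or if no process with that PID exists.
    """
    pid = read_pid(pid_path)
    if pid is None:
        return False
    try:
        os.kill(pid, sig)
    except ProcessLookupError:
        return False
    return True
```

A rotation script could treat a `False` return as an error worth alerting on, rather than moving the log files and leaving the old descriptors open.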
With Nginx being such an amazing piece of enterprise software, I expected to find a bug report already filed, but I couldn't locate any mention of this anywhere.
Some relevant information about software last seen to have this problem:
~# nginx -V
Pasted into the form below.