Opened 5 years ago

Closed 5 years ago

#1706 closed defect (invalid)

Orphan processes after fatal signals

Reported by: jiazhouyang09@… Owned by:
Priority: minor Milestone:
Component: other Version:
Keywords: Cc:
uname -a: Linux ubuntu 4.13.0-39-generic #44~16.04.1-Ubuntu SMP Thu Apr 5 16:43:10 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
nginx -V: nginx version: nginx/1.14.0 (Ubuntu)
built with OpenSSL 1.1.0g 2 Nov 2017
TLS SNI support enabled
configure arguments: --with-cc-opt='-g -O2 -fdebug-prefix-map=/build/nginx-mcUg8N/nginx-1.14.0=. -fstack-protector-strong -Wformat -Werror=format-security -fPIC -Wdate-time -D_FORTIFY_SOURCE=2' --with-ld-opt='-Wl,-Bsymbolic-functions -Wl,-z,relro -Wl,-z,now -fPIC' --prefix=/usr/share/nginx --conf-path=/etc/nginx/nginx.conf --http-log-path=/var/log/nginx/access.log --error-log-path=/var/log/nginx/error.log --lock-path=/var/lock/nginx.lock --pid-path=/run/nginx.pid --modules-path=/usr/lib/nginx/modules --http-client-body-temp-path=/var/lib/nginx/body --http-fastcgi-temp-path=/var/lib/nginx/fastcgi --http-proxy-temp-path=/var/lib/nginx/proxy --http-scgi-temp-path=/var/lib/nginx/scgi --http-uwsgi-temp-path=/var/lib/nginx/uwsgi --with-debug --with-pcre-jit --with-http_ssl_module --with-http_stub_status_module --with-http_realip_module --with-http_auth_request_module --with-http_v2_module --with-http_dav_module --with-http_slice_module --with-threads --with-http_addition_module --with-http_geoip_module=dynamic --with-http_gunzip_module --with-http_gzip_static_module --with-http_image_filter_module=dynamic --with-http_sub_module --with-http_xslt_module=dynamic --with-stream=dynamic --with-stream_ssl_module --with-mail=dynamic --with-mail_ssl_module

Description

Hi,

I'm a PhD student, and I'm studying how software handles signals.

I found that Nginx leaves orphan processes behind when certain fatal signals (including SIGILL, SIGABRT, SIGBUS, SIGFPE, SIGSEGV) are sent to the main process. Such signals might be triggered by unknown bugs. In this case, Nginx can neither be started nor stopped:

$ sudo nginx
nginx: [emerg] bind() to *ip*:80 failed (98: Address already in use)

$ sudo nginx -s quit
nginx: [alert] kill(*pid*, 3) failed (3: No such process)

Also, any web server using the same port will fail to start. To fix this, users have to kill the orphan processes manually.
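The second error can be reproduced in miniature. The sketch below is not nginx code; it assumes only that `nginx -s quit` sends SIGQUIT (signal 3) to the pid recorded for the master process, and shows that signalling a pid that no longer exists fails with ESRCH ("No such process"). The helper names `send_quit` and `make_stale_pid` are hypothetical.

```python
import errno
import os
import signal


def send_quit(pid):
    """Send SIGQUIT to pid; return 0 on success or the errno on failure."""
    try:
        os.kill(pid, signal.SIGQUIT)
        return 0
    except OSError as exc:
        return exc.errno


def make_stale_pid():
    """Fork a child that exits at once, reap it, and return its now-unused pid."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)        # child: exit immediately
    os.waitpid(pid, 0)     # parent: reap the child so the pid is really gone
    return pid


if __name__ == "__main__":
    stale = make_stale_pid()
    print(send_quit(stale) == errno.ESRCH)
```

The recorded pid is stale in the same way once the master has been killed, while the workers it started keep running under their own pids.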

I'm not sure if this is a bug, but I think it would be better if Nginx could exit gracefully. At the very least, it should not corrupt the operating system.

Best,
Zhouyang

Change History (1)

comment:1 by Maxim Dounin, 5 years ago

Resolution: invalid
Status: new → closed

Thank you for your feedback. Handling fatal signals within the same process is believed to be a bad idea: it is likely to cause more harm than good, since one cannot assume anything about the process state and memory contents once such a signal has happened. Additionally, such handling is likely to make obtaining core dumps and further debugging harder.

In practice, I have seen an important service hang, consuming 100% CPU for several hours (until manual intervention), because its authors tried to intercept and additionally debug SIGSEGV signals, and memory corruption led to an infinite loop in the signal handler. Had the code done nothing instead, the service would have been properly restarted by the startup script, and far fewer clients would have been affected by the bug.

As such, nginx does not try to handle fatal signals within the process itself. It does, however, use a multi-process model with multiple worker processes, and hence it is able to restart worker processes terminated by fatal signals: new workers are started by the master process, which is in a known good state. But if a fatal signal happens in the master process itself, the master is simply terminated by the OS as per default signal handling.
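The general pattern described above, a supervisor that does no signal handling inside the failing process and simply starts a replacement from a healthy parent, can be sketched as follows. This is an illustrative sketch and not nginx's implementation; `spawn_worker` and `reap_and_restart` are hypothetical names, and the "worker" here only sleeps.

```python
import os
import signal
import time


def spawn_worker():
    """Fork a hypothetical worker that idles until it is killed; return its pid."""
    pid = os.fork()
    if pid == 0:
        try:
            # Child: keep the default (fatal) disposition for SIGSEGV and idle.
            signal.signal(signal.SIGSEGV, signal.SIG_DFL)
            while True:
                time.sleep(1)
        finally:
            os._exit(1)    # never fall through into the parent's code path
    return pid


def reap_and_restart(workers):
    """Wait for one worker to die; if a signal killed it, start a replacement.

    Returns (dead_pid, signal_number_or_None, new_pid_or_None).
    """
    while True:
        pid, status = os.waitpid(-1, 0)
        if pid in workers:     # ignore children that are not our workers
            break
    workers.discard(pid)
    if os.WIFSIGNALED(status):
        new_pid = spawn_worker()   # started by the parent, which is unaffected
        workers.add(new_pid)
        return pid, os.WTERMSIG(status), new_pid
    return pid, None, None
```

The parent learns about the crash only through the child's exit status, so nothing has to run inside the process whose memory may be corrupted. No such parent exists for the master itself, which is why its death goes unrepaired.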

Note though that nginx operation is not affected even if the master process dies, since worker processes keep running and handling requests. The only affected functionality is that related to configuration changes, such as configuration reloads, which imply restarting worker processes, and starting and stopping nginx itself.
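This is also where the "Address already in use" error in the report comes from: workers inherit the listening sockets, so the port stays bound for as long as any worker is alive. A minimal sketch of that effect, assuming a Linux-like system and using a hypothetical helper `orphan_holds_port` (not nginx code):

```python
import errno
import os
import signal
import socket
import time


def orphan_holds_port():
    """Return the errno seen when re-binding a port still held by a child (0 if none)."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))    # port 0: let the OS pick a free port
    listener.listen(8)
    port = listener.getsockname()[1]

    pid = os.fork()
    if pid == 0:
        try:
            # "Worker": inherits the listening socket and keeps it open.
            while True:
                time.sleep(1)
        finally:
            os._exit(1)

    # Simulate the master going away: drop its copy of the socket.
    listener.close()

    probe = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        probe.bind(("127.0.0.1", port))
        result = 0
    except OSError as exc:
        result = exc.errno
    finally:
        probe.close()
        os.kill(pid, signal.SIGKILL)   # the manual cleanup from the report
        os.waitpid(pid, 0)
    return result


if __name__ == "__main__":
    print(orphan_holds_port() == errno.EADDRINUSE)
```

The second bind fails even though the process that created the socket has closed its own copy, because the child still holds a reference to it.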

While it is probably possible to make nginx more resilient to arbitrary bugs, I don't think there is an obvious solution. And, clearly enough, exiting gracefully on a fatal signal in the master process does not look like a good solution: a) it would also stop nginx from serving requests, while currently it can serve requests just fine even if the master process is dead, and b) it is not really possible to do without intercepting fatal signals in the same process, which is clearly unsafe, as explained above.
