Opened 9 years ago
Closed 4 years ago
#753 closed defect (fixed)
Nginx leaves UNIX domain sockets after SIGQUIT
Reported by: | launchpad.net/~cpburnz | Owned by: | |
---|---|---|---|
Priority: | minor | Milestone: | |
Component: | nginx-core | Version: | 1.6.x |
Keywords: | nginx 1.6 1.8 sigquit sigterm unix domain socket | Cc: | |
uname -a: |
Linux test1 3.13.0-32-generic #57-Ubuntu SMP Tue Jul 15 03:51:08 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
|
||
nginx -V: |
nginx version: nginx/1.6.3
TLS SNI support enabled configure arguments: --with-cc-opt='-g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2' --with-ld-opt='-Wl,-Bsymbolic-functions -Wl,-z,relro' --prefix=/usr/share/nginx --conf-path=/etc/nginx/nginx.conf --http-log-path=/var/log/nginx/access.log --error-log-path=/var/log/nginx/error.log --lock-path=/var/lock/nginx.lock --pid-path=/run/nginx.pid --http-client-body-temp-path=/var/lib/nginx/body --http-fastcgi-temp-path=/var/lib/nginx/fastcgi --http-proxy-temp-path=/var/lib/nginx/proxy --http-scgi-temp-path=/var/lib/nginx/scgi --http-uwsgi-temp-path=/var/lib/nginx/uwsgi --with-debug --with-pcre-jit --with-ipv6 --with-http_ssl_module --with-http_stub_status_module --with-http_realip_module --with-http_auth_request_module --with-http_addition_module --with-http_dav_module --with-http_geoip_module --with-http_gzip_static_module --with-http_image_filter_module --with-http_spdy_module --with-http_sub_module --with-http_xslt_module --with-mail --with-mail_ssl_module --add-module=/build/buildd/nginx-1.6.3/debian/modules/nginx-auth-pam --add-module=/build/buildd/nginx-1.6.3/debian/modules/nginx-dav-ext-module --add-module=/build/buildd/nginx-1.6.3/debian/modules/nginx-echo --add-module=/build/buildd/nginx-1.6.3/debian/modules/nginx-upstream-fair --add-module=/build/buildd/nginx-1.6.3/debian/modules/ngx_http_substitutions_filter_module nginx version: nginx/1.8.0 built by gcc 4.8.2 (Ubuntu 4.8.2-19ubuntu1) built with OpenSSL 1.0.1f 6 Jan 2014 TLS SNI support enabled configure arguments: --with-http_ssl_module |
Description
According to the Nginx documentation, SIGQUIT will cause a "graceful shutdown" while SIGTERM will cause a "fast shutdown". If you send SIGQUIT to Nginx, it will leave behind stale UNIX domain socket files that were created using the listen directive. If there are any stale UNIX domain socket files when Nginx starts up, it will fail to listen on the socket because it already exists. However if you use SIGTERM, the UNIX domain socket files will be properly removed. I've encountered this with Nginx 1.6.2, 1.6.3, and 1.8.0 on Ubuntu 14.04.
Example /etc/nginx/nginx.conf
:
http { ## # Basic Settings ## sendfile on; tcp_nopush on; tcp_nodelay on; keepalive_timeout 65; types_hash_max_size 2048; include /etc/nginx/mime.types; default_type application/octet-stream; ## # Logging Settings ## access_log /var/log/nginx/access.log; error_log /var/log/nginx/error.log; ## # Gzip Settings ## gzip on; gzip_disable "msie6"; ## # Virtual Host Configs ## include /etc/nginx/sites-enabled/*; }
Example /etc/nginx/sites-enabled/serve-files
:
server { listen unix:/run/serve-files.socket; root /var/www/files; location / { try_files $uri =404; } }
Then start Nginx:
sudo nginx # OR sudo service nginx start
On first start, /run/serve-files.socket
will be created because of the listen unix:/run/serve-files.socket;
directive.
Then stop Nginx with SIGQUIT:
sudo kill -SIGQUIT $(cat /run/nginx.pid) # OR sudo service nginx stop # Sends SIGQUIT
The socket at /run/serve-files.socket
will remain because it was not properly removed. If you try to restart Nginx, it will fail to start with the following logged to /var/log/nginx/error.log
:
2015/04/24 10:16:27 [emerg] 5782#0: bind() to unix:/run/serve-files.socket failed (98: Address already in use) 2015/04/24 10:16:27 [emerg] 5782#0: bind() to unix:/run/serve-files.socket failed (98: Address already in use) 2015/04/24 10:16:27 [emerg] 5782#0: bind() to unix:/run/serve-files.socket failed (98: Address already in use) 2015/04/24 10:16:27 [emerg] 5782#0: bind() to unix:/run/serve-files.socket failed (98: Address already in use) 2015/04/24 10:16:27 [emerg] 5782#0: bind() to unix:/run/serve-files.socket failed (98: Address already in use) 2015/04/24 10:16:27 [emerg] 5782#0: still could not bind()
Change History (15)
comment:1 by , 9 years ago
Status: | new → accepted |
---|
comment:2 by , 9 years ago
Encountered the same trouble with Nginx 1.9.1 on Ubuntu 14.04 and FreeBSD 10.1-RELEASE.
Please, consider fixing this ASAP.
Thanx.
comment:3 by , 9 years ago
Maxim Dounin, can you please let us know if this going to be implemented in a near future? thanx
comment:5 by , 9 years ago
This problem should never affect any properly working nginx installation (there is no reason to stop nginx unless you are stopping the host itself), so it's unlikely to be looked into in a near future. Moreover, the fix is likely to affect other use cases (e.g., binary upgrade) and should be done with care, if at all.
comment:6 by , 9 years ago
So if for any reason I would decide to stop, deinstall and later install and start nginx without server reboot using some automation software like SlatStack, Chef etc, when I get broken node (or plenty of them in the case of mass reconfigurations) I should remember that the reason is "there is no reason to stop nginx unless you are stopping the host itself"? %-\
Anyway thanx for the answer.
comment:7 by , 8 years ago
I can confirm that a SIGQUIT will not remove the socket, and a SIGTERM will.
However, I disagree with what mdounin says about this never affecting anyone. On a debian system, if you stop or restart nginx using the init script, you *do* run into this problem.
The initscript runs in its stop function:
start-stop-daemon --stop --quiet --retry=$STOP_SCHEDULE --pidfile $PID --name $NAME
The --stop argument will send a SIGTERM to all matching processes. However its possible to send a different signal with the --signal option, or the --retry option, and it seems like it is sending this option as $STOP_SCHEDULE which is by default set to 'QUIT/5' for systemd, and STOP_SCHEDULE="${STOP_SCHEDULE:-QUIT/5/TERM/5/KILL/5}" for the initscript.
in its restart function, it simply runs a stop() and then start().
However, a reload() will work ok for reloading the config.
This means that anyone doing a 'graceful shutdown' wont have their socket cleaned up. Most people would want to do a graceful shutdown before doing a 'fast' shutdown, no?
comment:8 by , 8 years ago
It also should be noted that using SIGQUIT for process termination is a bit weird for process terminating:
SIGQUIT 3 Core Quit from keyboard SIGTERM 15 Term Termination signal
most applications don't bother intercepting SIGQUIT.
comment:10 by , 5 years ago
Still exists on 1.16.0.
This is also preventing me from restarting nginx via systemctl.
comment:11 by , 5 years ago
Ran into this issue myself. Here is a patch I just proposed to the nginx-devel mailing list that resolves it:
# HG changeset patch # User Thibault Charbonnier <thibaultcha@me.com> # Date 1582764433 28800 # Wed Feb 26 16:47:13 2020 -0800 # Node ID 55ea1a9197a6f28d4da00909e5ea8585f6a08239 # Parent 4f18393a1d51bce6103ea2f1b2587900f349ba3d Ensured SIGQUIT deletes listening UNIX socket files. Prior to this patch, the SIGQUIT signal handling (graceful shutdown) did not remove UNIX socket files since ngx_master_process_cycle reimplemented listening socket closings in lieu of using ngx_close_listening_sockets. Since ngx_master_process_exit will call the aforementioned ngx_close_listening_sockets, we can remove the custom implementation and now expect listening sockets to be closed properly by ngx_close_listening_sockets instead. This fixes the trac issue #753 (https://trac.nginx.org/nginx/ticket/753). diff -r 4f18393a1d51 -r 55ea1a9197a6 src/os/unix/ngx_process_cycle.c --- a/src/os/unix/ngx_process_cycle.c Thu Feb 20 16:51:07 2020 +0300 +++ b/src/os/unix/ngx_process_cycle.c Wed Feb 26 16:47:13 2020 -0800 @@ -77,12 +77,11 @@ u_char *p; size_t size; ngx_int_t i; - ngx_uint_t n, sigio; + ngx_uint_t sigio; sigset_t set; struct itimerval itv; ngx_uint_t live; ngx_msec_t delay; - ngx_listening_t *ls; ngx_core_conf_t *ccf; sigemptyset(&set); @@ -205,16 +204,6 @@ ngx_signal_worker_processes(cycle, ngx_signal_value(NGX_SHUTDOWN_SIGNAL)); - ls = cycle->listening.elts; - for (n = 0; n < cycle->listening.nelts; n++) { - if (ngx_close_socket(ls[n].fd) == -1) { - ngx_log_error(NGX_LOG_EMERG, cycle->log, ngx_socket_errno, - ngx_close_socket_n " %V failed", - &ls[n].addr_text); - } - } - cycle->listening.nelts = 0; - continue; }
comment:12 by , 5 years ago
Review can be seen here:
http://mailman.nginx.org/pipermail/nginx-devel/2020-February/013017.html
Just for the record, here are links to the previous attempt to fix this and the review:
http://mailman.nginx.org/pipermail/nginx-devel/2016-December/009207.html
http://mailman.nginx.org/pipermail/nginx-devel/2016-December/009208.html
http://mailman.nginx.org/pipermail/nginx-devel/2016-December/009239.html
comment:13 by , 4 years ago
This should be prioritised, it is an issue for example when running nginx in kubernetes.
https://github.com/nginxinc/docker-nginx/issues/167
https://github.com/nginxinc/docker-nginx/issues/377
comment:14 by , 4 years ago
Just a further comment to try and raise the profile of this issue. It's one I've been tracking for some time and face on a regular basis.
comment:15 by , 4 years ago
Owner: | set to |
---|---|
Resolution: | → fixed |
Status: | accepted → closed |
The ngx_master_process_cycle() closes listening sockets by itself on SIGQUIT, without using the ngx_close_listening_sockets() function which is capable of correctly removing unix sockets after closing them. Looks like it needs to be changed to use ngx_close_listening_sockets() instead.