Opened 10 years ago

Last modified 5 years ago

#753 accepted defect

Nginx leaves UNIX domain sockets after SIGQUIT

Reported by: launchpad.net/~cpburnz
Owned by:
Priority: minor
Milestone:
Component: nginx-core
Version: 1.6.x
Keywords: nginx 1.6 1.8 sigquit sigterm unix domain socket
Cc:
uname -a: Linux test1 3.13.0-32-generic #57-Ubuntu SMP Tue Jul 15 03:51:08 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
nginx -V: nginx version: nginx/1.6.3
TLS SNI support enabled
configure arguments: --with-cc-opt='-g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2' --with-ld-opt='-Wl,-Bsymbolic-functions -Wl,-z,relro' --prefix=/usr/share/nginx --conf-path=/etc/nginx/nginx.conf --http-log-path=/var/log/nginx/access.log --error-log-path=/var/log/nginx/error.log --lock-path=/var/lock/nginx.lock --pid-path=/run/nginx.pid --http-client-body-temp-path=/var/lib/nginx/body --http-fastcgi-temp-path=/var/lib/nginx/fastcgi --http-proxy-temp-path=/var/lib/nginx/proxy --http-scgi-temp-path=/var/lib/nginx/scgi --http-uwsgi-temp-path=/var/lib/nginx/uwsgi --with-debug --with-pcre-jit --with-ipv6 --with-http_ssl_module --with-http_stub_status_module --with-http_realip_module --with-http_auth_request_module --with-http_addition_module --with-http_dav_module --with-http_geoip_module --with-http_gzip_static_module --with-http_image_filter_module --with-http_spdy_module --with-http_sub_module --with-http_xslt_module --with-mail --with-mail_ssl_module --add-module=/build/buildd/nginx-1.6.3/debian/modules/nginx-auth-pam --add-module=/build/buildd/nginx-1.6.3/debian/modules/nginx-dav-ext-module --add-module=/build/buildd/nginx-1.6.3/debian/modules/nginx-echo --add-module=/build/buildd/nginx-1.6.3/debian/modules/nginx-upstream-fair --add-module=/build/buildd/nginx-1.6.3/debian/modules/ngx_http_substitutions_filter_module



nginx version: nginx/1.8.0
built by gcc 4.8.2 (Ubuntu 4.8.2-19ubuntu1)
built with OpenSSL 1.0.1f 6 Jan 2014
TLS SNI support enabled
configure arguments: --with-http_ssl_module

Description

According to the Nginx documentation, SIGQUIT causes a "graceful shutdown" while SIGTERM causes a "fast shutdown". If you send SIGQUIT to Nginx, it leaves behind stale UNIX domain socket files that were created by the listen directive. If any stale UNIX domain socket files exist when Nginx starts up, it fails to bind to those sockets because the files already exist. However, if you use SIGTERM, the UNIX domain socket files are properly removed. I've encountered this with Nginx 1.6.2, 1.6.3, and 1.8.0 on Ubuntu 14.04.

Example /etc/nginx/nginx.conf:

# events block assumed; nginx refuses to start without one
events {
}

http {
    ##
    # Basic Settings
    ##

    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_timeout 65;
    types_hash_max_size 2048;

    include /etc/nginx/mime.types;
    default_type application/octet-stream;

    ##
    # Logging Settings
    ##

    access_log /var/log/nginx/access.log;
    error_log /var/log/nginx/error.log;

    ##
    # Gzip Settings
    ##

    gzip on;
    gzip_disable "msie6";

    ##
    # Virtual Host Configs
    ##

    include /etc/nginx/sites-enabled/*;
}

Example /etc/nginx/sites-enabled/serve-files:

server {
    listen unix:/run/serve-files.socket;
    root /var/www/files;
    location / {
        try_files $uri =404;
    }
}

Then start Nginx:

sudo nginx
# OR
sudo service nginx start

On first start, /run/serve-files.socket will be created because of the listen unix:/run/serve-files.socket; directive.

Then stop Nginx with SIGQUIT:

sudo kill -SIGQUIT $(cat /run/nginx.pid)
# OR
sudo service nginx stop # Sends SIGQUIT

The socket at /run/serve-files.socket will remain because it was not properly removed. If you try to restart Nginx, it will fail to start with the following logged to /var/log/nginx/error.log:

2015/04/24 10:16:27 [emerg] 5782#0: bind() to unix:/run/serve-files.socket failed (98: Address already in use)
2015/04/24 10:16:27 [emerg] 5782#0: bind() to unix:/run/serve-files.socket failed (98: Address already in use)
2015/04/24 10:16:27 [emerg] 5782#0: bind() to unix:/run/serve-files.socket failed (98: Address already in use)
2015/04/24 10:16:27 [emerg] 5782#0: bind() to unix:/run/serve-files.socket failed (98: Address already in use)
2015/04/24 10:16:27 [emerg] 5782#0: bind() to unix:/run/serve-files.socket failed (98: Address already in use)
2015/04/24 10:16:27 [emerg] 5782#0: still could not bind()
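The repeated bind() failure above can be reproduced without nginx. The C sketch below is an illustration under assumptions (the function name try_bind_unix() and the path /tmp/gr753-demo.sock are hypothetical): it binds an AF_UNIX socket, closes the descriptor without removing the path, which is the state a SIGQUIT shutdown leaves behind, and reports the errno from the next bind() to the same path. On Linux that is EADDRINUSE, errno 98, matching the log.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Bind a fresh AF_UNIX stream socket to `path`, then close it WITHOUT
 * unlinking the path, like a process that exits without cleanup.
 * Returns 0 on success, otherwise the errno value that caused failure. */
int try_bind_unix(const char *path)
{
    struct sockaddr_un addr;
    int fd, err = 0;

    if (strlen(path) >= sizeof(addr.sun_path))
        return ENAMETOOLONG;

    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd == -1)
        return errno;

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);   /* length checked above */

    if (bind(fd, (struct sockaddr *) &addr, sizeof(addr)) == -1)
        err = errno;               /* EADDRINUSE if the file still exists */

    close(fd);                     /* releases the fd; the file stays */
    return err;
}
```

Called twice on the same path, try_bind_unix() should return 0 and then EADDRINUSE; after unlink() of that path it succeeds again, which is why deleting the stale file by hand lets nginx start.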

Change History (14)

comment:1 by Maxim Dounin, 10 years ago

Status: new → accepted

The ngx_master_process_cycle() closes listening sockets by itself on SIGQUIT, without using the ngx_close_listening_sockets() function which is capable of correctly removing unix sockets after closing them. Looks like it needs to be changed to use ngx_close_listening_sockets() instead.
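A minimal sketch of the difference described above, assuming a simplified listener record: the function closes each descriptor and then unlinks the socket file, the extra step a bare close() loop omits. The struct listener type, its keep_file flag, and close_listeners() are hypothetical and do not mirror nginx's ngx_listening_t or ngx_close_listening_sockets(); keep_file merely stands in for cases, such as the binary upgrade mentioned in comment:5, where another process may still be serving the same path.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical, simplified stand-in for a listening-socket record. */
struct listener {
    int         fd;          /* -1 once closed */
    const char *unix_path;   /* NULL for TCP listeners */
    int         keep_file;   /* nonzero: another process still uses the path */
};

/* Close every listener, then remove the UNIX socket files we own.
 * Returns the number of close()/unlink() failures. */
int close_listeners(struct listener *ls, size_t n)
{
    size_t i;
    int failures = 0;

    for (i = 0; i < n; i++) {
        if (ls[i].fd != -1 && close(ls[i].fd) == -1) {
            perror("close");
            failures++;
        }
        ls[i].fd = -1;

        /* close() alone leaves the file system entry in place */
        if (ls[i].unix_path != NULL && !ls[i].keep_file
            && unlink(ls[i].unix_path) == -1 && errno != ENOENT)
        {
            perror("unlink");
            failures++;
        }
    }
    return failures;
}
```

Treating ENOENT as success keeps the function safe to call twice on the same records, for example from both a signal path and an exit path.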

comment:2 by openid.yandex.ru/yermulnik, 10 years ago

Encountered the same trouble with Nginx 1.9.1 on Ubuntu 14.04 and FreeBSD 10.1-RELEASE.
Please consider fixing this ASAP.
Thanks.

comment:3 by openid.yandex.ru/yermulnik, 10 years ago

Maxim Dounin, can you please let us know if this is going to be implemented in the near future? Thanks.

comment:4 by maxim, 10 years ago

Hello,

Why is it so urgent?

comment:5 by Maxim Dounin, 10 years ago

This problem should never affect any properly working nginx installation (there is no reason to stop nginx unless you are stopping the host itself), so it's unlikely to be looked into in a near future. Moreover, the fix is likely to affect other use cases (e.g., binary upgrade) and should be done with care, if at all.

comment:6 by openid.yandex.ru/yermulnik, 9 years ago

So if, for any reason, I decide to stop, uninstall, and later reinstall and start nginx without rebooting the server, using automation software like SaltStack, Chef, etc., and I end up with a broken node (or plenty of them, in the case of mass reconfigurations), I should just remember that the reason is "there is no reason to stop nginx unless you are stopping the host itself"? %-\
Anyway, thanks for the answer.

comment:7 by micah@…, 9 years ago

I can confirm that a SIGQUIT will not remove the socket, and a SIGTERM will.

However, I disagree with what mdounin says about this never affecting anyone. On a Debian system, if you stop or restart nginx using the init script, you *do* run into this problem.

The initscript runs in its stop function:

start-stop-daemon --stop --quiet --retry=$STOP_SCHEDULE --pidfile $PID --name $NAME

By default, the --stop argument sends SIGTERM to all matching processes. However, it's possible to send a different signal with the --signal option or the --retry option, and the init script passes --retry=$STOP_SCHEDULE. STOP_SCHEDULE is set by default to 'QUIT/5' for systemd, and to STOP_SCHEDULE="${STOP_SCHEDULE:-QUIT/5/TERM/5/KILL/5}" in the initscript.

In its restart function, it simply runs stop() and then start().

However, reload() works fine for reloading the configuration.

This means that anyone doing a 'graceful shutdown' won't have their socket cleaned up. Most people would want to do a graceful shutdown before doing a 'fast' shutdown, no?
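In a start-stop-daemon --retry schedule, items alternate between a signal to send and the number of seconds to wait for the process to exit, so QUIT/5/TERM/5/KILL/5 means: send SIGQUIT, wait up to 5 seconds, then send SIGTERM, wait up to 5 seconds, then send SIGKILL. If nginx finishes its graceful shutdown within the first 5 seconds, SIGTERM is never sent, and with QUIT/5 it is never sent at all. The sketch below makes that ordering explicit; struct stop_step and parse_schedule() are hypothetical, simplified illustrations, not part of start-stop-daemon.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_STEPS 8

/* One "send this signal, then wait up to `timeout` seconds" step. */
struct stop_step {
    char signal[16];   /* signal name as written, e.g. "QUIT" */
    long timeout;      /* seconds to wait for the process to exit */
};

/* Simplified parser for SIGNAL/TIMEOUT/SIGNAL/TIMEOUT... schedules such
 * as "QUIT/5/TERM/5/KILL/5". It only checks the alternation; it does not
 * interpret "forever" or the single-timeout form of --retry.
 * Returns the number of steps, or -1 on malformed input. */
int parse_schedule(const char *s, struct stop_step *steps, int max)
{
    int n = 0;

    while (*s != '\0') {
        const char *slash = strchr(s, '/');
        size_t len;
        char *end;

        if (slash == NULL || n == max)
            return -1;                 /* signal with no timeout, or full */

        len = (size_t) (slash - s);
        if (len == 0 || len >= sizeof(steps[n].signal))
            return -1;
        memcpy(steps[n].signal, s, len);
        steps[n].signal[len] = '\0';

        steps[n].timeout = strtol(slash + 1, &end, 10);
        if (end == slash + 1 || (*end != '/' && *end != '\0'))
            return -1;                 /* timeout is not a number */
        n++;

        s = end;
        if (*s == '/') {
            s++;
            if (*s == '\0')
                return -1;             /* trailing slash */
        }
    }
    return n;
}
```

For "QUIT/5/TERM/5/KILL/5" the parser yields three steps in the order QUIT, TERM, KILL, each with a 5 second wait; "QUIT/5" yields a single QUIT step.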

Last edited 9 years ago by micah@…

comment:8 by micah@…, 9 years ago

It should also be noted that using SIGQUIT for process termination is a bit unusual:

Signal      Value   Action   Comment
SIGQUIT       3     Core     Quit from keyboard
SIGTERM      15     Term     Termination signal

Most applications don't bother intercepting SIGQUIT.
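The Action column gives the default disposition: a process that does not catch SIGQUIT terminates with a core dump, whereas nginx handles SIGQUIT itself as a graceful shutdown. As a minimal sketch for a daemon that wants both stop signals to end in the same cleanup, assuming a single-threaded process (install_shutdown_handlers(), on_shutdown_signal() and the shutdown_requested flag are hypothetical names), SIGQUIT and SIGTERM can be routed to one handler with sigaction(), leaving the actual cleanup to the main loop.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Set by the handler, polled by the (hypothetical) main loop. */
volatile sig_atomic_t shutdown_requested = 0;

/* Record which stop signal arrived; keep handler work minimal. */
static void on_shutdown_signal(int signo)
{
    shutdown_requested = signo;
}

/* Route SIGQUIT and SIGTERM to the same handler so that both the
 * graceful and the fast stop paths reach one cleanup routine.
 * Returns 0 on success, -1 if sigaction() fails. */
int install_shutdown_handlers(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_shutdown_signal;
    sigemptyset(&sa.sa_mask);

    if (sigaction(SIGQUIT, &sa, NULL) == -1)
        return -1;
    if (sigaction(SIGTERM, &sa, NULL) == -1)
        return -1;
    return 0;
}
```

A main loop would then check shutdown_requested and, whichever signal arrived, remove its socket files before exiting; the two signals would differ only in whether in-flight requests are allowed to finish first.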

comment:9 by phaoost@…, 6 years ago

The defect still exists in version 1.14.1

comment:10 by axos88@…, 5 years ago

Still exists on 1.16.0.

This is also preventing me from restarting nginx via systemctl.

comment:11 by thibaultcha@…, 5 years ago

Ran into this issue myself. Here is a patch I just proposed to the nginx-devel mailing list that resolves it:

# HG changeset patch
# User Thibault Charbonnier <thibaultcha@me.com>
# Date 1582764433 28800
#      Wed Feb 26 16:47:13 2020 -0800
# Node ID 55ea1a9197a6f28d4da00909e5ea8585f6a08239
# Parent  4f18393a1d51bce6103ea2f1b2587900f349ba3d
Ensured SIGQUIT deletes listening UNIX socket files.

Prior to this patch, the SIGQUIT signal handling (graceful shutdown) did not
remove UNIX socket files since ngx_master_process_cycle reimplemented
listening socket closings in lieu of using ngx_close_listening_sockets.

Since ngx_master_process_exit will call the aforementioned
ngx_close_listening_sockets, we can remove the custom implementation and now
expect listening sockets to be closed properly by ngx_close_listening_sockets
instead.

This fixes the trac issue #753 (https://trac.nginx.org/nginx/ticket/753).

diff -r 4f18393a1d51 -r 55ea1a9197a6 src/os/unix/ngx_process_cycle.c
--- a/src/os/unix/ngx_process_cycle.c   Thu Feb 20 16:51:07 2020 +0300
+++ b/src/os/unix/ngx_process_cycle.c   Wed Feb 26 16:47:13 2020 -0800
@@ -77,12 +77,11 @@
     u_char            *p;
     size_t             size;
     ngx_int_t          i;
-    ngx_uint_t         n, sigio;
+    ngx_uint_t         sigio;
     sigset_t           set;
     struct itimerval   itv;
     ngx_uint_t         live;
     ngx_msec_t         delay;
-    ngx_listening_t   *ls;
     ngx_core_conf_t   *ccf;

     sigemptyset(&set);
@@ -205,16 +204,6 @@
             ngx_signal_worker_processes(cycle,
                                         ngx_signal_value(NGX_SHUTDOWN_SIGNAL));

-            ls = cycle->listening.elts;
-            for (n = 0; n < cycle->listening.nelts; n++) {
-                if (ngx_close_socket(ls[n].fd) == -1) {
-                    ngx_log_error(NGX_LOG_EMERG, cycle->log, ngx_socket_errno,
-                                  ngx_close_socket_n " %V failed",
-                                  &ls[n].addr_text);
-                }
-            }
-            cycle->listening.nelts = 0;
-
             continue;
         }

comment:13 by luisdavim@…, 5 years ago

This should be prioritised; it is an issue, for example, when running nginx in Kubernetes.

https://github.com/nginxinc/docker-nginx/issues/167
https://github.com/nginxinc/docker-nginx/issues/377

comment:14 by davehope@…, 5 years ago

Just a further comment to try and raise the profile of this issue. It's one I've been tracking for some time and face on a regular basis.
