Opened 4 years ago

Last modified 4 months ago

#753 accepted defect

Nginx leaves UNIX domain sockets after SIGQUIT

Reported by: launchpad.net/~cpburnz Owned by:
Priority: minor Milestone:
Component: nginx-core Version: 1.6.x
Keywords: nginx 1.6 1.8 sigquit sigterm unix domain socket Cc:
uname -a: Linux test1 3.13.0-32-generic #57-Ubuntu SMP Tue Jul 15 03:51:08 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
nginx -V:

nginx version: nginx/1.6.3
TLS SNI support enabled
configure arguments: --with-cc-opt='-g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2' --with-ld-opt='-Wl,-Bsymbolic-functions -Wl,-z,relro' --prefix=/usr/share/nginx --conf-path=/etc/nginx/nginx.conf --http-log-path=/var/log/nginx/access.log --error-log-path=/var/log/nginx/error.log --lock-path=/var/lock/nginx.lock --pid-path=/run/nginx.pid --http-client-body-temp-path=/var/lib/nginx/body --http-fastcgi-temp-path=/var/lib/nginx/fastcgi --http-proxy-temp-path=/var/lib/nginx/proxy --http-scgi-temp-path=/var/lib/nginx/scgi --http-uwsgi-temp-path=/var/lib/nginx/uwsgi --with-debug --with-pcre-jit --with-ipv6 --with-http_ssl_module --with-http_stub_status_module --with-http_realip_module --with-http_auth_request_module --with-http_addition_module --with-http_dav_module --with-http_geoip_module --with-http_gzip_static_module --with-http_image_filter_module --with-http_spdy_module --with-http_sub_module --with-http_xslt_module --with-mail --with-mail_ssl_module --add-module=/build/buildd/nginx-1.6.3/debian/modules/nginx-auth-pam --add-module=/build/buildd/nginx-1.6.3/debian/modules/nginx-dav-ext-module --add-module=/build/buildd/nginx-1.6.3/debian/modules/nginx-echo --add-module=/build/buildd/nginx-1.6.3/debian/modules/nginx-upstream-fair --add-module=/build/buildd/nginx-1.6.3/debian/modules/ngx_http_substitutions_filter_module

nginx version: nginx/1.8.0
built by gcc 4.8.2 (Ubuntu 4.8.2-19ubuntu1)
built with OpenSSL 1.0.1f 6 Jan 2014
TLS SNI support enabled
configure arguments: --with-http_ssl_module

Description

According to the Nginx documentation, SIGQUIT causes a "graceful shutdown" while SIGTERM causes a "fast shutdown". If you send SIGQUIT to Nginx, it leaves behind stale UNIX domain socket files that were created by the listen directive. If any stale UNIX domain socket file exists when Nginx starts up, Nginx fails to listen on that socket because the path already exists. However, if you use SIGTERM, the UNIX domain socket files are properly removed. I've encountered this with Nginx 1.6.2, 1.6.3, and 1.8.0 on Ubuntu 14.04.

Example /etc/nginx/nginx.conf:

http {
    ##
    # Basic Settings
    ##

    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_timeout 65;
    types_hash_max_size 2048;

    include /etc/nginx/mime.types;
    default_type application/octet-stream;

    ##
    # Logging Settings
    ##

    access_log /var/log/nginx/access.log;
    error_log /var/log/nginx/error.log;

    ##
    # Gzip Settings
    ##

    gzip on;
    gzip_disable "msie6";

    ##
    # Virtual Host Configs
    ##

    include /etc/nginx/sites-enabled/*;
}

Example /etc/nginx/sites-enabled/serve-files:

server {
    listen unix:/run/serve-files.socket;
    root /var/www/files;
    location / {
        try_files $uri =404;
    }
}

Then start Nginx:

sudo nginx
# OR
sudo service nginx start

On first start, /run/serve-files.socket will be created because of the listen unix:/run/serve-files.socket; directive.

Then stop Nginx with SIGQUIT:

sudo kill -SIGQUIT $(cat /run/nginx.pid)
# OR
sudo service nginx stop # Sends SIGQUIT

The socket file at /run/serve-files.socket remains after Nginx exits. If you then try to start Nginx again, it fails and logs the following to /var/log/nginx/error.log:

2015/04/24 10:16:27 [emerg] 5782#0: bind() to unix:/run/serve-files.socket failed (98: Address already in use)
2015/04/24 10:16:27 [emerg] 5782#0: bind() to unix:/run/serve-files.socket failed (98: Address already in use)
2015/04/24 10:16:27 [emerg] 5782#0: bind() to unix:/run/serve-files.socket failed (98: Address already in use)
2015/04/24 10:16:27 [emerg] 5782#0: bind() to unix:/run/serve-files.socket failed (98: Address already in use)
2015/04/24 10:16:27 [emerg] 5782#0: bind() to unix:/run/serve-files.socket failed (98: Address already in use)
2015/04/24 10:16:27 [emerg] 5782#0: still could not bind()
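
The failure mode itself is independent of Nginx and can be reproduced with a few lines of Python. The following is only a minimal sketch of the mechanism, not of Nginx's code: the helper names `bind_unix` and `demo_stale_socket` and the temporary socket path are made up for illustration. It shows that closing a listening UNIX domain socket does not remove its file, and that a second bind() to the same path then fails.

```python
import errno
import os
import socket
import tempfile


def bind_unix(path):
    """Create a listening UNIX domain socket bound to path (hypothetical helper)."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.bind(path)
    except OSError:
        sock.close()
        raise
    sock.listen(1)
    return sock


def demo_stale_socket():
    """Return (file_left_behind, errno_of_second_bind)."""
    with tempfile.TemporaryDirectory() as tmpdir:
        # Hypothetical path, standing in for /run/serve-files.socket.
        path = os.path.join(tmpdir, "demo.socket")

        # First "run": bind, then close without unlinking the file.
        bind_unix(path).close()
        left_behind = os.path.exists(path)

        # Second "run": binding to the same path should now be refused.
        try:
            bind_unix(path).close()
            second_errno = None
        except OSError as exc:
            second_errno = exc.errno
        return left_behind, second_errno


if __name__ == "__main__":
    left_behind, second_errno = demo_stale_socket()
    print("socket file left behind:", left_behind)
    print("second bind() failed with EADDRINUSE:", second_errno == errno.EADDRINUSE)
```

On Linux, EADDRINUSE is errno 98, the same code that appears in the error log above.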

Change History (9)

comment:1 Changed 4 years ago by mdounin

  • Status changed from new to accepted

The ngx_master_process_cycle() closes listening sockets by itself on SIGQUIT, without using the ngx_close_listening_sockets() function which is capable of correctly removing unix sockets after closing them. Looks like it needs to be changed to use ngx_close_listening_sockets() instead.
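
The difference between the two shutdown paths described in this comment can be illustrated outside of Nginx. The sketch below is not Nginx source code and does not reproduce what ngx_close_listening_sockets() actually does internally; `close_only`, `close_and_unlink` and `listen_unix` are hypothetical Python helpers that only contrast "close the descriptor" with "close the descriptor and remove the socket file".

```python
import os
import socket
import tempfile


def listen_unix(path):
    """Return (socket, path) for a listening UNIX domain socket (hypothetical helper)."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.bind(path)
    sock.listen(1)
    return sock, path


def close_only(listeners):
    """Close each listening socket but leave UNIX socket files on disk."""
    for sock, _path in listeners:
        sock.close()


def close_and_unlink(listeners):
    """Close each listening socket and remove the file behind UNIX sockets."""
    for sock, path in listeners:
        is_unix = sock.family == socket.AF_UNIX
        sock.close()
        if is_unix and path is not None:
            try:
                os.unlink(path)
            except FileNotFoundError:
                pass  # already gone, nothing to clean up


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmpdir:
        a = os.path.join(tmpdir, "a.socket")  # hypothetical paths
        b = os.path.join(tmpdir, "b.socket")
        close_only([listen_unix(a)])
        close_and_unlink([listen_unix(b)])
        print("close_only left file:", os.path.exists(a))
        print("close_and_unlink left file:", os.path.exists(b))
```

As comment:5 points out, a real fix in Nginx would also have to account for cases such as binary upgrade, where the socket file has to stay in place; the sketch ignores that entirely.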

comment:2 Changed 4 years ago by openid.yandex.ru/yermulnik

Encountered the same trouble with Nginx 1.9.1 on Ubuntu 14.04 and FreeBSD 10.1-RELEASE.
Please, consider fixing this ASAP.
Thanx.

comment:3 Changed 4 years ago by openid.yandex.ru/yermulnik

Maxim Dounin, can you please let us know if this is going to be implemented in the near future? Thanks.

comment:4 Changed 4 years ago by maxim

Hello,

Why is it so urgent?

comment:5 Changed 4 years ago by mdounin

This problem should never affect any properly working nginx installation (there is no reason to stop nginx unless you are stopping the host itself), so it's unlikely to be looked into in the near future. Moreover, the fix is likely to affect other use cases (e.g., binary upgrade) and should be done with care, if at all.

comment:6 Changed 4 years ago by openid.yandex.ru/yermulnik

So if for any reason I decide to stop, uninstall, and later reinstall and start nginx without a server reboot, using automation software like SaltStack, Chef, etc., and I end up with a broken node (or plenty of them in the case of mass reconfigurations), I should remember that the reason is "there is no reason to stop nginx unless you are stopping the host itself"? %-\
Anyway thanx for the answer.

comment:7 Changed 4 years ago by micah@…

I can confirm that a SIGQUIT will not remove the socket, and a SIGTERM will.

However, I disagree with what mdounin says about this never affecting anyone. On a Debian system, if you stop or restart nginx using the init script, you *do* run into this problem.

The initscript runs in its stop function:

start-stop-daemon --stop --quiet --retry=$STOP_SCHEDULE --pidfile $PID --name $NAME

The --stop argument will send a SIGTERM to all matching processes. However, it's possible to send a different signal with the --signal option or the --retry option, and the script passes $STOP_SCHEDULE as the --retry schedule. This is set by default to 'QUIT/5' for systemd, and to STOP_SCHEDULE="${STOP_SCHEDULE:-QUIT/5/TERM/5/KILL/5}" for the initscript.

In its restart function, it simply runs stop() and then start().

However, a reload() will work ok for reloading the config.

This means that anyone doing a 'graceful shutdown' won't have their socket cleaned up. Most people would want to do a graceful shutdown before doing a 'fast' shutdown, no?
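
To make the order of signals in such a schedule explicit, here is a small Python sketch that splits a schedule string of the form signal/timeout/signal/timeout into steps. It is a simplification written for this illustration: `parse_stop_schedule` is a hypothetical helper, it only handles signal names followed by a numeric timeout, and it does not cover the other schedule forms that start-stop-daemon accepts.

```python
import signal


def parse_stop_schedule(schedule):
    """Parse 'SIG/timeout/SIG/timeout...' into [(signal.Signals, seconds), ...].

    Simplified on purpose: each signal name must be followed by a
    numeric timeout in seconds.
    """
    parts = schedule.split("/")
    if len(parts) % 2 != 0:
        raise ValueError("expected signal/timeout pairs: %r" % schedule)
    steps = []
    for name, timeout in zip(parts[0::2], parts[1::2]):
        name = name.upper()
        if not name.startswith("SIG"):
            name = "SIG" + name
        # signal.Signals[...] raises KeyError for unknown signal names.
        steps.append((signal.Signals[name], int(timeout)))
    return steps


if __name__ == "__main__":
    for sig, seconds in parse_stop_schedule("QUIT/5/TERM/5/KILL/5"):
        print("send %s, then wait up to %d seconds" % (sig.name, seconds))
```

With both defaults quoted above, the first step is SIGQUIT, so the graceful shutdown path is always tried first; with 'QUIT/5' alone it is the only signal sent.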


comment:8 Changed 4 years ago by micah@…

It should also be noted that using SIGQUIT for process termination is a bit unusual:

SIGQUIT       3       Core    Quit from keyboard
SIGTERM      15       Term    Termination signal

Most applications don't bother intercepting SIGQUIT.

comment:9 Changed 4 months ago by phaoost@…

The defect still exists in version 1.14.1
