#1300 closed defect (fixed)
nginx configuration test is breaking connections from the running instance
Reported by: | Max Laverse | Owned by: | |
---|---|---|---|
Priority: | major | Milestone: | 1.13 |
Component: | nginx-core | Version: | 1.13.x |
Keywords: | nginx reuseport reset test | Cc: | |
uname -a: | Linux doku 3.16.0-4-amd64 #1 SMP Debian 3.16.39-1+deb8u2 (2017-03-07) x86_64 GNU/Linux | ||
nginx -V: |
nginx version: nginx/1.13.1
built by gcc 4.9.2 (Debian 4.9.2-10) configure arguments: --without-http_rewrite_module --without-http_gzip_module |
Description
Hi!
We were investigating why we were losing requests when the Kubernetes Nginx Ingress was reloading its configuration. This Nginx Ingress is an Nginx server running in a container and acting as load-balancer and reverse-proxy for a Kubernetes cluster.
It appears that when reloading the configuration, this Ingress Controller component also starts a process that tests the configuration using `nginx -t`.
At that point we are observing new incoming connections being reset on the running instance.
When you run `nginx -t`, nginx tries to `bind` to the sockets and then calls `listen`.
On most default setups, if an instance of Nginx is already running, the `listen` call will fail silently as the address is already in use.
The problems start when you use the `reuseport` option on the `listen` directives of the configuration:
    http {
        server {
            listen 80 reuseport default_server;
            [...]
In that case the `listen` call on the socket will succeed, even in configuration test mode. Linux will start completing three-way handshakes for incoming connections on that socket. Once the test is finished, all established but not yet accepted connections on this configuration test process will be abruptly closed.
Having Nginx call `listen` on the sockets when testing the configuration seems wrong to us.
How to reproduce
To reproduce this issue, I compiled Nginx from the release-1.13.1 tag on a Debian server and then used the following configuration:
    user www-data;
    worker_processes 1;
    pid /run/nginx.pid;

    events {
        worker_connections 768;
    }

    http {
        server {
            listen 80 reuseport default_server;
        }
    }
Start a server instance.
Then with Apache Bench (Version 2.3 <$Revision: 1604373 $>), run:
$ ab -c2 -t10 http://127.0.0.1:80/
Running this command dozens of times will always succeed.
Now if you run in another tab:
for i in `seq 1 10`; do objs/nginx -t -c /root/nginx.conf; done
You should see Apache Bench stop before the end of the test every time, complaining about reset connections:
apr_socket_recv: Connection reset by peer (104)
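The reset itself can be reproduced in isolation with a single listening socket. The sketch below assumes Linux and the loopback interface, and `errno_after_listener_close` is a name invented for this example. It lets the kernel complete a handshake, never calls `accept`, closes the listening socket and then reports what the client sees.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Returns the errno reported by the client's recv() after the listening
 * socket was closed with the connection still waiting in the accept queue,
 * 0 if recv() saw an orderly end of stream, or -1 if the setup failed. */
int errno_after_listener_close(void)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    struct timeval tv = { 2, 0 };   /* do not block forever in recv() */
    char buf[1];
    ssize_t n;
    int result = -1;

    int lfd = socket(AF_INET, SOCK_STREAM, 0);
    int cfd = socket(AF_INET, SOCK_STREAM, 0);

    if (lfd < 0 || cfd < 0) {
        goto done;
    }

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;              /* let the kernel choose a free port */

    if (bind(lfd, (struct sockaddr *) &addr, sizeof(addr)) < 0
        || listen(lfd, 511) < 0
        || getsockname(lfd, (struct sockaddr *) &addr, &len) < 0)
    {
        goto done;
    }

    setsockopt(cfd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));

    /* The kernel completes the three-way handshake on its own;
     * nobody ever calls accept() on lfd. */
    if (connect(cfd, (struct sockaddr *) &addr, sizeof(addr)) < 0) {
        goto done;
    }

    /* This is what happens when the configuration test process exits. */
    close(lfd);
    lfd = -1;

    n = recv(cfd, buf, sizeof(buf), 0);
    result = (n < 0) ? errno : 0;

done:
    if (lfd >= 0) {
        close(lfd);
    }
    if (cfd >= 0) {
        close(cfd);
    }
    return result;
}
```

On Linux the returned value is expected to be `ECONNRESET`, which is error 104, the same error Apache Bench prints above.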
Additional traces
Running `strace -e bind,listen objs/nginx -t` with the configuration given above:
    nginx: the configuration file /root/nginx.conf syntax is ok
    bind(6, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
    listen(6, 511) = 0
    nginx: configuration file /root/nginx.conf test is successful
    +++ exited with 0 +++
We clearly see that the configuration test process managed to `bind` and `listen` on that socket.
On the other hand, with the `reuseport` directive removed:
    nginx: the configuration file /root/nginx.conf syntax is ok
    bind(6, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
    listen(6, 511) = -1 EADDRINUSE (Address already in use)
    nginx: configuration file /root/nginx.conf test is successful
    +++ exited with 0 +++
Here we see that the `listen` call fails, and with Apache Bench we no longer see any connection being reset.
Workaround
Since we are using the Nginx Ingress Controller in Kubernetes, which always calls `nginx -t` to check the configuration before reloading it, we came up with a workaround: preloading a library that overrides the `listen` call when testing the configuration.
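    #include <stdio.h>

    /* Replaces libc's listen() when loaded through LD_PRELOAD: reports the
     * call and pretends it succeeded without putting the socket into the
     * listening state. */
    int listen(int sockfd, int backlog)
    {
        printf("Would have called listen() on a socket\n");
        return 0;
    }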
It can be compiled using:
$ gcc -fPIC -shared -o fakelisten.so fakelisten.c -ldl
And then run with:
$ LD_PRELOAD=./fakelisten.so nginx -t
Kubernetes Info
This issue was initially observed on a Kubernetes cluster, in a Docker container.
The parameters of the servers where this was originally observed are:
    $ uname -a
    Linux node-2 4.4.0-75-generic #96-Ubuntu SMP Thu Apr 20 09:56:33 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
    $ nginx -V
    nginx version: nginx/1.11.10
    built by gcc 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.4)
    built with OpenSSL 1.0.2g 1 Mar 2016
    TLS SNI support enabled
    configure arguments: --prefix=/usr/share/nginx --conf-path=/etc/nginx/nginx.conf --http-log-path=/var/log/nginx/access.log --error-log-path=/var/log/nginx/error.log --lock-path=/var/lock/nginx.lock --pid-path=/run/nginx.pid --http-client-body-temp-path=/var/lib/nginx/body --http-fastcgi-temp-path=/var/lib/nginx/fastcgi --http-proxy-temp-path=/var/lib/nginx/proxy --http-scgi-temp-path=/var/lib/nginx/scgi --http-uwsgi-temp-path=/var/lib/nginx/uwsgi --with-debug --with-pcre-jit --with-http_ssl_module --with-http_stub_status_module --with-http_realip_module --with-http_auth_request_module --with-http_addition_module --with-http_dav_module --with-http_geoip_module --with-http_gzip_static_module --with-http_sub_module --with-http_v2_module --with-stream --with-stream_ssl_module --with-stream_ssl_preread_module --with-threads --with-file-aio --without-mail_pop3_module --without-mail_smtp_module --without-mail_imap_module --without-http_uwsgi_module --without-http_scgi_module --with-cc-opt='-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic' --add-module=/tmp/build/ngx_devel_kit-0.3.0 --add-module=/tmp/build/set-misc-nginx-module-0.31 --add-module=/tmp/build/nginx-module-vts-0.1.11 --add-module=/tmp/build/lua-nginx-module-0.10.7 --add-module=/tmp/build/headers-more-nginx-module-0.32 --add-module=/tmp/build/nginx-goodies-nginx-sticky-module-ng-08a395c66e42 --add-module=/tmp/build/nginx-http-auth-digest-7955af9c77598c697ac292811914ce1e2b3b824c --add-module=/tmp/build/ngx_http_substitutions_filter_module-bc58cb11844bc42735bbaef7085ea86ace46d05b --add-module=/tmp/build/lua-upstream-nginx-module-0.06
Change History (5)
comment:1 by , 7 years ago
comment:2 by , 7 years ago
We were suspecting that other OSs might handle such a case in a better way than Linux does.
Regarding your comment in the patch, if we really want to catch failing `setsockopt` calls with `SO_REUSEPORT`, we could maybe just unset this option when doing a configuration test? It makes the code a bit more complex, but it would preserve the current behavior and it is executed only during a configuration test.
Thanks for the link to #724 btw.
comment:4 by , 7 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
Patch committed, thanks for reporting this.
Just to clarify, there are two factors that contribute to the observed behaviour:
Factor (2) is also responsible for connection resets when the number of worker processes is reduced, and it is believed that this cannot be completely resolved without fixing the Linux kernel. Ideally, Linux should re-distribute connections after closing a socket with SO_REUSEPORT, much like Dragonfly BSD does.
Probably the best solution would be not to set SO_REUSEPORT when testing a configuration. Patch:
Note well that forcibly doing `nginx -t` before configuration reloading might not be a good idea, see ticket #724.