Opened 8 years ago

Closed 7 years ago

Last modified 7 years ago

#1300 closed defect (fixed)

nginx configuration test is breaking connections from the running instance

Reported by: Max Laverse
Owned by:
Priority: major
Milestone: 1.13
Component: nginx-core
Version: 1.13.x
Keywords: nginx reuseport reset test
Cc:
uname -a: Linux doku 3.16.0-4-amd64 #1 SMP Debian 3.16.39-1+deb8u2 (2017-03-07) x86_64 GNU/Linux
nginx -V: nginx version: nginx/1.13.1
built by gcc 4.9.2 (Debian 4.9.2-10)
configure arguments: --without-http_rewrite_module --without-http_gzip_module

Description

Hi!

We were investigating why we were losing requests when the Kubernetes Nginx Ingress was reloading its configuration. This Nginx Ingress is an Nginx server running in a container and acting as a load balancer and reverse proxy for a Kubernetes cluster.

It appears that when reloading the configuration, this Ingress Controller component also starts a process that tests the configuration using nginx -t.
At that point we are observing new incoming connections being reset on the running instance.

When you run nginx -t, nginx tries to bind to the sockets and then calls listen.

In most default setups, if an instance of Nginx is already running, the listen call fails silently because the address is already in use.

The problems start when you use the reuseport option on the listen directives of our configuration.

http {
  server {
    listen 80 reuseport default_server;
[...]

In that case the listen call on the socket succeeds, even in configuration test mode. Linux then starts completing three-way handshakes for incoming connections on that socket. Once the test is finished, all connections that were established but not yet accepted by the configuration test process are abruptly closed.

Having Nginx call listen on the sockets when testing the configuration seems wrong to us.

How to reproduce

To reproduce this issue, I compiled Nginx from the release-1.13.1 tag on a Debian server and then used the following configuration:

user www-data;
worker_processes 1;
pid /run/nginx.pid;

events {
 worker_connections 768;
}

http {
 server {
   listen 80 reuseport default_server;
 }
}

Start a server instance.

Then with Apache Bench Version 2.3 <$Revision: 1604373 $>, run:

$ ab -c2 -t10 http://127.0.0.1:80/

Running this command dozens of times always succeeds.
Now if you run in another tab:

for i in `seq 1 10`; do objs/nginx -t -c /root/nginx.conf; done

You should see Apache Bench consistently abort before the end of the test, complaining about reset connections:

apr_socket_recv: Connection reset by peer (104)

Additional traces

Running strace -e bind,listen objs/nginx -t with the configuration given above:

nginx: the configuration file /root/nginx.conf syntax is ok
bind(6, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
listen(6, 511)                          = 0
nginx: configuration file /root/nginx.conf test is successful
+++ exited with 0 +++

We clearly see that the configuration test process managed to bind and listen on that socket.

On the other side, with the reuseport directive removed:

nginx: the configuration file /root/nginx.conf syntax is ok
bind(6, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
listen(6, 511)                          = -1 EADDRINUSE (Address already in use)
nginx: configuration file /root/nginx.conf test is successful
+++ exited with 0 +++

Here we see that the listen call fails, and with Apache Bench we no longer see any connection being reset.

Workaround

Since we are using the Nginx Ingress Controller in Kubernetes, which always calls nginx -t to check the configuration before reloading it, we came up with a workaround: preload a library that overrides the listen call when testing the configuration.

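#include <stdio.h>

/* Replacement for listen(2), loaded via LD_PRELOAD: it reports the call and
 * pretends it succeeded without putting the socket into the listening state. */
int listen(int sockfd, int backlog)
{
 (void) sockfd;
 (void) backlog;
 printf("Would have called listen() on a socket\n");
 return 0;
}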

Can be compiled using:

$ gcc -fPIC -shared  -o fakelisten.so fakelisten.c -ldl

And then run with:

$ LD_PRELOAD=./fakelisten.so nginx -t

Kubernetes Info

This issue was initially observed on a Kubernetes cluster, in a Docker container.
The parameters of the servers where this was originally observed are:

$ uname -a
Linux node-2 4.4.0-75-generic #96-Ubuntu SMP Thu Apr 20 09:56:33 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

$ nginx -V
nginx version: nginx/1.11.10
built by gcc 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.4) 
built with OpenSSL 1.0.2g  1 Mar 2016
TLS SNI support enabled
configure arguments: --prefix=/usr/share/nginx --conf-path=/etc/nginx/nginx.conf --http-log-path=/var/log/nginx/access.log --error-log-path=/var/log/nginx/error.log --lock-path=/var/lock/nginx.lock --pid-path=/run/nginx.pid --http-client-body-temp-path=/var/lib/nginx/body --http-fastcgi-temp-path=/var/lib/nginx/fastcgi --http-proxy-temp-path=/var/lib/nginx/proxy --http-scgi-temp-path=/var/lib/nginx/scgi --http-uwsgi-temp-path=/var/lib/nginx/uwsgi --with-debug --with-pcre-jit --with-http_ssl_module --with-http_stub_status_module --with-http_realip_module --with-http_auth_request_module --with-http_addition_module --with-http_dav_module --with-http_geoip_module --with-http_gzip_static_module --with-http_sub_module --with-http_v2_module --with-stream --with-stream_ssl_module --with-stream_ssl_preread_module --with-threads --with-file-aio --without-mail_pop3_module --without-mail_smtp_module --without-mail_imap_module --without-http_uwsgi_module --without-http_scgi_module --with-cc-opt='-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic' --add-module=/tmp/build/ngx_devel_kit-0.3.0 --add-module=/tmp/build/set-misc-nginx-module-0.31 --add-module=/tmp/build/nginx-module-vts-0.1.11 --add-module=/tmp/build/lua-nginx-module-0.10.7 --add-module=/tmp/build/headers-more-nginx-module-0.32 --add-module=/tmp/build/nginx-goodies-nginx-sticky-module-ng-08a395c66e42 --add-module=/tmp/build/nginx-http-auth-digest-7955af9c77598c697ac292811914ce1e2b3b824c --add-module=/tmp/build/ngx_http_substitutions_filter_module-bc58cb11844bc42735bbaef7085ea86ace46d05b --add-module=/tmp/build/lua-upstream-nginx-module-0.06

Change History (5)

comment:1 by Maxim Dounin, 7 years ago

Just to clarify, there are two factors that contribute to the observed behaviour:

  1. using SO_REUSEPORT allows creating multiple sockets on the same port;
  2. Linux aborts all connections in the socket when closing a socket with SO_REUSEPORT.

Factor (2) is also responsible for connection resets when the number of worker processes is reduced. It is believed that this cannot be completely resolved without fixing the Linux kernel. Ideally, Linux should redistribute connections after closing a socket with SO_REUSEPORT, much like DragonFly BSD does.

Probably the best solution would be not to set SO_REUSEPORT when testing a configuration. Patch:

# HG changeset patch
# User Maxim Dounin <mdounin@mdounin.ru>
# Date 1499345186 -10800
#      Thu Jul 06 15:46:26 2017 +0300
# Node ID 1f845f4d607c2565fde2fe39e25ba0e90fd806d6
# Parent  70e65bf8dfd7a8d39aae8ac3a209d426e6947735
Core: disabled SO_REUSEPORT when testing config (ticket #1300).

When closing a socket with SO_REUSEPORT, Linux drops all connections waiting
in this socket's listen queue.  Previously, it was believed to only result
in connection resets when reconfiguring nginx to use smaller number of worker
processes.  It also results in connection resets during configuration
testing though.

Workaround is to avoid using SO_REUSEPORT when testing configuration.  It
should prevent listening sockets from being created if a conflicting socket
already exists, while still preserving detection of other possible errors.
It should also cover UDP sockets.

The only downside of this approach seems to be that a configuration testing
won't be able to properly report the case when nginx was compiled with
SO_REUSEPORT, but the kernel is not able to set it.  Such errors will be
reported on a real start instead.

diff --git a/src/core/ngx_connection.c b/src/core/ngx_connection.c
--- a/src/core/ngx_connection.c
+++ b/src/core/ngx_connection.c
@@ -473,7 +473,7 @@ ngx_open_listening_sockets(ngx_cycle_t *
 
 #if (NGX_HAVE_REUSEPORT)
 
-            if (ls[i].reuseport) {
+            if (ls[i].reuseport && !ngx_test_config) {
                 int  reuseport;
 
                 reuseport = 1;

Note well that forcibly doing nginx -t before configuration reloading might not be a good idea, see ticket #724.

comment:2 by Max Laverse, 7 years ago

We suspected that other OSs might handle such a case better than Linux does.

Regarding your comment in the patch: if we really want to catch failing setsockopt calls with SO_REUSEPORT, we could maybe just unset this option afterwards when doing a configuration test? It makes the code a bit more complex, but it would preserve the current behavior and it is executed only during a configuration test.

Thanks for the link to #724 btw.

comment:3 by Maxim Dounin <mdounin@…>, 7 years ago

In 7064:ecb5cd305b06/nginx:

Core: disabled SO_REUSEPORT when testing config (ticket #1300).

When closing a socket with SO_REUSEPORT, Linux drops all connections waiting
in this socket's listen queue. Previously, it was believed to only result
in connection resets when reconfiguring nginx to use smaller number of worker
processes. It also results in connection resets during configuration
testing though.

Workaround is to avoid using SO_REUSEPORT when testing configuration. It
should prevent listening sockets from being created if a conflicting socket
already exists, while still preserving detection of other possible errors.
It should also cover UDP sockets.

The only downside of this approach seems to be that a configuration testing
won't be able to properly report the case when nginx was compiled with
SO_REUSEPORT, but the kernel is not able to set it. Such errors will be
reported on a real start instead.

comment:4 by Maxim Dounin, 7 years ago

Resolution: fixed
Status: new → closed

Patch committed, thanks for reporting this.

comment:5 by Maxim Dounin <mdounin@…>, 7 years ago

In 7138:05bd1baabf87/nginx:

Core: disabled SO_REUSEPORT when testing config (ticket #1300).
