Opened 3 years ago

Closed 2 years ago

Last modified 2 years ago

#1022 closed defect (wontfix)

Nginx graceful shutdown doesn't inform clients to close "keep-alive" connections

Reported by: micheleorsi@… Owned by:
Priority: minor Milestone:
Component: other Version: 1.10.x
Keywords: graceful, shutdown, keep-alive, tcp, connection Cc:
uname -a: Darwin lm00281.local 15.4.0 Darwin Kernel Version 15.4.0: Fri Feb 26 22:08:05 PST 2016; root:xnu-3248.40.184~3/RELEASE_X86_64 x86_64
nginx -V: nginx version: nginx/1.10.1 built by clang 7.3.0 (clang-703.0.31) built with OpenSSL 1.0.2h 3 May 2016 TLS SNI support enabled configure arguments: --prefix=/usr/local/Cellar/nginx/1.10.1 --with-http_ssl_module --with-pcre --with-ipv6 --sbin-path=/usr/local/Cellar/nginx/1.10.1/bin/nginx --with-cc-opt='-I/usr/local/Cellar/pcre/8.38/include -I/usr/local/Cellar/openssl/1.0.2h_1/include' --with-ld-opt='-L/usr/local/Cellar/pcre/8.38/lib -L/usr/local/Cellar/openssl/1.0.2h_1/lib' --conf-path=/usr/local/etc/nginx/nginx.conf --pid-path=/usr/local/var/run/nginx.pid --lock-path=/usr/local/var/run/nginx.lock --http-client-body-temp-path=/usr/local/var/run/nginx/client_body_temp --http-proxy-temp-path=/usr/local/var/run/nginx/proxy_temp --http-fastcgi-temp-path=/usr/local/var/run/nginx/fastcgi_temp --http-uwsgi-temp-path=/usr/local/var/run/nginx/uwsgi_temp --http-scgi-temp-path=/usr/local/var/run/nginx/scgi_temp --http-log-path=/usr/local/var/log/nginx/access.log --error-log-path=/usr/local/var/log/nginx/error.log --with-http_gzip_static_module

Description

In case a client uses the keep-alive feature to make requests, during graceful shutdown it will receive some errors because it continues to re-use such connections. On the other hand when nginx is stopped those connections are not anymore valid and the clients will receive several errors.

I would expect that during graceful shutdown nginx inform the clients to close connection with specific HTTP header (Connection: close), so that the client can open brand new connections.

I suffer from this problem when I am in a configuration with F5 in front of several nginx. When I want to stop some of them and launch other instances, F5 stop sending new requests to the quitting nginx but clients with "keep-alive" connections continue to make requests against such instances. So that when nginx instances are stopped clients will receive drop connection errors because they were not informed about closing TCP connection on quitting nginx instances.

Here is a simple test to verify the behaviour:

nginx -c sample.conf

  • sample.conf is this:

events {}
http {

server {
listen 9999;

location / {

proxy_pass http://localhost:8081;

}

}

}

  • ab testing running:

ab -v 2 -n 1000 -c 25 -k http://127.0.0.1:9999/load

When I launch ..

nginx -s quit

.. I continue to receive

Connection: keep-alive

.. until the very end, before nginx quit.

Change History (7)

comment:1 Changed 3 years ago by mdounin

  • Resolution set to wontfix
  • Status changed from new to closed

During configuration reload existing connections are closed as soon as they are idle. This is in line with HTTP protocol and clients are expected to be prepared for this, citing RFC 2616:

A client, server, or proxy MAY close the transport connection at any
time. For example, a client might have started to send a new request
at the same time that the server has decided to close the "idle"
connection. From the server's point of view, the connection is being
closed while it was idle, but from the client's point of view, a
request is in progress.

This means that clients, servers, and proxies MUST be able to recover
from asynchronous close events. Client software SHOULD reopen the
transport connection and retransmit the aborted sequence of requests
without user interaction so long as the request sequence is
idempotent (see section 9.1.2). Non-idempotent methods or sequences
MUST NOT be automatically retried, although user agents MAY offer a
human operator the choice of retrying the request(s). Confirmation by
user-agent software with semantic understanding of the application
MAY substitute for user confirmation. The automatic retry SHOULD NOT
be repeated if the second sequence of requests fails.

While changing nginx logic to always wait for an additional request and return a response with "Connection: close" should be possible, it will complicate things, introduce an additional delay to configuration reloads, and not considered needed, as current behaviour is in line with HTTP specification, see above.

If you still think there are reasons to change the behaviour, please be more specific.

comment:2 Changed 2 years ago by mdounin

See also #1247.

comment:3 Changed 2 years ago by westse@…

While it is true that clients must expect close events, client software can only automatically retransmit idempotent requests. As noted in the RFC, non-idempotent requests cannot be automatically retried, so the close event represents an outage.

In RFC 2616 (same section cited previously) is guidance on how Nginx should behave to minimize disruption to clients:

"Servers will usually have some time-out value beyond which they will
no longer maintain an inactive connection. Proxy servers might make
this a higher value since it is likely that the client will be making
more connections through the same server. The use of persistent
connections places no requirements on the length (or existence) of
this time-out for either the client or the server.

When a client or server wishes to time-out it SHOULD issue a graceful
close on the transport connection
. Clients and servers SHOULD both
constantly watch for the other side of the transport close, and
respond to it as appropriate. If a client or server does not detect
the other side's close promptly it could cause unnecessary resource
drain on the network."

So we feel Nginx “SHOULD issue a graceful close on the transport connection” by sending Connection: close for the next request on the connection (unless no such request is forthcoming, in which case the connection should be closed upon a configurable timeout)

The concern about incurring additional delays for Nginx config reloads is valid. This could be mitigated by providing Nginx config properties to opt out (to the current behavior), as well as configuring the timeout for awaiting new requests on the same connection.

We are greatly impacted by this due to frequent config reloads in a Kubernetes environment. Our current workaround is to disable keepAlive connections, which is far from ideal. We are happy to help out with pull requests, etc, but would like agreement on direction before committing more effort.

Thanks for listening!

comment:4 Changed 2 years ago by westse@…

  • Resolution wontfix deleted
  • Status changed from closed to reopened

I'm unsure if comments on closed tickets are seen, so reopening to be safe. Please advise if this is incorrect process.

comment:5 Changed 2 years ago by mdounin

  • Resolution set to wontfix
  • Status changed from reopened to closed

Your interpretation of "graceful close on the transport connection" is certainly incorrect. There are no http requests / responses on the transport level, and it is not possible to close a connection by sending a response. RFC clearly talks about closing transport connections, and not about answering additional requests.

Note well that even in case of non-idempotent requests, "user agents MAY offer a human operator the choice of retrying the request".

comment:6 Changed 2 years ago by westse@…

You are right about the misinterpretation - "graceful" likely means something like TCP reset (?) - we were applying that at the HTTP level.

So Nginx is compliant with the RFC. Is there room to go beyond the RFC? Nginx is in a position to avoid disruption, but otherwise disruption seems unavoidable if some non-idempotent requests have to be kicked back to a human whenever config is reloaded.

We are happy to help by providing pull requests. Is this feature something you would entertain in the product?

comment:7 Changed 2 years ago by vbart

IMHO introducing timeout here doesn't solve the problem. Client can send additional request at any moment e.g. right before timeout occurs.

Note: See TracTickets for help on using tickets.