#1022 closed defect (wontfix)
Nginx graceful shutdown doesn't inform clients to close "keep-alive" connections
Reported by: | Owned by:
Priority: minor | Milestone:
Component: other | Version: 1.10.x
Keywords: graceful, shutdown, keep-alive, tcp, connection | Cc:
uname -a: Darwin lm00281.local 15.4.0 Darwin Kernel Version 15.4.0: Fri Feb 26 22:08:05 PST 2016; root:xnu-3248.40.184~3/RELEASE_X86_64 x86_64
nginx -V:
nginx version: nginx/1.10.1
built by clang 7.3.0 (clang-703.0.31)
built with OpenSSL 1.0.2h 3 May 2016
TLS SNI support enabled
configure arguments: --prefix=/usr/local/Cellar/nginx/1.10.1 --with-http_ssl_module --with-pcre --with-ipv6 --sbin-path=/usr/local/Cellar/nginx/1.10.1/bin/nginx --with-cc-opt='-I/usr/local/Cellar/pcre/8.38/include -I/usr/local/Cellar/openssl/1.0.2h_1/include' --with-ld-opt='-L/usr/local/Cellar/pcre/8.38/lib -L/usr/local/Cellar/openssl/1.0.2h_1/lib' --conf-path=/usr/local/etc/nginx/nginx.conf --pid-path=/usr/local/var/run/nginx.pid --lock-path=/usr/local/var/run/nginx.lock --http-client-body-temp-path=/usr/local/var/run/nginx/client_body_temp --http-proxy-temp-path=/usr/local/var/run/nginx/proxy_temp --http-fastcgi-temp-path=/usr/local/var/run/nginx/fastcgi_temp --http-uwsgi-temp-path=/usr/local/var/run/nginx/uwsgi_temp --http-scgi-temp-path=/usr/local/var/run/nginx/scgi_temp --http-log-path=/usr/local/var/log/nginx/access.log --error-log-path=/usr/local/var/log/nginx/error.log --with-http_gzip_static_module
Description
When a client uses the keep-alive feature, it continues to reuse its connections during a graceful shutdown, and therefore receives errors: once nginx has stopped, those connections are no longer valid and the client's requests fail.
I would expect nginx, during a graceful shutdown, to inform clients that they should close their connections via the appropriate HTTP header (Connection: close), so that they can open brand-new connections.
I suffer from this problem in a setup with an F5 load balancer in front of several nginx instances. When I want to stop some instances and launch others, the F5 stops sending new requests to the quitting nginx instances, but clients with keep-alive connections keep sending requests to them. When those instances finally stop, the clients get dropped-connection errors, because they were never told to close their TCP connections to the quitting instances.
Here is a simple test to verify the behaviour:
- a simple service that serves traffic at http://localhost:8081/load
- nginx (1.10.1) runs in this way:
nginx -c sample.conf
- sample.conf is this:
events {}

http {
    server {
        listen 9999;

        location / {
            proxy_pass http://localhost:8081;
        }
    }
}
- ab testing running:
ab -v 2 -n 1000 -c 25 -k http://127.0.0.1:9999/load
When I launch ..
nginx -s quit
.. I continue to receive
Connection: keep-alive
.. until the very end, right before nginx quits.
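The same failure mode can be reproduced without ab, with a standard-library Python sketch (a hypothetical one-shot stand-in server, not nginx): the server advertises "Connection: keep-alive" in its last response and then simply closes the TCP connection, so the client's next request on the reused connection fails.

```python
import http.client
import socket
import threading

def one_shot_server(sock):
    """Serve a single request, advertise keep-alive, then close the
    socket without warning -- mimicking an nginx instance that quits."""
    conn, _ = sock.accept()
    conn.recv(65536)  # read the first request
    conn.sendall(
        b"HTTP/1.1 200 OK\r\n"
        b"Content-Length: 2\r\n"
        b"Connection: keep-alive\r\n"
        b"\r\n"
        b"ok"
    )
    conn.close()   # the "quit": TCP close, no HTTP-level notice
    sock.close()

srv = socket.socket()
srv.bind(("127.0.0.1", 0))
srv.listen(1)
port = srv.getsockname()[1]
threading.Thread(target=one_shot_server, args=(srv,), daemon=True).start()

client = http.client.HTTPConnection("127.0.0.1", port)
client.request("GET", "/load")
resp = client.getresponse()
resp.read()
first_header = resp.getheader("Connection")
print("first response:", first_header)

# The client trusts the header and reuses the connection -- and fails,
# because the server is already gone.
reuse_error = None
try:
    client.request("GET", "/load")
    client.getresponse().read()
except (ConnectionError, http.client.HTTPException, OSError) as exc:
    reuse_error = type(exc).__name__
print("second request failed with:", reuse_error)
```

The first response carries "Connection: keep-alive"; the second request on the same connection raises an error (typically RemoteDisconnected), which is exactly what the clients behind the F5 observe.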
Change History (7)
comment:1 by , 8 years ago
Resolution: | → wontfix |
---|---|
Status: | new → closed |
comment:3 by , 8 years ago
While it is true that clients must expect close events, client software can only automatically retransmit idempotent requests. As the RFC notes, non-idempotent requests cannot be automatically retried, so for them the close event represents an outage.
RFC 2616 (same section cited previously) gives guidance on how nginx should behave to minimize disruption to clients:
"Servers will usually have some time-out value beyond which they will
no longer maintain an inactive connection. Proxy servers might make
this a higher value since it is likely that the client will be making
more connections through the same server. The use of persistent
connections places no requirements on the length (or existence) of
this time-out for either the client or the server.
When a client or server wishes to time-out it SHOULD issue a graceful
close on the transport connection. Clients and servers SHOULD both
constantly watch for the other side of the transport close, and
respond to it as appropriate. If a client or server does not detect
the other side's close promptly it could cause unnecessary resource
drain on the network."
So we feel nginx "SHOULD issue a graceful close on the transport connection" by sending Connection: close with the response to the next request on the connection (or, if no such request is forthcoming, by closing the connection after a configurable timeout).
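The behaviour proposed here can be sketched with Python's standard-library http.server (a hypothetical model, not nginx code): a "draining" flag is set when shutdown begins, and the next request on each kept-alive connection is answered with "Connection: close" before the socket is dropped.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

DRAINING = threading.Event()   # set when a graceful shutdown begins

class DrainAwareHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # enables keep-alive

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        if DRAINING.is_set():
            # Sending this header also flags the handler to close the
            # socket after the response is written.
            self.send_header("Connection", "close")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # silence per-request logging
        pass

srv = ThreadingHTTPServer(("127.0.0.1", 0), DrainAwareHandler)
port = srv.server_address[1]
threading.Thread(target=srv.serve_forever, daemon=True).start()

client = http.client.HTTPConnection("127.0.0.1", port)
client.request("GET", "/")
normal = client.getresponse()
normal.read()
normal_header = normal.getheader("Connection")   # no close header yet

DRAINING.set()                    # "graceful shutdown" begins
client.request("GET", "/")        # same TCP connection, reused
drain = client.getresponse()
drain.read()
drain_header = drain.getheader("Connection")     # "close"
print("before drain:", normal_header, "| during drain:", drain_header)

srv.shutdown()
```

A well-behaved client sees the "Connection: close" on the drain response, finishes reading it, and opens a fresh connection (to another instance, via the F5) for its next request.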
The concern about incurring additional delays on nginx config reloads is valid. This could be mitigated by providing nginx config properties to opt out (keeping the current behaviour), and to set the timeout for awaiting new requests on the same connection.
We are heavily impacted by this because of frequent config reloads in a Kubernetes environment. Our current workaround is to disable keep-alive connections, which is far from ideal. We are happy to help out with pull requests etc., but would like agreement on direction before committing more effort.
Thanks for listening!
comment:4 by , 8 years ago
Resolution: | wontfix |
---|---|
Status: | closed → reopened |
I'm unsure whether comments on closed tickets are seen, so I'm reopening to be safe. Please advise if this is not the correct process.
comment:5 by , 8 years ago
Resolution: | → wontfix |
---|---|
Status: | reopened → closed |
Your interpretation of "graceful close on the transport connection" is certainly incorrect. There are no HTTP requests or responses at the transport level, and it is not possible to close a connection by sending a response. The RFC clearly talks about closing transport connections, not about answering additional requests.
Note well that even for non-idempotent requests, "user agents MAY offer a human operator the choice of retrying the request".
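The retry escape hatch the RFC grants to user agents can be sketched in standard-library Python (a hypothetical client wrapper, not part of the ticket): when a reused keep-alive connection turns out to be dead, an idempotent GET is retried once on a fresh connection; a non-idempotent request would instead surface the error.

```python
import http.client
import socket
import threading

def flaky_keepalive_server(sock):
    """Answer one request per connection, advertise keep-alive, then
    close -- so every reused connection is dead on the second request."""
    while True:
        try:
            conn, _ = sock.accept()
        except OSError:        # listening socket closed
            return
        conn.recv(65536)
        conn.sendall(
            b"HTTP/1.1 200 OK\r\n"
            b"Content-Length: 2\r\n"
            b"Connection: keep-alive\r\n"
            b"\r\n"
            b"ok"
        )
        conn.close()

def retrying_get(conn, path, make_conn):
    """GET over an existing connection; if the reused socket is dead,
    retry once on a fresh connection. Safe only because GET is
    idempotent -- a POST must raise the error to the caller instead."""
    try:
        conn.request("GET", path)
        return conn, conn.getresponse(), False
    except (ConnectionError, http.client.HTTPException, OSError):
        conn.close()
        fresh = make_conn()
        fresh.request("GET", path)
        return fresh, fresh.getresponse(), True

srv = socket.socket()
srv.bind(("127.0.0.1", 0))
srv.listen(5)
port = srv.getsockname()[1]
threading.Thread(target=flaky_keepalive_server, args=(srv,), daemon=True).start()

make_conn = lambda: http.client.HTTPConnection("127.0.0.1", port)
conn = make_conn()

conn, r1, _ = retrying_get(conn, "/load", make_conn)
r1_status = r1.status
r1.read()

# The reused socket is dead, so this call transparently retries.
conn, r2, retried = retrying_get(conn, "/load", make_conn)
r2_status = r2.status
r2.read()
print(r1_status, r2_status, "retried:", retried)
```

This is the mitigation available to client software today; the ticket's point is that it cannot be applied automatically to non-idempotent requests.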
comment:6 by , 8 years ago
You are right about the misinterpretation - "graceful close" refers to the transport level (an orderly TCP close, i.e. FIN rather than an abortive reset) - we were applying it at the HTTP level.
So nginx is compliant with the RFC. Is there room to go beyond the RFC? nginx is in a position to avoid the disruption; otherwise disruption seems unavoidable whenever the config is reloaded, since any in-flight non-idempotent requests have to be kicked back to a human.
We are happy to help by providing pull requests. Is this a feature you would entertain in the product?
comment:7 by , 8 years ago
IMHO introducing a timeout here doesn't solve the problem: a client can send an additional request at any moment, e.g. right before the timeout occurs.
During a configuration reload, existing connections are closed as soon as they become idle. This is in line with the HTTP protocol, and clients are expected to be prepared for it; see the section of RFC 2616 quoted above.
While changing nginx logic to always wait for an additional request and return a response with "Connection: close" should be possible, it would complicate things, introduce an additional delay into configuration reloads, and it is not considered necessary, since the current behaviour is in line with the HTTP specification (see above).
If you still think there are reasons to change the behaviour, please be more specific.