Opened 2 years ago
Last modified 6 months ago
#2421 accepted enhancement
proxy_next_upstream_tries might be ignored with upstream keepalive
Reported by: | Owned by: | ||
---|---|---|---|
Priority: | minor | Milestone: | |
Component: | nginx-core | Version: | 1.23.x |
Keywords: | Cc: | ||
uname -a: | Linux static01-test 5.13.19-4-pve #1 SMP PVE 5.13.19-8 (Mon, 31 Jan 2022 10:09:37 +0100) x86_64 GNU/Linux | ||
nginx -V: |
nginx version: nginx/1.23.2 built by gcc 10.2.1 20210110 (Debian 10.2.1-6) built with OpenSSL 1.1.1n 15 Mar 2022 TLS SNI support enabled configure arguments: --prefix=/etc/nginx --sbin-path=/usr/sbin/nginx --modules-path=/usr/lib/nginx/modules --conf-path=/etc/nginx/nginx.conf --error-log-path=/var/log/nginx/error.log --http-log-path=/var/log/nginx/access.log --pid-path=/var/run/nginx.pid --lock-path=/var/run/nginx.lock --http-client-body-temp-path=/var/cache/nginx/client_temp --http-proxy-temp-path=/var/cache/nginx/proxy_temp --http-fastcgi-temp-path=/var/cache/nginx/fastcgi_temp --http-uwsgi-temp-path=/var/cache/nginx/uwsgi_temp --http-scgi-temp-path=/var/cache/nginx/scgi_temp --user=nginx --group=nginx --with-compat --with-file-aio --with-threads --with-http_addition_module --with-http_auth_request_module --with-http_dav_module --with-http_flv_module --with-http_gunzip_module --with-http_gzip_static_module --with-http_mp4_module --with-http_random_index_module --with-http_realip_module --with-http_secure_link_module --with-http_slice_module --with-http_ssl_module --with-http_stub_status_module --with-http_sub_module --with-http_v2_module --with-mail --with-mail_ssl_module --with-stream --with-stream_realip_module --with-stream_ssl_module --with-stream_ssl_preread_module --with-cc-opt='-g -O2 -ffile-prefix-map=/data/builder/debuild/nginx-1.23.2/debian/debuild-base/nginx-1.23.2=. -fstack-protector-strong -Wformat -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fPIC' --with-ld-opt='-Wl,-z,relro -Wl,-z,now -Wl,--as-needed -pie' |
Description
there is a bug with proxy_next_upstream_tries, which is ignored if Nginx is under load and the upstream server close the connection prematurely.
It seems to only occurs with a connection closed.
If too many requests are producing this error, It is possible to bring all the upstreams down for a certain time.
Here is a reproducer on docker, but the issue was noticed on debian9/11 as well with nginx 1.10<=1.23.0
Change History (7)
comment:1 by , 2 years ago
Resolution: | → invalid |
---|---|
Status: | new → closed |
comment:2 by , 2 years ago
Resolution: | invalid |
---|---|
Status: | closed → reopened |
Yes, max_fails
AND fail_timeout
are used in my configuration. I give more details about the bug in the repository.
Of course it is possible that all upstreams are down, that is a expectable consequence, not the bug itself.
The real issue is that under load these configs are by passed.
therefore instead of seeing 2 errors messages (as set by proxy_next_upstream_tries
)
there will be one per upstream (so 6 in my example.)
Which can then naturally lead to an no live upstream error
.
comment:3 by , 2 years ago
Status: | reopened → accepted |
---|---|
Summary: | proxy_next_upstream_tries ignored under some circomstances → proxy_next_upstream_tries might be ignored with upstream keepalive |
Type: | defect → enhancement |
Sorry, I've misread what you are actually complaining about.
It looks like your configuration uses keepalive connections to upstream servers, and the error you are generating is the one non-distinguishable from asynchronous close events. As such, when a cached connection to an upstream server is used, nginx retries request regardless of the proxy_next_upstream_tries
till all servers are tried (or till a non-cached connection is used if there is only one upstream server). See ticket #2022 for details.
In recent HTTP standard editions requirements for retrying requests are relaxed compared to RFC 2616 referenced above, and it might make sense to relax nginx behaviour somehow, so it won't unconditionally retry such requests.
comment:4 by , 2 years ago
Thank you for your reply. So if I under correctly this is not a bug, it is a feature?
I'm still dont completely understand why this behavior would happen only under load ?
It will try the first cached connection for the first available upstream, then once its close why does it need to try all of them and by pass proxy_next_upstream_tries
?
comment:5 by , 2 years ago
RFC 2616 requires clients to retry requests if a requests used a cached connection, and it was closed by the server when the client was sending the request.
To make sure such requests are retried, nginx retries them unconditionally, ignoring the number of tries left (which is normally a number of servers in the upstream, but can be overridden by set to a lower value with proxy_next_upstream_tries
). Most notably, this makes it possible to retry requests to an upstream with just one server (normally, such requests are never retried). As a side effect, though, this makes proxy_next_upstream_tries
ineffective for such requests (since the number of tries is ignored, see above).
That is, as implemented, nginx will work as follows:
- It will select a server to use per request balancing method configured.
- If there are cached connections to this server, it'll use one.
- If a cached connection is closed while nginx was sending the request, nginx will retry the request, repeating the above steps.
For upstreams with just one server, this essentially means that nginx will retry till there are cached connections. For upstreams with multiple servers, nginx will keep track of servers it tried, and the first step will fail as long as all servers are tried.
Hope this better explains how it works now, and why the side effects observed do appear. It's an open question how to change this behaviour to make it less surprising.
comment:6 by , 2 years ago
ok thank you for explanation. But for our point of view, it is still suprising that even with max
set we can bring all upstream down.
comment:7 by , 6 months ago
Retries after asynchronous close events
need a limit.
I found a request with 9707 attempts to connect to upstream in my log, the truncated json log looks like:
{ "upstream_status": "502, 502, 502, <…>, 502, -", "upstream_response_time": "0.003, 0.001, 0.002, <…>, 0.002, 0.002", "request_time": 14.561, "status": 499, <…>
and errors look like
2024/06/07 11:47:25 [error] 52738#52738: *48268611 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 37.….….…, server: …, request: "GET …
I have a busy nginx server 1.21.6 (inside ingress-nginx 1.9.4), proxy_next_upstream_tries
is set to 2
and the poisoned request caused upstream to break connections while reading the request line
The proxy_next_upstream_tries directive limits the number of upstream servers tried by nginx. In your case, as per the
no live upstreams
error as seen in the logs, all upstream servers are considered unavailable, and therefore there are no servers to try. If you want nginx to ignore errors and don't consider servers to be unavailable after errors, consider tuning max_fails and fail_timeout.