Opened 4 years ago
Last modified 4 years ago
#2026 closed defect
Excessive attempts to reconnect when upstream connection refused — at Version 2
Reported by: | Nuru | Owned by: | |
---|---|---|---|
Priority: | major | Milestone: | |
Component: | nginx-core | Version: | 1.19.x |
Keywords: | Cc: | ||
uname -a: | Linux ingress-nginx-ingress-controller-pp28p 4.14.181-142.260.amzn2.x86_64 #1 SMP Wed Jun 24 19:07:39 UTC 2020 x86_64 Linux | ||
nginx -V: |
nginx version: nginx/1.19.1
built by gcc 9.2.0 (Alpine 9.2.0)
built with OpenSSL 1.1.1g 21 Apr 2020
TLS SNI support enabled
configure arguments: --prefix=/usr/local/nginx --conf-path=/etc/nginx/nginx.conf --modules-path=/etc/nginx/modules --http-log-path=/var/log/nginx/access.log --error-log-path=/var/log/nginx/error.log --lock-path=/var/lock/nginx.lock --pid-path=/run/nginx.pid --http-client-body-temp-path=/var/lib/nginx/body --http-fastcgi-temp-path=/var/lib/nginx/fastcgi --http-proxy-temp-path=/var/lib/nginx/proxy --http-scgi-temp-path=/var/lib/nginx/scgi --http-uwsgi-temp-path=/var/lib/nginx/uwsgi --with-debug --with-compat --with-pcre-jit --with-http_ssl_module --with-http_stub_status_module --with-http_realip_module --with-http_auth_request_module --with-http_addition_module --with-http_dav_module --with-http_geoip_module --with-http_gzip_static_module --with-http_sub_module --with-http_v2_module --with-stream --with-stream_ssl_module --with-stream_realip_module --with-stream_ssl_preread_module --with-threads --with-http_secure_link_module --with-http_gunzip_module --with-file-aio --without-mail_pop3_module --without-mail_smtp_module --without-mail_imap_module --without-http_uwsgi_module --without-http_scgi_module --with-cc-opt='-g -Og -fPIE -fstack-protector-strong -Wformat -Werror=format-security -Wno-deprecated-declarations -fno-strict-aliasing -D_FORTIFY_SOURCE=2 --param=ssp-buffer-size=4 -DTCP_FASTOPEN=23 -fPIC -Wno-cast-function-type -I/root/.hunter/_Base/2c5c6fc/d64af22/92161a9/Install/include -m64 -mtune=native' --with-ld-opt='-fPIE -fPIC -pie -Wl,-z,relro -Wl,-z,now -L/root/.hunter/_Base/2c5c6fc/d64af22/92161a9/Install/lib' --user=www-data --group=www-data --add-module=/tmp/build/ngx_devel_kit-0.3.1 --add-module=/tmp/build/set-misc-nginx-module-0.32 --add-module=/tmp/build/headers-more-nginx-module-0.33 --add-module=/tmp/build/nginx-http-auth-digest-cd8641886c873cf543255aeda20d23e4cd603d05 
--add-module=/tmp/build/ngx_http_substitutions_filter_module-bc58cb11844bc42735bbaef7085ea86ace46d05b --add-module=/tmp/build/lua-nginx-module-0.10.17 --add-module=/tmp/build/stream-lua-nginx-module-0.0.8 --add-module=/tmp/build/lua-upstream-nginx-module-0.07 --add-module=/tmp/build/nginx-influxdb-module-5b09391cb7b9a889687c0aa67964c06a2d933e8b --add-dynamic-module=/tmp/build/nginx-opentracing-0.9.0/opentracing --add-dynamic-module=/tmp/build/ModSecurity-nginx-b55a5778c539529ae1aa10ca49413771d52bb62e --add-dynamic-module=/tmp/build/ngx_http_geoip2_module-3.3 --add-module=/tmp/build/nginx_ajp_module-bf6cd93f2098b59260de8d494f0f4b1f11a84627 --add-module=/tmp/build/ngx_brotli |
Description
Using the TCP load balancer, when the upstream refuses the connection, Nginx immediately retries without adequate throttling. I am seeing 4,000 retries per second with log entries like:
2020-08-16T06:21:03.662449141Z 2020/08/16 06:21:03 [error] 78#78: *15771 connect() failed (111: Connection refused) while connecting to upstream, client: 10.105.13.228, server: 0.0.0.0:26808, upstream: "10.105.0.30:26808", bytes from/to client:0/0, bytes from/to upstream:0/0
2020-08-16T06:21:03.662452948Z 2020/08/16 06:21:03 [error] 78#78: *15500 connect() failed (111: Connection refused) while connecting to upstream, client: 10.105.13.228, server: 0.0.0.0:26808, upstream: "10.105.0.30:26808", bytes from/to client:0/0, bytes from/to upstream:0/0
This causes a dramatic increase in CPU and memory usage. I am not sure whether the client is retrying that quickly (the client is an Amazon Web Services Network Load Balancer health check), but even if it is, Nginx should throttle upstream connection attempts according to the Passive TCP Health Checks documentation:
The default values are 10 seconds and 1 attempt. So if a connection attempt times out or fails at least once in a 10‑second period, NGINX marks the server as unavailable for 10 seconds
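For reference, the values quoted above correspond to the max_fails and fail_timeout parameters of the server directive in a stream upstream block. The following is only a minimal sketch with those defaults written out explicitly. It assumes a static upstream group rather than the balancer_by_lua_block setup that ingress-nginx generates; the group name tcp_example and the second server address are hypothetical and not taken from my configuration. Whether these counters are applied at all when the peer is selected from Lua is part of what this ticket is asking.

```nginx
stream {
    upstream tcp_example {                # hypothetical group name
        # Address taken from the log lines in this ticket; parameters are
        # the documented defaults, spelled out for clarity.
        server 10.105.0.30:26808 max_fails=1 fail_timeout=10s;
        # Hypothetical second peer, added only for illustration.
        server 10.105.0.31:26808 max_fails=1 fail_timeout=10s;
    }

    server {
        listen 26808;
        proxy_timeout 600s;
        proxy_pass tcp_example;
    }
}
```

With a configuration like this, my expectation from the documentation is that a refused connection to 10.105.0.30:26808 would take that server out of rotation for 10 seconds instead of it being retried on every incoming connection.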
See also: https://github.com/kubernetes/ingress-nginx/issues/5425
Configuration excerpt (configuration created by ingress-nginx v0.34.1):
stream {
    ...
    upstream upstream_balancer {
        server 0.0.0.1:1234; # placeholder
        balancer_by_lua_block {
            tcp_udp_balancer.balance()
        }
    }
    server {
        preread_by_lua_block {
            ngx.var.proxy_upstream_name="tcp-example-26808";
        }
        listen 26808;
        proxy_timeout 600s;
        proxy_pass upstream_balancer;
    }
}
Change History (2)
comment:1 by , 4 years ago
comment:2 by , 4 years ago
Description: | modified (diff) |
---|
Please add me to CC list. Seems I cannot do that myself.