Opened 4 years ago
Closed 4 years ago
#2163 closed defect (invalid)
nginx suddenly stops accepting requests & returns 499
Reported by: | Owned by: | ||
---|---|---|---|
Priority: | blocker | Milestone: | |
Component: | nginx-core | Version: | 1.16.x |
Keywords: | Cc: | ||
uname -a: | Linux vijay.com 3.10.0-1160.15.2.el7.x86_64 #1 SMP Wed Feb 3 15:06:38 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux | ||
nginx -V: |
nginx version: nginx/1.16.1
built by gcc 4.8.5 20150623 (Red Hat 4.8.5-44) (GCC) built with OpenSSL 1.1.1c FIPS 28 May 2019 (running with OpenSSL 1.1.1g FIPS 21 Apr 2020) TLS SNI support enabled configure arguments: --prefix=/usr/share/nginx --sbin-path=/usr/sbin/nginx --modules-path=/usr/lib64/nginx/modules --conf-path=/etc/nginx/nginx.conf --error-log-path=/var/log/nginx/error.log --http-log-path=/var/log/nginx/access.log --http-client-body-temp-path=/var/lib/nginx/tmp/client_body --http-proxy-temp-path=/var/lib/nginx/tmp/proxy --http-fastcgi-temp-path=/var/lib/nginx/tmp/fastcgi --http-uwsgi-temp-path=/var/lib/nginx/tmp/uwsgi --http-scgi-temp-path=/var/lib/nginx/tmp/scgi --pid-path=/run/nginx.pid --lock-path=/run/lock/subsys/nginx --user=nginx --group=nginx --with-file-aio --with-ipv6 --with-http_ssl_module --with-http_v2_module --with-http_realip_module --with-stream_ssl_preread_module --with-http_addition_module --with-http_xslt_module=dynamic --with-http_image_filter_module=dynamic --with-http_sub_module --with-http_dav_module --with-http_flv_module --with-http_mp4_module --with-http_gunzip_module --with-http_gzip_static_module --with-http_random_index_module --with-http_secure_link_module --with-http_degradation_module --with-http_slice_module --with-http_stub_status_module --with-http_perl_module=dynamic --with-http_auth_request_module --with-mail=dynamic --with-mail_ssl_module --with-pcre --with-pcre-jit --with-stream=dynamic --with-stream_ssl_module --with-google_perftools_module --with-debug --with-cc-opt='-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -m64 -mtune=generic' --with-ld-opt='-Wl,-z,relro -specs=/usr/lib/rpm/redhat/redhat-hardened-ld -Wl,-E' |
Description
Hi,
We have nginx in production, configured with the server block below. nginx works fine for about one day; after that it slowly starts returning the 499 error code for some requests, and later all requests return 499.
Restarting nginx fixes the issue for some time. We have other server blocks (all of them HTTP-based proxy_pass) in this same nginx, and they all run fine.
Also, as part of an OS upgrade nginx was upgraded from nginx-1:1.16.1-1 to nginx-1:1.16.1-3; not sure if this is causing the issue.
We added resolver as part of troubleshooting, as we are using a domain name in proxy_pass. mysite.api.prod.vijay.com is the cloudfare where
server {
    listen 9876;
    server_name 192.168.100.123;
    keepalive_timeout 10;
    location / {
        resolver 4.4.4.4;
        proxy_ssl_server_name on;
        proxy_pass https://mysite.api.prod.vijay.com/speed/send;
    }
}
Attachments (2)
Change History (11)
comment:1 by , 4 years ago
comment:2 by , 4 years ago
When the issue happens, we are not able to get a response even for curl http://192.168.100.123:9876 run from the nginx server itself.
After a restart, nginx responds to curl and other API calls.
Let us know if any other details are required.
comment:3 by , 4 years ago
First of all, you may want to:
- Define "not able to get response". What exactly curl prints?
- Check if nginx is properly responding on alternative server blocks, which are not proxied to the particular upstream server.
- Check the error log (and make sure it is configured at least at the "warn" level).
Please also provide full configuration as shown by nginx -T.
comment:4 by , 4 years ago
Attached the nginx -T output.
All other server blocks are working fine; only the mentioned server block is having the issue.
The error log is configured at the warn level, and it contains no information about the 499 requests.
> Define "not able to get response". What exactly curl prints?
We executed curl from the nginx server as part of troubleshooting.
When nginx is responding, curl http://192.168.100.123:9876 goes through proxy_pass and fetches the desired output.
While we are receiving multiple 499 requests, the command curl http://192.168.100.123:9876 hangs and gives no response.
comment:5 by , 4 years ago
> All other Server Blocks are working fine . only mentioned server block is having issue
So, nginx is working fine, only requests proxied to a particular upstream server are affected. As already pointed out in comment:1, this again indicates that the problem is with the particular upstream server, not nginx.
> error is configured with warn & no information about the 499 request in the error log
Requests closed with status 499 are only logged at the "info" level. You have to look for other errors, such as timeouts when talking to the upstream server in question.
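To make the point about log levels concrete, a minimal sketch of raising the error log to the "info" level is shown here. The log path is the one from the --error-log-path argument in the nginx -V output of this ticket; whether the reporter's configuration declares error_log in the main context or elsewhere is an assumption.

```nginx
# Main context (error_log is also accepted in http, server and location).
# Path taken from --error-log-path in the nginx -V output above.
# "info" is more verbose than "warn", so messages about clients
# closing the connection early (status 499) become visible.
error_log /var/log/nginx/error.log info;
```

The change takes effect after a configuration reload. Since "info" is noticeably more verbose than "warn", it may be reasonable to revert it once the problem has been captured.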
> command hangs & no response
How long you've tried to wait for the response? What curl prints if you'll wait for at least 5 minutes?
comment:6 by , 4 years ago
During the issue we tested manually and were able to get a response from https://mysite.api.prod.vijay.com/speed/send (this URL is hosted in AWS via CloudFront), but got no response via http://192.168.100.123:9876 (the URL configured in nginx).
Once we restart nginx, we are immediately able to get a response from the nginx URL http://192.168.100.123:9876.
The unanswered question for us is how an nginx restart fixes the issue, because according to the developers neither the application that sends requests to nginx nor the proxy_pass host configured in nginx was restarted.
Also, as part of an OS upgrade nginx was upgraded from nginx-1:1.16.1-1 to nginx-1:1.16.1-3; not sure if this is causing the issue, because the same configuration was working fine earlier.
> How long you've tried to wait for the response? What curl prints if you'll wait for at least 5 minutes?
No response; it just hangs until we press Ctrl+C. At the OS level, port 9876 is in the LISTEN state.
comment:7 by , 4 years ago
> once we restart the nginx , immediately we were able to get response from the nginx URL
This behaviour suggests that the IP addresses the domain name resolves to have changed, and the old one, which is still used by nginx, no longer responds. As already explained in comment:1, your configuration does not expect that the IP addresses the name resolves to can change, and requires an nginx configuration reload to pick up new IP addresses.
> > How long you've tried to wait for the response? What curl prints if you'll wait for at least 5 minutes?
> No response , it just hangs till we press ctrl +c
You haven't answered at least one of the questions here. How long you've tried to wait for the response? If in doubt, consider providing time curl ... output, including the time output.
It is also a good idea to look for errors in the error log while doing this. Given your configuration does not contain custom proxy_connect_timeout and proxy_read_timeout, it is expected to respond or at least log an error after the default timeout expires, that is, after 60 seconds. If the name resolves to multiple IP addresses, nginx will try these addresses in order, and this may take a while, though you should see errors logged every 60 seconds or so.
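One way to see which upstream address nginx tried and how long each request took is to log the upstream variables in the access log. The following is only a sketch: the format name upstream_timing and the log file name access_9876.log are hypothetical, and the variables used are standard nginx variables.

```nginx
# http context: log_format cannot be declared inside a server block.
# "upstream_timing" is a hypothetical format name chosen for this sketch.
log_format upstream_timing '$remote_addr [$time_local] "$request" $status '
                           'upstream=$upstream_addr '
                           'connect=$upstream_connect_time '
                           'response=$upstream_response_time '
                           'total=$request_time';

server {
    listen 9876;
    server_name 192.168.100.123;

    # Hypothetical log file name; other directives of the server block stay unchanged.
    access_log /var/log/nginx/access_9876.log upstream_timing;
}
```

With such a format, a 499 entry shows the address in $upstream_addr that was being contacted when the client gave up, which can then be compared with what the name currently resolves to.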
comment:8 by , 4 years ago
Currently we are restarting nginx every day to avoid continuous 499 errors.
While troubleshooting I did not capture the time output of the curl command. Since we are restarting daily, I am not able to reproduce this error.
In the error log we are not able to see any details when we receive a 499 error,
so we enabled debug logging for the error log and got this (the debug part for the 499 error is attached):
*64929 epoll_wait() reported that client prematurely closed connection, so upstream connection is closed too while reading response header from upstream, client: 192.168.200.9, server: 192.168.100.123, request: "POST / HTTP/1.1", upstream: "https://192.6.136.123:443/speed/send", host: "192.168.100.123:9876"
Now, in the UAT environment, we have switched to variables in the proxy_pass directive so that nginx resolves the name at runtime. We are trying to simulate the issue by generating load in UAT and checking whether the error recurs.
by , 4 years ago
Attachment: | debug_499_error_log.txt added |
---|
comment:9 by , 4 years ago
Resolution: | → invalid |
---|---|
Status: | new → closed |
Thank you for the details. So, basically, we don't have information on how long you've been waiting for the response, and highly likely this was less than 60 seconds, not to mention the 5 minutes I've suggested above.
As previously explained, this looks like a configuration error. No information provided suggests this is a bug in nginx, so closing this.
Note well that if you expect nginx to respond in seconds, you may want to tune at least proxy_connect_timeout to better match your expectations, and probably proxy_read_timeout and proxy_send_timeout as well. It might also be a good idea to tune proxy_next_upstream_tries if you are proxying to names which resolve to multiple IP addresses.
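As an illustration of the tuning mentioned above, a sketch of the reporter's location block with explicit timeouts follows. All numeric values are assumptions picked for the example, not recommendations; the defaults are 60 seconds for each timeout and 0 (no limit) for proxy_next_upstream_tries.

```nginx
location / {
    resolver 4.4.4.4;
    proxy_ssl_server_name on;

    # Illustrative values only; each of these defaults to 60s.
    proxy_connect_timeout 5s;
    proxy_send_timeout    15s;
    proxy_read_timeout    15s;

    # Limit how many upstream addresses are tried for one request
    # (default 0 means no limit).
    proxy_next_upstream_tries 2;

    proxy_pass https://mysite.api.prod.vijay.com/speed/send;
}
```

With shorter timeouts nginx should return an error, and log it, before typical clients give up, so a dead upstream address is more likely to show up as a timeout in the error log than as a silent 499.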
The 499 error means that the client closed the connection. Most likely it means that the upstream server took too long to respond, so clients started to give up waiting. If you think that this is an nginx issue, you may want to provide more details.
Note well that domain names as specified in the configuration are resolved during configuration parsing, and adding the resolver directive does not change this. If the upstream name you are using started to resolve to different addresses, you have to reload nginx to re-resolve the name. Alternatively, you can use variables in the proxy_pass directive to force nginx to resolve the name at runtime.
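A minimal sketch of the variable-based approach, built on the server block from the ticket description, could look as follows. The variable name $upstream_host and the valid=30s parameter are assumptions made for the example.

```nginx
server {
    listen 9876;
    server_name 192.168.100.123;
    keepalive_timeout 10;

    location / {
        # Resolver address copied from the ticket. valid=30s is an assumed
        # value that overrides the TTL of the DNS answer.
        resolver 4.4.4.4 valid=30s;

        # Hypothetical variable name. Because proxy_pass contains a variable,
        # nginx resolves the name at request time through the resolver
        # instead of once during configuration parsing.
        set $upstream_host mysite.api.prod.vijay.com;

        proxy_ssl_server_name on;
        proxy_pass https://$upstream_host/speed/send;
    }
}
```

Note that when proxy_pass contains variables and a URI, the URI is passed to the upstream as written, replacing the original request URI. In this sketch every request is therefore sent to /speed/send, so if the original query string has to be forwarded, it would need to be added explicitly, for example with $is_args$args.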