Opened 6 months ago

Closed 5 months ago

#2163 closed defect (invalid)

nginx suddenly stops accepting requests & returns 499

Reported by: vijaysridhar03@… Owned by:
Priority: blocker Milestone:
Component: nginx-core Version: 1.16.x
Keywords: Cc:
uname -a: Linux vijay.com 3.10.0-1160.15.2.el7.x86_64 #1 SMP Wed Feb 3 15:06:38 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
nginx -V: nginx version: nginx/1.16.1
built by gcc 4.8.5 20150623 (Red Hat 4.8.5-44) (GCC)
built with OpenSSL 1.1.1c FIPS 28 May 2019 (running with OpenSSL 1.1.1g FIPS 21 Apr 2020)
TLS SNI support enabled
configure arguments: --prefix=/usr/share/nginx --sbin-path=/usr/sbin/nginx --modules-path=/usr/lib64/nginx/modules --conf-path=/etc/nginx/nginx.conf --error-log-path=/var/log/nginx/error.log --http-log-path=/var/log/nginx/access.log --http-client-body-temp-path=/var/lib/nginx/tmp/client_body --http-proxy-temp-path=/var/lib/nginx/tmp/proxy --http-fastcgi-temp-path=/var/lib/nginx/tmp/fastcgi --http-uwsgi-temp-path=/var/lib/nginx/tmp/uwsgi --http-scgi-temp-path=/var/lib/nginx/tmp/scgi --pid-path=/run/nginx.pid --lock-path=/run/lock/subsys/nginx --user=nginx --group=nginx --with-file-aio --with-ipv6 --with-http_ssl_module --with-http_v2_module --with-http_realip_module --with-stream_ssl_preread_module --with-http_addition_module --with-http_xslt_module=dynamic --with-http_image_filter_module=dynamic --with-http_sub_module --with-http_dav_module --with-http_flv_module --with-http_mp4_module --with-http_gunzip_module --with-http_gzip_static_module --with-http_random_index_module --with-http_secure_link_module --with-http_degradation_module --with-http_slice_module --with-http_stub_status_module --with-http_perl_module=dynamic --with-http_auth_request_module --with-mail=dynamic --with-mail_ssl_module --with-pcre --with-pcre-jit --with-stream=dynamic --with-stream_ssl_module --with-google_perftools_module --with-debug --with-cc-opt='-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -m64 -mtune=generic' --with-ld-opt='-Wl,-z,relro -specs=/usr/lib/rpm/redhat/redhat-hardened-ld -Wl,-E'

Description

Hi,

We have nginx in production, configured with the server block below. nginx works fine for about a day, after which it slowly starts returning the 499 error code for some requests, and later all requests return 499.
A restart of nginx fixes the issue for some time. We have other server blocks (all other HTTP-based proxy_pass) in this same nginx instance, and they are all running fine.

Also, as part of an OS upgrade, nginx was upgraded from nginx-1:1.16.1-1 to nginx-1:1.16.1-3; not sure if this is causing the issue.

We added the resolver as part of troubleshooting, since we are using a domain name in the proxy_pass. mysite.api.prod.vijay.com is fronted by CloudFront.

server {
    listen 9876;
    server_name 192.168.100.123;

    keepalive_timeout 10;

    location / {
        resolver 4.4.4.4;

        proxy_ssl_server_name on;
        proxy_pass https://mysite.api.prod.vijay.com/speed/send;
    }
}

Attachments (2)

nginx-T.txt (10.4 KB ) - added by vijaysridhar03@… 5 months ago.
nginx -T output
debug_499_error_log.txt (15.1 KB ) - added by vijaysridhar03@… 5 months ago.


Change History (11)

comment:1 by Maxim Dounin, 6 months ago

The 499 error means that the client closed the connection. Most likely it means that the upstream server took too long to respond, so clients started to give up waiting. If you think that this is an nginx issue, you may want to provide more details.

Note well that domain names as specified in the configuration are resolved during configuration parsing, and adding the resolver directive does not change this. If the upstream name you are using started to resolve to different addresses, you have to reload nginx to re-resolve the name. Alternatively, you can use variables in the proxy_pass directive to force nginx to resolve the name at runtime.
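For illustration, a minimal sketch of the variable-based approach described above, using the directives and names from this ticket's own configuration (the `valid=30s` cache lifetime is an illustrative choice, not a recommendation):

```nginx
server {
    listen 9876;
    server_name 192.168.100.123;

    location / {
        # With a variable in proxy_pass, nginx resolves the name at
        # request time via this resolver instead of once at startup.
        resolver 4.4.4.4 valid=30s;

        set $upstream "mysite.api.prod.vijay.com";
        proxy_ssl_server_name on;
        proxy_pass https://$upstream/speed/send;
    }
}
```

Note one subtlety: when proxy_pass contains variables and a URI part, that URI is passed to the upstream as written, replacing the original request URI, which differs slightly from how the non-variable form maps URIs.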

comment:2 by vijaysridhar03@…, 5 months ago

When the issue happens, we were not able to get a response even for curl http://192.168.100.123:9876 from the nginx server itself.

After restarting, nginx responds to curl & other API calls.

Let us know if any other details are required.

comment:3 by Maxim Dounin, 5 months ago

First of all, you may want to:

  • Define "not able to get response". What exactly does curl print?
  • Check if nginx is properly responding on alternative server blocks, which are not proxied to the particular upstream server.
  • Check the error log (and make sure it is configured at least at the "warn" level).

Please also provide full configuration as shown by nginx -T.

by vijaysridhar03@…, 5 months ago

Attachment: nginx-T.txt added

nginx -T output

comment:4 by vijaysridhar03@…, 5 months ago

Attached the nginx -T output.

All other server blocks are working fine; only the mentioned server block has the issue.

The error log is configured with "warn" & there is no information about the 499 requests in the error log.

Define "not able to get response". What exactly does curl print?

We executed curl from the nginx server as part of troubleshooting.

When nginx is responding, curl http://192.168.100.123:9876 goes through the proxy_pass & fetches the desired output.

When we receive multiple 499 requests, executing curl http://192.168.100.123:9876 at that time hangs with no response.

comment:5 by Maxim Dounin, 5 months ago

All other server blocks are working fine; only the mentioned server block has the issue

So, nginx is working fine, and only requests proxied to a particular upstream server are affected. As already pointed out in comment:1, this again indicates that the problem is with the particular upstream server, not nginx.

The error log is configured with "warn" & there is no information about the 499 requests in the error log

Requests closed with status 499 are only logged at the "info" level. You have to look for other errors, such as timeouts when talking to the upstream server in question.
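For reference, one minimal way to capture the 499 closures during an investigation is to raise the log level to "info" (the log path here is taken from this build's --error-log-path configure argument above):

```nginx
# 499 closures are logged only at "info"; raise verbosity temporarily.
error_log /var/log/nginx/error.log info;
```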

command hangs with no response

How long have you tried to wait for the response? What does curl print if you wait for at least 5 minutes?

comment:6 by vijaysridhar03@…, 5 months ago

During the issue, we tested manually & were able to get a response from https://mysite.api.prod.vijay.com/speed/send (this URL is hosted in AWS via CloudFront), but not via http://192.168.100.123:9876 (the URL configured in nginx).

Once we restarted nginx, we were immediately able to get a response from the nginx URL http://192.168.100.123:9876.

The unanswered question for us is how an nginx restart fixes the issue, because per the developers, neither the application which sends requests to nginx nor the proxy_pass host configured in nginx was restarted.

Also, as part of the OS upgrade, nginx was upgraded from nginx-1:1.16.1-1 to nginx-1:1.16.1-3; not sure if this is causing the issue, because the same configuration was working fine earlier.

How long have you tried to wait for the response? What does curl print if you wait for at least 5 minutes?

No response, it just hangs until we press Ctrl+C. At the OS level, port 9876 is in the LISTEN state.

comment:7 by Maxim Dounin, 5 months ago

Once we restarted nginx, we were immediately able to get a response from the nginx URL

This behaviour suggests that the IP addresses the domain name resolves to have changed, and the old one, which is still used by nginx, no longer responds. As already explained in comment:1, your configuration does not expect that the IP addresses the name resolves to can change, and requires an nginx configuration reload to pick up new IP addresses.

How long have you tried to wait for the response? What does curl print if you wait for at least 5 minutes?

No response, it just hangs until we press Ctrl+C

You haven't answered at least one of the questions here. How long have you tried to wait for the response? If in doubt, consider providing time curl ... output, including the time output.

It is also a good idea to look for errors in the error log while doing this. Given that your configuration does not contain custom proxy_connect_timeout and proxy_read_timeout, nginx is expected to respond, or at least log an error, after the default timeout expires, that is, after 60 seconds. If the name resolves to multiple IP addresses, nginx will try these addresses in order, and this may take a while, though you should see errors logged every 60 seconds or so.

comment:8 by vijaysridhar03@…, 5 months ago

Currently we are restarting nginx every day to avoid continuous 499 errors.

While troubleshooting, I did not capture the time output of the curl command. Since we are restarting daily, I am not able to reproduce this error.

In the error log we are not able to see any details when we receive the 499 errors.

So we have enabled debug for the error log & got this (attached the debug output for the 499 error):

*64929 epoll_wait() reported that client prematurely closed connection, so upstream connection is closed too while reading response header from upstream, client: 192.168.200.9, server: 192.168.100.123, request: "POST / HTTP/1.1", upstream: "https://192.6.136.123:443/speed/send", host: "192.168.100.123:9876"

Now, in the UAT environment, we have changed to variables in the proxy_pass directive so that nginx resolves the name at runtime. We are trying to simulate the problem by generating load in UAT & checking if the error recurs.

by vijaysridhar03@…, 5 months ago

Attachment: debug_499_error_log.txt added

comment:9 by Maxim Dounin, 5 months ago

Resolution: invalid
Status: new → closed

Thank you for the details. So, basically, we don't have information on how long you waited for the response, and it is highly likely this was less than 60 seconds, not to mention the 5 minutes I suggested above.

As previously explained, this looks like a configuration error. No information provided suggests this is a bug in nginx, so closing this.

Note well that if you expect nginx to respond in seconds, you may want to tune at least proxy_connect_timeout to better match your expectations, and probably proxy_read_timeout and proxy_send_timeout as well. It might be also a good idea to tune proxy_next_upstream_tries if you are proxying to names which resolve to multiple IP addresses.
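A sketch of the tuning suggested above, applied to this ticket's location block; the timeout values are purely illustrative assumptions and should be adjusted to your own latency expectations:

```nginx
location / {
    resolver 4.4.4.4;

    proxy_ssl_server_name on;

    # Fail fast instead of waiting out the 60s defaults (example values).
    proxy_connect_timeout 5s;
    proxy_send_timeout    15s;
    proxy_read_timeout    15s;

    # If the name resolves to several addresses, limit how many nginx
    # will try for a single request before giving up.
    proxy_next_upstream_tries 2;

    proxy_pass https://mysite.api.prod.vijay.com/speed/send;
}
```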
