Opened 4 years ago

Last modified 4 years ago

#1976 closed defect

Nginx DNS cache issue: ngx_http_core_module resolver valid= config is not working.

Reported by: aswin020@…
Owned by:
Priority: major
Milestone:
Component: nginx-core
Version: 1.12.x
Keywords: resolver cache dns
Cc:
uname -a: Linux ip-10-200-18-244.us-west-1.compute.internal 4.14.72-73.55.amzn2.x86_64 #1 SMP Thu Sep 27 23:37:24 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
nginx -V: nginx version: nginx/1.12.2
built by gcc 4.8.5 20150623 (Red Hat 4.8.5-16) (GCC)
built with OpenSSL 1.0.2k-fips 26 Jan 2017
TLS SNI support enabled
configure arguments: --prefix=/usr/share/nginx --sbin-path=/usr/sbin/nginx --modules-path=/usr/lib64/nginx/modules --conf-path=/etc/nginx/nginx.conf --error-log-path=/var/log/nginx/error.log --http-log-path=/var/log/nginx/access.log --http-client-body-temp-path=/var/lib/nginx/tmp/client_body --http-proxy-temp-path=/var/lib/nginx/tmp/proxy --http-fastcgi-temp-path=/var/lib/nginx/tmp/fastcgi --http-uwsgi-temp-path=/var/lib/nginx/tmp/uwsgi --http-scgi-temp-path=/var/lib/nginx/tmp/scgi --pid-path=/run/nginx.pid --lock-path=/run/lock/subsys/nginx --user=nginx --group=nginx --with-file-aio --with-ipv6 --with-http_auth_request_module --with-http_ssl_module --with-http_v2_module --with-http_realip_module --with-http_addition_module --with-http_xslt_module=dynamic --with-http_image_filter_module=dynamic --with-http_geoip_module=dynamic --with-http_sub_module --with-http_dav_module --with-http_flv_module --with-http_mp4_module --with-http_gunzip_module --with-http_gzip_static_module --with-http_random_index_module --with-http_secure_link_module --with-http_degradation_module --with-http_slice_module --with-http_stub_status_module --with-http_perl_module=dynamic --with-mail=dynamic --with-mail_ssl_module --with-pcre --with-pcre-jit --with-stream=dynamic --with-stream_ssl_module --with-google_perftools_module --with-debug --with-cc-opt='-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -m64 -mtune=generic' --with-ld-opt='-Wl,-z,relro -specs=/usr/lib/rpm/redhat/redhat-hardened-ld -Wl,-E'

Description

Setup:

EC2 instance with Nginx 1.12.2 behind an AWS load balancer. The load balancer has an idle timeout of 60 seconds. The upstream is another AWS load balancer. The DNS TTL of both the CNAME and the A records is 60 seconds.
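The relevant proxying looks roughly like the sketch below (hostnames are anonymized/hypothetical and most directives are omitted; the upstream load balancer is referenced by its DNS name in proxy_pass):

server {
    listen      81;
    server_name my.host.name;

    location / {
        # Proxy to the second (internal) load balancer by DNS name; its
        # A records rotate periodically and carry a 60-second TTL.
        proxy_pass http://internal-upstream.us-west-1.elb.amazonaws.com;
    }
}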

Problem:

The AWS ELB IPs change periodically; that is how ELB works. When the IP changes, Nginx does not pick up the new IP address; instead, requests keep going to the cached (old) IP. This results in 499 responses once $request_time reaches 59.xxx; the 499 occurs because the ELB closes the connection after the 60-second idle timeout. The issue is that Nginx caches the ELB IP and never updates it.

Debugging:

  1. I read the docs and checked previously reported issues; as per my understanding, this issue should not occur. The resolver doc says that, by default, nginx caches answers using the TTL value of a response, and my DNS TTL is 60 seconds, yet I have seen the error persist for more than 5 minutes, until nginx is reloaded. (The docs also note: "Before version 1.1.9, tuning of caching time was not possible, and nginx always cached answers for the duration of 5 minutes.")
  2. I added the resolver config to override the cache (if there is one):
server {
    listen       81;
    server_name  my.host.name;

    resolver 10.200.0.2 valid=60s;

    # ... rest of the server block (proxy directives, etc.) omitted ...
}

It's not working; the DNS answer is still cached until reload (see the sketch after this list).

  3. Updated the DNS record from a CNAME to an Alias A record; the TTL is 60 seconds for both the CNAME and the A records. The issue still exists.
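One detail that may be relevant to step 2 (a sketch, not verified on this setup): the resolver directive only takes effect when nginx resolves the upstream name at run time, which happens when the proxy_pass target contains a variable; a hostname written literally in proxy_pass is resolved once, at configuration load/reload. A minimal sketch of that variant, using the resolver from the report and a hypothetical hostname:

server {
    listen      81;
    server_name my.host.name;

    resolver 10.200.0.2 valid=60s;

    location / {
        # Using a variable forces per-request name resolution through the
        # resolver above instead of a one-time lookup at config load.
        # The hostname below is hypothetical.
        set $backend "internal-upstream.us-west-1.elb.amazonaws.com";
        proxy_pass http://$backend;
    }
}

One caveat of this variant: when proxy_pass contains variables, nginx does not perform its usual URI replacement, so the proxied URI may need to be constructed explicitly if the location does any rewriting.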

Supporting docs/Logs:

This is a production machine, so I only have the logs.

    log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
                      '$status $body_bytes_sent "$http_referer" '
                      '"$http_user_agent" "$http_x_forwarded_for" "$upstream_addr"'
                      ' $request_time';

Before Nginx reload:

10.200.201.145 - - 05/May/2020:19:51:39 +0000 "GET /api/monitor HTTP/1.1" 499 0 "-" "Pingdom.com_bot_version_1.4_http://www.pingdom.com/" "76.72.167.90" "10.200.83.167:80" 59.337

DNS report: here we can see that the IPs have changed; the cached $upstream_addr from the log above (10.200.83.167) is no longer in the answer.

# dig +short rails.randomname.us-west-1.infra
internal-a0330ec15812a11e9a1660649a282fad-316863378.us-west-1.elb.amazonaws.com.
10.200.94.115
10.200.94.221

After Nginx reload:

10.200.201.145 - - 05/May/2020:19:52:39 +0000 "GET /api/monitor HTTP/1.1" 200 49 "-" "Pingdom.com_bot_version_1.4_http://www.pingdom.com/" "185.180.12.65" "10.200.94.221:80" 0.003

The issue always clears after an Nginx reload. Please advise on how to fix this, and let me know if I can provide more data for debugging.
