Opened 5 years ago
Last modified 5 years ago
#1976 closed defect
Nginx DNS cache issue: ngx_http_core_module resolver valid= setting is not working
| Reported by: | | Owned by: | |
|---|---|---|---|
| Priority: | major | Milestone: | |
| Component: | nginx-core | Version: | 1.12.x |
| Keywords: | resolver cache dns | Cc: | |

uname -a:

```
Linux ip-10-200-18-244.us-west-1.compute.internal 4.14.72-73.55.amzn2.x86_64 #1 SMP Thu Sep 27 23:37:24 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
```

nginx -V:

```
nginx version: nginx/1.12.2
built by gcc 4.8.5 20150623 (Red Hat 4.8.5-16) (GCC)
built with OpenSSL 1.0.2k-fips 26 Jan 2017
TLS SNI support enabled
configure arguments: --prefix=/usr/share/nginx --sbin-path=/usr/sbin/nginx --modules-path=/usr/lib64/nginx/modules --conf-path=/etc/nginx/nginx.conf --error-log-path=/var/log/nginx/error.log --http-log-path=/var/log/nginx/access.log --http-client-body-temp-path=/var/lib/nginx/tmp/client_body --http-proxy-temp-path=/var/lib/nginx/tmp/proxy --http-fastcgi-temp-path=/var/lib/nginx/tmp/fastcgi --http-uwsgi-temp-path=/var/lib/nginx/tmp/uwsgi --http-scgi-temp-path=/var/lib/nginx/tmp/scgi --pid-path=/run/nginx.pid --lock-path=/run/lock/subsys/nginx --user=nginx --group=nginx --with-file-aio --with-ipv6 --with-http_auth_request_module --with-http_ssl_module --with-http_v2_module --with-http_realip_module --with-http_addition_module --with-http_xslt_module=dynamic --with-http_image_filter_module=dynamic --with-http_geoip_module=dynamic --with-http_sub_module --with-http_dav_module --with-http_flv_module --with-http_mp4_module --with-http_gunzip_module --with-http_gzip_static_module --with-http_random_index_module --with-http_secure_link_module --with-http_degradation_module --with-http_slice_module --with-http_stub_status_module --with-http_perl_module=dynamic --with-mail=dynamic --with-mail_ssl_module --with-pcre --with-pcre-jit --with-stream=dynamic --with-stream_ssl_module --with-google_perftools_module --with-debug --with-cc-opt='-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -m64 -mtune=generic' --with-ld-opt='-Wl,-z,relro -specs=/usr/lib/rpm/redhat/redhat-hardened-ld -Wl,-E'
```
Description
Setup:
An EC2 instance running nginx 1.12.2 sits behind an AWS load balancer with an idle timeout of 60 seconds. The upstream is another AWS load balancer. The DNS TTL of both the CNAME and A records is 60 seconds.
Problem:
AWS ELB IP addresses change periodically; that is how ELBs work. When the IP changes, nginx does not pick up the new address, and requests keep going to the cached (old) IP. This results in 499 responses once $request_time reaches 59.xxx seconds: the ELB closes the connection after its 60-second idle timeout. The underlying issue is that nginx caches the ELB IP and never refreshes it.
Debugging:
- I read the docs and checked previously reported issues. As I understand them, this issue should not occur. The resolver documentation says:

  > By default, nginx caches answers using the TTL value of a response. [...] Before version 1.1.9, tuning of caching time was not possible, and nginx always cached answers for the duration of 5 minutes.

  My DNS TTL is 60 seconds, yet I have seen the error persist for more than 5 minutes, until nginx is reloaded.
- I added a resolver directive to override the cache (if there is one):

  ```nginx
  server {
      listen 81;
      server_name my.host.name;
      resolver 10.200.0.2 valid=60s;
      # ... rest of the server block unchanged
  ```

  It did not help: DNS answers are still cached until reload (see the sketch after this list).
- I updated the DNS record from a CNAME to an alias A record. The TTL is 60 seconds in both cases, and the issue still exists.
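One behavior I cannot rule out (stating it here as an assumption based on my reading of the docs, not a confirmed diagnosis): nginx resolves a hostname written literally in proxy_pass only once, at configuration load time, and the resolver directive takes effect at request time only when the upstream name is supplied through a variable. A minimal sketch of that variant, reusing my server block and the upstream name from the dig output below:

```nginx
server {
    listen 81;
    server_name my.host.name;
    resolver 10.200.0.2 valid=60s;

    location / {
        # With the name in a variable, nginx re-resolves it at request
        # time via the resolver above, honoring valid=60s.
        set $backend rails.randomname.us-west-1.infra;
        proxy_pass http://$backend;
    }
}
```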
Supporting docs/Logs:
This is a production machine, so I only have the logs.
```nginx
log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                '$status $body_bytes_sent "$http_referer" '
                '"$http_user_agent" "$http_x_forwarded_for" "$upstream_addr"'
                ' $request_time';
```
Before nginx reload:

```
10.200.201.145 - - 05/May/2020:19:51:39 +0000 "GET /api/monitor HTTP/1.1" 499 0 "-" "Pingdom.com_bot_version_1.4_http://www.pingdom.com/" "76.72.167.90" "10.200.83.167:80" 59.337
```
DNS report (the current IPs differ from the cached $upstream_addr in the log above):

```
# dig +short rails.randomname.us-west-1.infra
internal-a0330ec15812a11e9a1660649a282fad-316863378.us-west-1.elb.amazonaws.com.
10.200.94.115
10.200.94.221
```
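If it helps, I can also capture the remaining TTL directly (a verification command, not output captured from the affected host); the second field of each answer record is the TTL in seconds:

```
# +noall +answer prints only the answer section; the second column
# is the remaining TTL in seconds.
dig +noall +answer rails.randomname.us-west-1.infra
```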
After nginx reload:

```
10.200.201.145 - - 05/May/2020:19:52:39 +0000 "GET /api/monitor HTTP/1.1" 200 49 "-" "Pingdom.com_bot_version_1.4_http://www.pingdom.com/" "185.180.12.65" "10.200.94.221:80" 0.003
```
The issue is always resolved by an nginx reload. Please advise me on how to fix this, and let me know if I can provide more data for debugging.