Opened 4 years ago
Closed 4 years ago
#1976 closed defect (invalid)
Nginx DNS cache issue. ngx_http_core_module valid config is not working.
Priority: major
Component: nginx-core
Version: 1.12.x
Keywords: resolver cache dns
uname -a: Linux ip-10-200-18-244.us-west-1.compute.internal 4.14.72-73.55.amzn2.x86_64 #1 SMP Thu Sep 27 23:37:24 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
nginx -V:
nginx version: nginx/1.12.2
built by gcc 4.8.5 20150623 (Red Hat 4.8.5-16) (GCC)
built with OpenSSL 1.0.2k-fips 26 Jan 2017
TLS SNI support enabled
configure arguments: --prefix=/usr/share/nginx --sbin-path=/usr/sbin/nginx --modules-path=/usr/lib64/nginx/modules --conf-path=/etc/nginx/nginx.conf --error-log-path=/var/log/nginx/error.log --http-log-path=/var/log/nginx/access.log --http-client-body-temp-path=/var/lib/nginx/tmp/client_body --http-proxy-temp-path=/var/lib/nginx/tmp/proxy --http-fastcgi-temp-path=/var/lib/nginx/tmp/fastcgi --http-uwsgi-temp-path=/var/lib/nginx/tmp/uwsgi --http-scgi-temp-path=/var/lib/nginx/tmp/scgi --pid-path=/run/nginx.pid --lock-path=/run/lock/subsys/nginx --user=nginx --group=nginx --with-file-aio --with-ipv6 --with-http_auth_request_module --with-http_ssl_module --with-http_v2_module --with-http_realip_module --with-http_addition_module --with-http_xslt_module=dynamic --with-http_image_filter_module=dynamic --with-http_geoip_module=dynamic --with-http_sub_module --with-http_dav_module --with-http_flv_module --with-http_mp4_module --with-http_gunzip_module --with-http_gzip_static_module --with-http_random_index_module --with-http_secure_link_module --with-http_degradation_module --with-http_slice_module --with-http_stub_status_module --with-http_perl_module=dynamic --with-mail=dynamic --with-mail_ssl_module --with-pcre --with-pcre-jit --with-stream=dynamic --with-stream_ssl_module --with-google_perftools_module --with-debug --with-cc-opt='-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -m64 -mtune=generic' --with-ld-opt='-Wl,-z,relro -specs=/usr/lib/rpm/redhat/redhat-hardened-ld -Wl,-E'
Description (last modified by )
Setup:
An EC2 instance running nginx 1.12.2 behind an AWS load balancer, which has an idle timeout of 60 seconds. The upstream is another load balancer. The DNS TTL of the CNAME and A records is 60 seconds.
Problem:
AWS ELB IP addresses change periodically; that's how ELB works. When the IP changes, nginx does not pick up the new address and keeps sending requests to the cached (old) IP. These requests end with a 499 once $request_time reaches about 59 seconds; the 499 appears because the front load balancer closes the connection after its 60-second idle timeout. The issue is that nginx caches the ELB IP and never updates it.
Debugging:
- I read the docs and checked previously reported issues. As far as I understand, this issue should not occur. The resolver documentation says "By default, nginx caches answers using the TTL value of a response." and notes that "Before version 1.1.9, tuning of caching time was not possible, and nginx always cached answers for the duration of 5 minutes." My DNS TTL is 60 seconds, yet I have seen the error persist for more than 5 minutes, until nginx is reloaded.
- I added a resolver directive to override the cache (if there is one):
    server {
        listen 81;
        server_name my.host.name;
        resolver 10.200.0.2 valid=60s;
        # ... (rest of the server block not shown)
    }
  It doesn't help: the DNS answer is still cached until reload.
- I changed the DNS record from a CNAME to an alias A record. The TTL is 60 seconds for both the CNAME and the A records. The issue still exists.
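The ticket does not show how the server block actually reaches the upstream. Assuming it proxies to the ELB hostname written literally in proxy_pass (the location block and the use of the name from the dig output are assumptions, not the reporter's real config), the resolver line would never be consulted, which would match the behaviour described above:

```nginx
# Hypothetical reconstruction, NOT the reporter's actual config: the location
# block and the proxy_pass target are assumptions for illustration only.
server {
    listen 81;
    server_name my.host.name;

    # Only used for names that nginx has to resolve at run time;
    # the literal proxy_pass target below is not one of them.
    resolver 10.200.0.2 valid=60s;

    location / {
        # A literal hostname is resolved once, while the configuration is
        # parsed (at startup or reload). The resulting addresses stay fixed
        # until the next reload, regardless of the DNS TTL.
        proxy_pass http://rails.randomname.us-west-1.infra;
    }
}
```

If the real configuration has this shape, it would explain why only a reload picks up the new ELB addresses.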
Supporting docs/Logs:
This is a production machine and I have only the logs.
log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                '$status $body_bytes_sent "$http_referer" '
                '"$http_user_agent" "$http_x_forwarded_for" "$upstream_addr"'
                ' $request_time';
Before Nginx Reload:
10.200.201.145 - - [05/May/2020:19:51:39 +0000] "GET /api/monitor HTTP/1.1" 499 0 "-" "Pingdom.com_bot_version_1.4_http://www.pingdom.com/" "76.72.167.90" "10.200.83.167:80" 59.337
DNS lookup: the addresses currently returned for the name no longer include the $upstream_addr from the log line above (10.200.83.167), so the IPs have changed.
# dig +short rails.randomname.us-west-1.infra
internal-a0330ec15812a11e9a1660649a282fad-316863378.us-west-1.elb.amazonaws.com.
10.200.94.115
10.200.94.221
After Nginx reload:
10.200.201.145 - - [05/May/2020:19:52:39 +0000] "GET /api/monitor HTTP/1.1" 200 49 "-" "Pingdom.com_bot_version_1.4_http://www.pingdom.com/" "185.180.12.65" "10.200.94.221:80" 0.003
The issue is always resolved by reloading nginx. Please advise how to fix this, and let me know if I can collect more data for debugging.
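One way to collect more data is to log per-phase upstream timings next to $upstream_addr, to see in which phase the roughly 59-second requests stall (for example, while trying to connect to a stale IP). This is a sketch: the format name upstream_debug and the log path are hypothetical, and log_format has to be placed in the http context. $upstream_connect_time has been available since nginx 1.9.1, so it should work with 1.12.2.

```nginx
# Sketch: extended access log for diagnosing stale upstream addresses.
# The format name "upstream_debug" and the log file path are hypothetical.
log_format upstream_debug '$remote_addr - $remote_user [$time_local] "$request" '
                          '$status $body_bytes_sent "$http_referer" '
                          '"$http_user_agent" "$http_x_forwarded_for" '
                          'upstream=$upstream_addr upstream_status=$upstream_status '
                          'connect=$upstream_connect_time '
                          'header=$upstream_header_time '
                          'response=$upstream_response_time '
                          'total=$request_time';

# Written in addition to the existing "main" log, not instead of it.
access_log /var/log/nginx/upstream_debug.log upstream_debug;
```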
Change History (1)
comment:1 by , 4 years ago
Description: modified
Resolution: → invalid
Status: new → closed
Domain names used in nginx configuration are normally resolved during parsing of the configuration. Currently, the only exceptions are:
- proxy_pass, fastcgi_pass, etc. directives that contain variables;
- ssl_stapling and resolving the OCSP responder hostname;
- server ... resolve; in an upstream block (available as part of the commercial subscription).
All these cases explicitly document that they use the resolver. For example, the proxy_pass documentation says that the parameter value can contain variables, and in this case, if an address is specified as a domain name, the name is searched among the described server groups and, if not found, is determined using a resolver.
Don't expect that names written in the configuration will be re-resolved by nginx periodically. Depending on the particular use case, you may want to either reconfigure nginx as IP addresses change, or configure nginx in a way which implies periodic re-resolution of the names you want to be re-resolved.
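Applied to this ticket, the second option (configuring nginx so that the name is re-resolved at run time) could look roughly like this sketch. It assumes the upstream ELB is reached as rails.randomname.us-west-1.infra on port 80 and that 10.200.0.2 is a working DNS server, both taken from the ticket; the variable name $rails_upstream is hypothetical.

```nginx
# Sketch under the assumptions above; not a tested production config.
server {
    listen 81;
    server_name my.host.name;

    # Used at request time for proxy_pass values that contain variables.
    # valid=60s overrides the TTL of the answers; it is optional here.
    resolver 10.200.0.2 valid=60s;

    location / {
        # Hypothetical variable holding the upstream hostname.
        set $rails_upstream rails.randomname.us-west-1.infra;

        # Because the value contains a variable, nginx resolves the name
        # through the resolver above while handling requests, instead of
        # once when the configuration is parsed.
        proxy_pass http://$rails_upstream;
    }
}
```

Storing the name in a variable is what moves resolution to request time; adding a resolver directive on its own does not.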
Note that re-resolution implies run-time overhead and may also leave nginx non-working if the DNS server becomes unreachable. On the other hand, reconfiguring nginx implies additional resource usage until reconfiguration completes, which may take a while. For this and other reasons, automatic reconfiguration might not be a good idea.
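For completeness, the upstream-based exception from the list above (server ... resolve; in an upstream block) might look like the following. At the time of this ticket it required the commercial subscription, so it would not work with the nginx 1.12.2 build shown here; the upstream name rails_backend and the zone size are hypothetical.

```nginx
# Sketch only: requires an nginx build whose upstream "server" directive
# supports the "resolve" parameter (commercial subscription at the time).
http {
    # Used to re-resolve the upstream server name while nginx runs.
    resolver 10.200.0.2 valid=60s;

    upstream rails_backend {                 # hypothetical upstream name
        zone rails_backend 64k;              # "resolve" needs a shared memory zone
        server rails.randomname.us-west-1.infra:80 resolve;
    }

    server {
        listen 81;
        server_name my.host.name;

        location / {
            proxy_pass http://rails_backend;
        }
    }
}
```

Compared with the variable approach, this keeps the name inside a regular upstream block, so the usual upstream settings can still be applied to it.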