#2557 closed defect (invalid)
Odd STALE response under long-run stability test
| Reported by: | | Owned by: | |
|---|---|---|---|
| Priority: | minor | Milestone: | |
| Component: | nginx-core | Version: | 1.24.x |
| Keywords: | stale cache | Cc: | p4rancesc0@… |
| uname -a: | Linux nginx01.xx.yy 4.18.0-372.32.1.el8_6.x86_64 #1 SMP Thu Oct 27 15:18:36 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux | | |
nginx -V:

```
nginx version: nginx/1.24.0
built by gcc 8.5.0 20210514 (Red Hat 8.5.0-4) (GCC)
built with OpenSSL 1.1.1k FIPS 25 Mar 2021
TLS SNI support enabled
configure arguments: --prefix=/etc/nginx --sbin-path=/usr/sbin/nginx --modules-path=/usr/lib64/nginx/modules --conf-path=/etc/nginx/nginx.conf --error-log-path=/var/log/nginx/error.log --http-log-path=/var/log/nginx/access.log --pid-path=/var/run/nginx.pid --lock-path=/var/run/nginx.lock --http-client-body-temp-path=/var/cache/nginx/client_temp --http-proxy-temp-path=/var/cache/nginx/proxy_temp --http-fastcgi-temp-path=/var/cache/nginx/fastcgi_temp --http-uwsgi-temp-path=/var/cache/nginx/uwsgi_temp --http-scgi-temp-path=/var/cache/nginx/scgi_temp --user=nginx --group=nginx --with-compat --with-file-aio --with-threads --with-http_addition_module --with-http_auth_request_module --with-http_dav_module --with-http_flv_module --with-http_gunzip_module --with-http_gzip_static_module --with-http_mp4_module --with-http_random_index_module --with-http_realip_module --with-http_secure_link_module --with-http_slice_module --with-http_ssl_module --with-http_stub_status_module --with-http_sub_module --with-http_v2_module --with-mail --with-mail_ssl_module --with-stream --with-stream_realip_module --with-stream_ssl_module --with-stream_ssl_preread_module --with-cc-opt='-O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions -fstack-protector-strong -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -fPIC' --with-ld-opt='-Wl,-z,relro -Wl,-z,now -pie'
```
Description
Hi,
in a test environment we ran some 12h+ tests to verify an odd behaviour we had noticed in production.
A couple of scripts were requesting the same file with a random sleep of between a few seconds and 65 seconds.
Nginx is acting as a reverse proxy and caching server.
The file always comes from the same origin server; in the worst case the difference between Date and Last-Modified on the origin is 6 seconds.
Nginx has serving of STALE responses enabled.
The nginx cache is configured to cache "any" response for 1 second.
We are therefore expecting nginx to show a difference between Date and Last-Modified of about 7-8 seconds, 9 seconds at most.
Are we wrong?
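The test loop described above can be sketched roughly as follows. This is a hypothetical reconstruction, not the reporter's actual script: the URL, output format, and sleep range are assumptions. It requires curl and GNU date; the runnable part below only demonstrates the Date minus Last-Modified computation on fixed example values.

```shell
#!/bin/sh
# Hypothetical reconstruction of the test loop (URL and output format
# are assumptions, not taken from the ticket). Requires GNU date.

# Seconds between two HTTP dates: first argument minus second argument.
http_date_diff() {
    echo $(( $(date -d "$1" +%s) - $(date -d "$2" +%s) ))
}

# Demonstration with fixed values (no live origin available here):
http_date_diff "Fri, 27 Oct 2023 00:28:26 GMT" "Fri, 27 Oct 2023 00:28:24 GMT"
# prints 2

# The real loop would look roughly like:
#   while true; do
#       headers=$(curl -sI http://nginx01/some/manifest)
#       ...extract Date, Last-Modified and X-Proxy-Cache, print the record...
#       sleep $(( (RANDOM % 60) + 5 ))   # random pause, a few seconds to ~65s
#   done
```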
Here's the combined output of the scripts. Each record shows:
- Date
- Cache status
- Difference between Date and Last-Modified (seconds)
- Sleep duration before the next request

```
-------- Fri Oct 27 00:28:26 CEST 2023 MISS 2 sleep for 81137 milliseconds
-------- Fri Oct 27 00:28:56 CEST 2023 MISS 2 sleep for 60115 milliseconds
-------- Fri Oct 27 00:29:47 CEST 2023 MISS 3 sleep for 60890 milliseconds
-------- Fri Oct 27 00:29:57 CEST 2023 STALE 13 sleep for 60968 milliseconds
-------- Fri Oct 27 00:30:48 CEST 2023 MISS 2 sleep for 26001 milliseconds
-------- Fri Oct 27 00:30:58 CEST 2023 STALE 12 sleep for 61950 milliseconds
-------- Fri Oct 27 00:31:14 CEST 2023 MISS 1 sleep for 58485 milliseconds
-------- Fri Oct 27 00:32:00 CEST 2023 MISS 1 sleep for 59154 milliseconds
-------- Fri Oct 27 00:32:13 CEST 2023 MISS 3 sleep for 27922 milliseconds
```
You can spot a couple of STALE responses, 12 and 13 seconds old.
logs are attached...
here's the proxy cache configuration:
```nginx
proxy_connect_timeout 2s;        #proxy_connect_timeout 10s;
proxy_send_timeout 2s;           #proxy_send_timeout 10s;
proxy_read_timeout 2s;           #proxy_read_timeout 5s;
proxy_buffering on;              #TODO TO TEST
proxy_cache_lock on;             #CACHE LOCK
proxy_cache_lock_timeout 100ms;  #CACHE LOCK, old value 1s
proxy_cache_lock_age 50ms;       #CACHE LOCK, old value default = 5s
proxy_cache_key $proxy_host$uri;
proxy_cache_methods GET HEAD POST;
proxy_cache_use_stale updating error timeout invalid_header http_500 http_502 http_503 http_504; # http_404; No 404 stale response
proxy_cache_revalidate on;
proxy_cache_background_update on;
proxy_http_version 1.1;
proxy_ignore_headers X-Accel-Expires Expires Cache-Control;
proxy_hide_header Accept-Encoding;  # TODO to be discussed
proxy_next_upstream error timeout invalid_header http_403 http_404 http_502 http_503 http_504;
proxy_set_header Accept-Encoding none;
proxy_set_header Connection "keep-alive";
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_cache_path /data/nginx/manifest levels=1:1 keys_zone=cache-manifest:1m inactive=1s
                 max_size=100m min_free=50m use_temp_path=off
                 manager_files=10000 manager_threshold=1000ms manager_sleep=50ms
                 loader_files=10000 loader_threshold=1000ms;
```
and here's the cache configuration in the correct location:
```nginx
set $cache cache-manifest;
proxy_pass http://usp-live-preprod;
proxy_cache_valid any 1s;
add_header X-Proxy-Cache $upstream_cache_status;
add_header Access-Control-Allow-Origin *;  # Can restrict this to specific origins.
add_header 'Access-Control-Expose-Headers' 'Date,X-CDN' always;
add_header 'Nginx-Cache-Name' $cache always;
```
Why are we sometimes getting a STALE response instead of a MISS?
Here are some results (count of responses by age in seconds):

| Count | Age (seconds) |
|---|---|
| 2 | 11 |
| 4 | 10 |
| 7 | 9 |
| 14 | 8 |
| 15 | 5 |
| 15 | 7 |
| 18 | 6 |
| 82 | 4 |
| 194 | 0 |
| 328 | 2 |
| 334 | 3 |
| 337 | 1 |

That is: 2 responses were 11 seconds old, 4 responses were 10 seconds old, 7 responses were 9 seconds old, and so on.
You can see that this happens only sometimes...
BR
Francesco
Attachments (1)
Change History (4)
comment:1 by , 11 months ago
| Resolution: | → invalid |
|---|---|
| Status: | new → closed |
> You can spot a couple of STALE 12 and 13 seconds old.
The `proxy_cache_use_stale updating;` combined with `proxy_cache_background_update on;` in your configuration implies that any stale response present in the nginx cache will be returned to the client, and nginx will try to update the cache item with a background subrequest; see the docs for details.
This behaviour is consistent with what you are seeing:
```
-------- Fri Oct 27 00:29:47 CEST 2023 MISS 3 sleep for 60890 milliseconds
-------- Fri Oct 27 00:29:57 CEST 2023 STALE 13 sleep for 60968 milliseconds
--------
```
Note that the previous response, MISS 3, populates the cache. Ten seconds later the response is still in the cache, and nginx returns it to the client as per the configuration, hence STALE 13.
If you don't want nginx to return stale responses to clients, consider changing your configuration accordingly.
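A minimal sketch of such a change, assuming the only unwanted behaviour is serving stale items while the cache is being refreshed (this touches only the two directives involved; everything else stays as posted):

```nginx
# Serve stale responses only on upstream errors, never while updating:
proxy_cache_use_stale error timeout invalid_header http_500 http_502 http_503 http_504;

# Without "updating" above, background refresh of stale items no longer
# applies; "off" is also the default:
proxy_cache_background_update off;
```

With this, a request that finds an expired cache item goes to the upstream synchronously and is reported as EXPIRED (or MISS once the item has been evicted) rather than STALE.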
comment:2 by , 11 months ago (follow-up: comment 3)
Thanks Maxim, that makes absolute sense.
We do have some more questions, though...
1) how can the "MISS 3" be present in the cache after 10 seconds if the cache is configured with inactive=1s and proxy_cache_valid any 1s ?
2) why STALE is happening just (2+4+7) times in the tests while all the other (14+15+15+18+82+194+328+334+337) times it's just a MISS ?
It's not easy to reproduce; it just happens sometimes...
For instance:
- GET -> MISS
- wait 10 seconds
- GET -> MISS
- wait 10 seconds
This can go on for hours before the first STALE...
If you have the patience to give us a follow-up, it would be really great.
BR
Francesco
comment:3 by , 11 months ago
Replying to p4rancesc0@…:
> 1) how can the "MISS 3" be present in the cache after 10 seconds if the cache is configured with inactive=1s and proxy_cache_valid any 1s ?
The validity of cache items, as configured with `proxy_cache_valid`, does not affect the presence of cache items in the cache: an expired (stale) cache item can still be present in the cache for a long time, as long as it has not yet been removed from the cache, and that's basically how `proxy_cache_use_stale` works.
The `inactive=` parameter of `proxy_cache_path` does indeed specify when cache items are removed from the cache, and assuming `inactive=1s` is followed strictly, the item shouldn't be in the cache 10 seconds after the last request. The cache item might be removed some time later, though; for example, if the cache manager is busy doing other things. In particular, the cache manager process can sleep for up to 10 seconds if it has nothing to do, and this is probably what happens in your case.
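To illustrate the distinction, here are the two directives from the configuration above, annotated with what each one does and does not guarantee (the annotations are a summary of the explanation above, not additional behaviour):

```nginx
# Item becomes STALE after 1s, but the file stays on disk until the
# cache manager actually removes it:
proxy_cache_valid any 1s;

proxy_cache_path /data/nginx/manifest levels=1:1 keys_zone=cache-manifest:1m
                 # Eligible for removal after 1s without access; removal
                 # itself can lag, since the manager may sleep for up to
                 # 10s when idle, so a stale item can survive past this
                 # and be served as STALE:
                 inactive=1s
                 max_size=100m min_free=50m use_temp_path=off
                 manager_files=10000 manager_threshold=1000ms manager_sleep=50ms
                 loader_files=10000 loader_threshold=1000ms;
```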
> 2) why STALE is happening just (2+4+7) times in the tests while all the other (14+15+15+18+82+194+328+334+337) times it's just a MISS ?
Since you have `inactive=1s` in your configuration, in most cases stale responses are removed from the cache before they can be used.
Attachment: access, error and origin logs