Opened 10 months ago

Last modified 10 months ago

#1614 new defect

proxy cache 404 STALE forever

Reported by: d.jarosch.syseleven.de@… Owned by:
Priority: minor Milestone:
Component: other Version: 1.15.x
Keywords: proxy_cache stale proxy_cache_valid proxy_cache_use_stale Cc:
uname -a: Linux lb.info 2.6.32-042stab127.2 #1 SMP Thu Jan 4 16:41:44 MSK 2018 x86_64 x86_64 x86_64 GNU/Linux
nginx -V: nginx version: nginx/1.15.2 built by gcc 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.9) built with OpenSSL 1.0.2g 1 Mar 2016 TLS SNI support enabled configure arguments: --prefix=/etc/nginx --sbin-path=/usr/sbin/nginx --modules-path=/usr/lib/nginx/modules --conf-path=/etc/nginx/nginx.conf --error-log-path=/var/log/nginx/error.log --http-log-path=/var/log/nginx/access.log --pid-path=/var/run/nginx.pid --lock-path=/var/run/nginx.lock --http-client-body-temp-path=/var/cache/nginx/client_temp --http-proxy-temp-path=/var/cache/nginx/proxy_temp --http-fastcgi-temp-path=/var/cache/nginx/fastcgi_temp --http-uwsgi-temp-path=/var/cache/nginx/uwsgi_temp --http-scgi-temp-path=/var/cache/nginx/scgi_temp --user=nginx --group=nginx --with-compat --with-file-aio --with-threads --with-http_addition_module --with-http_auth_request_module --with-http_dav_module --with-http_flv_module --with-http_gunzip_module --with-http_gzip_static_module --with-http_mp4_module --with-http_random_index_module --with-http_realip_module --with-http_secure_link_module --with-http_slice_module --with-http_ssl_module --with-http_stub_status_module --with-http_sub_module --with-http_v2_module --with-mail --with-mail_ssl_module --with-stream --with-stream_realip_module --with-stream_ssl_module --with-stream_ssl_preread_module --with-cc-opt='-g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fPIC' --with-ld-opt='-Wl,-Bsymbolic-functions -Wl,-z,relro -Wl,-z,now -Wl,--as-needed -pie'

Description

Hello guys,

I'm having a hard time to understand if I'm having a bug in my configuration or if I hit a bug in NGINX.

My configuration so far:

proxy_cache_valid             200 15s;
proxy_cache_valid             301 15s;
proxy_cache_valid             404 15s;
proxy_cache                   stage_proxy-cache;
proxy_cache_lock              on;
proxy_cache_use_stale         updating error timeout http_404 http_500 http_502 http_503 http_504;
proxy_cache_background_update on;
proxy_cache_revalidate        on;

Our application, a Magnolia CMS, is defining the Cache-Control header via max-age and therfor controling how long an asset has to be kept in cache before it becomes STALE.

Now I'm experiencing a "strange" behavior which I don't understand. The behavior is as following:

  • I'm curling for an object like a HTML 3 times:
→ curl -I https://invaliddomain-com.stage.invaliddomain.info/en/products/formwork/concrete-maturity-computer.html
HTTP/2 200
server: nginx
date: Mon, 20 Aug 2018 09:41:29 GMT
content-type: text/html;charset=UTF-8
content-length: 72937
x-magnolia-registration: Registered
access-control-allow-origin: http://tools.live.invaliddomain.info
access-control-allow-methods: GET, OPTIONS, HEAD
access-control-allow-headers: X-PINGOTHER, Origin, X-Requested-With, Content-Type, Accept
cache-control: max-age=60, public
expires: Mon, 20 Aug 2018 09:42:29 GMT
last-modified: Mon, 20 Aug 2018 09:37:28 GMT
x-upstream: 10.6.198.15:8080
content-security-policy: default-src 'self' http: https:;img-src http: https: data:;font-src http: https: data:;script-src 'unsafe-inline' 'unsafe-eval' 'self' http: https:;style-src 'unsafe-inline' http: https:; frame-ancestors http://*.stage.invaliddomain.info https://*.stage.invaliddomain.info http://*.live.invaliddomain.info https://*.live.invaliddomain.info
x-content-type-options: nosniff
x-xss-protection: 1; mode=block
x-cache: MISS

→ curl -I https://invaliddomain-com.stage.invaliddomain.info/en/products/formwork/concrete-maturity-computer.html
HTTP/2 200
server: nginx
date: Mon, 20 Aug 2018 09:41:32 GMT
content-type: text/html;charset=UTF-8
content-length: 72937
x-magnolia-registration: Registered
access-control-allow-origin: http://tools.live.invaliddomain.info
access-control-allow-methods: GET, OPTIONS, HEAD
access-control-allow-headers: X-PINGOTHER, Origin, X-Requested-With, Content-Type, Accept
cache-control: max-age=60, public
expires: Mon, 20 Aug 2018 09:42:29 GMT
last-modified: Mon, 20 Aug 2018 09:37:28 GMT
content-security-policy: default-src 'self' http: https:;img-src http: https: data:;font-src http: https: data:;script-src 'unsafe-inline' 'unsafe-eval' 'self' http: https:;style-src 'unsafe-inline' http: https:; frame-ancestors http://*.stage.invaliddomain.info https://*.stage.invaliddomain.info http://*.live.invaliddomain.info https://*.live.invaliddomain.info
x-content-type-options: nosniff
x-xss-protection: 1; mode=block
x-cache: HIT

... after more then 60 seconds (cache-control: max-age)

HTTP/1.1 200
Server: nginx
Date: Mon, 20 Aug 2018 09:45:07 GMT
Content-Type: text/html;charset=UTF-8
Content-Length: 72937
Connection: keep-alive
Keep-Alive: timeout=5
X-Magnolia-Registration: Registered
Access-Control-Allow-Origin: http://tools.live.invaliddomain.info
Access-Control-Allow-Methods: GET, OPTIONS, HEAD
Access-Control-Allow-Headers: X-PINGOTHER, Origin, X-Requested-With, Content-Type, Accept
Cache-Control: max-age=60, public
Expires: Mon, 20 Aug 2018 09:44:47 GMT
Last-Modified: Mon, 20 Aug 2018 09:37:28 GMT
Content-Security-Policy: default-src 'self' http: https:;img-src http: https: data:;font-src http: https: data:;script-src 'unsafe-inline' 'unsafe-eval' 'self' http: https:;style-src 'unsafe-inline' http: https:; frame-ancestors http://*.stage.invaliddomain.info https://*.stage.invaliddomain.info http://*.live.invaliddomain.info https://*.live.invaliddomain.info
X-Content-Type-Options: nosniff
X-XSS-Protection: 1; mode=block
X-CACHE: STALE

Since the object hasn't been in the cache yet, first I receive a MISS, with the second request within 60 seconds I receive a X-CACHE: HIT and after more then 60 seconds of not requesting the object I receive a STALE and with the next curl again a HIT. Fine for me, everything works as expected for existing objects.

  • Now I'm doing this for a page which actually doesn't exists like following:
curl -I https://invaliddomain-com.stage.invaliddomain.info/en/testpage.html
HTTP/1.1 404
Server: nginx
Date: Mon, 20 Aug 2018 09:56:18 GMT
Content-Type: text/html;charset=UTF-8
Connection: keep-alive
Keep-Alive: timeout=5
X-Magnolia-Registration: Registered
Access-Control-Allow-Origin: http://tools.live.invaliddomain.info
Access-Control-Allow-Methods: GET, OPTIONS, HEAD
Access-Control-Allow-Headers: X-PINGOTHER, Origin, X-Requested-With, Content-Type, Accept
Cache-Control: max-age=60
Expires: Mon, 20 Aug 2018 09:57:05 GMT

First thing I realize is that there is no X-CACHE status comming back, but according to the NGINX access log I've "hit" somthing:

[20/Aug/2018:11:58:53 +0200] Cache: MISS 10.6.198.15:8080 0.573 404 458 37.44.6.6 invaliddomain-com.stage.invaliddomain.info /en/testpage.html
[20/Aug/2018:11:58:54 +0200] Cache: HIT - - 404 458 37.44.6.6 invaliddomain-com.stage.invaliddomain.info /en/testpage.html

According to the Cache-Control header I should receive a STALE after 60 seconds and I do:

[20/Aug/2018:11:59:55 +0200] Cache: STALE - - 404 458 37.44.6.6 invaliddomain-com.stage.invaliddomain.info /en/testpage.html

But I'm not leaving this status anymore until I purge the NGINX proxy cache:

...
[20/Aug/2018:12:00:13 +0200] Cache: STALE - - 404 458 37.44.6.6 invaliddomain-com.stage.invaliddomain.info /en/testpage.html
[20/Aug/2018:12:00:14 +0200] Cache: STALE - - 404 458 37.44.6.6 invaliddomain-com.stage.invaliddomain.info /en/testpage.html
[20/Aug/2018:12:00:15 +0200] Cache: STALE - - 404 458 37.44.6.6 invaliddomain-com.stage.invaliddomain.info /en/testpage.html
[20/Aug/2018:12:00:16 +0200] Cache: STALE - - 404 458 37.44.6.6 invaliddomain-com.stage.invaliddomain.info /en/testpage.html
...

Now I'm having a tricky situation. Let's say I'm removing an object from my backend by disabling it (404) for remake or whatever (new CSS etc.). The object is not available and I'm receiving a STALE 404 after while from cache. After finished remake of my object I'm enabling it again to be available, but I'm still receiving an 404 STALE object from cache, and therefor my new object won't be available until I purge the proxy cache.

My questions are:

  • Do I have a configuration error?
  • Do I misunderstand the proxy cache concept?
  • Is this behavior expected?
  • Are there any solutions for my problem?

Change History (11)

comment:1 Changed 10 months ago by d.jarosch.syseleven.de@…

Last edited 10 months ago by d.jarosch.syseleven.de@… (previous) (diff)

comment:2 follow-up: Changed 10 months ago by arut

Try disabling proxy_cache_background_update. Does it work well in this case?

comment:3 in reply to: ↑ 2 Changed 10 months ago by d.jarosch.syseleven.de@…

Replying to arut:

Try disabling proxy_cache_background_update. Does it work well in this case?

Tried, no luck. It's still the same behavior.

Cheers,
David

comment:4 follow-up: Changed 10 months ago by arut

OK. Let's go deeper then.

First thing I realize is that there is no X-CACHE status comming back

You didn't provide the complete config, but I assume you use the add_header X-CACHE $upstream_cache_status without the always parameter and this is why this header is not added for 404 response.

but according to the NGINX access log I've "hit" somthing:

You hit the 404 page cached by the previous request.

But I'm not leaving this status anymore until I purge the NGINX proxy cache:

If file is still missing, the response is totally normal considering proxy_cache_use_stale http_404, which prevents the cached 404 response from being updated if the new response is still 404.

After finished remake of my object I'm enabling it again to be available, but I'm still receiving an 404 STALE object from cache, and therefor my new object won't be available until I purge the proxy cache.

Is this what you actually see or is this a speculation you made from the previous observation?

If you keep receiving 404 after the new file is already there, could you somehow post the requests sent by nginx and the backend server response?

comment:5 in reply to: ↑ 4 Changed 10 months ago by d.jarosch.syseleven.de@…

Replying to arut:

First thing I realize is that there is no X-CACHE status comming back

You didn't provide the complete config, but I assume you use the add_header X-CACHE $upstream_cache_status without the always parameter and this is why this header is not added for 404 response.

This is what I've configured:

add_header            X-CACHE $upstream_cache_status;

I don't know what you mean by "always"? Can you give a configuration example? I don't see any parameters for that on https://nginx.org/en/docs.

But I'm not leaving this status anymore until I purge the NGINX proxy cache:

If file is still missing, the response is totally normal considering proxy_cache_use_stale http_404, which prevents the cached 404 response from being updated if the new response is still 404.

After finished remake of my object I'm enabling it again to be available, but I'm still receiving an 404 STALE object from cache, and therefor my new object won't be available until I purge the proxy cache.

Is this what you actually see or is this a speculation you made from the previous observation?

This is what I see. I'm able to enable and disable websites in the backend. Disabling and enabling a website with identical URL like https://invaliddomain-com.stage.invaliddomain.info/en/testpage.html gives me still a STALE version of the 404

If you keep receiving 404 after the new file is already there, could you somehow post the requests sent by nginx and the backend server response?

Unfortunately there is no server response from the backend, which makes sense when the object is cached, 4th column of the NGINX log format (-).

curl -I https://invaliddomain-com.stage.invaliddomain.info/en/testpage.html
[22/Aug/2018:08:35:57 +0200] Cache: STALE - - 404 135 37.44.x.x invaliddomain-com.stage.invaliddomain.info/en/testpage.html

NGINX log format:

log_format proxy '[$time_local] Cache: $upstream_cache_status $upstream_addr $upstream_response_time $status $bytes_sent $proxy_add_x_forwarded_for $host $request_uri';

But when I purge the cache I get following results:

Loadbalancer NGINX access logs:

[22/Aug/2018:08:45:52 +0200] Cache: MISS 10.6.198.15:8080 0.508 404 458 37.44.x.x invaliddomain-com.stage.invaliddomain.info /en/testpage.html
[22/Aug/2018:08:45:54 +0200] Cache: HIT - - 404 458 37.44.x.x invaliddomain-com.stage.invaliddomain.info /en/testpage.html

curl reponse:

curl -I https://invaliddomain-com.stage.invaliddomain.info/en/testpage.html

HTTP/1.1 404
Server: nginx
Date: Wed, 22 Aug 2018 06:45:54 GMT
Content-Type: text/html;charset=UTF-8
Connection: keep-alive
Keep-Alive: timeout=5
X-Magnolia-Registration: Registered
Access-Control-Allow-Origin: http://tools.live.peri.info
Access-Control-Allow-Methods: GET, OPTIONS, HEAD
Access-Control-Allow-Headers: X-PINGOTHER, Origin, X-Requested-With, Content-Type, Accept
Cache-Control: max-age=60
Expires: Wed, 22 Aug 2018 06:46:51 GMT

HTTP/1.1 404
Server: nginx
Date: Wed, 22 Aug 2018 06:45:56 GMT
Content-Type: text/html;charset=UTF-8
Connection: keep-alive
Keep-Alive: timeout=5
X-Magnolia-Registration: Registered
Access-Control-Allow-Origin: http://tools.live.peri.info
Access-Control-Allow-Methods: GET, OPTIONS, HEAD
Access-Control-Allow-Headers: X-PINGOTHER, Origin, X-Requested-With, Content-Type, Accept
Cache-Control: max-age=60
Expires: Wed, 22 Aug 2018 06:46:51 GMT

Tomcat access log:

10.6.198.13 37.44.x.x http-nio-8080-exec-10 [22/Aug/2018:08:45:52 +0200] "GET /en/testpage.html HTTP/1.1" 404 42726 0.507 https://invaliddomain-com.stage.invaliddomain.info/en/testpage.html

comment:6 follow-up: Changed 10 months ago by arut

I don't know what you mean by "always"? Can you give a configuration example?

add_header            X-CACHE $upstream_cache_status always;

I don't see any parameters for that on ​https://nginx.org/en/docs.

http://nginx.org/en/docs/http/ngx_http_headers_module.html#add_header

Since you have background cache update enabled, it's normal that you receive the stale response. It's just not normal that it's not updated after that. Maybe for some reason the background update takes too long.

Is there any chance that you provide a debug log? Since you said that the issue is not related to proxy_cache_background_update, it's better to turn it off for simplicity.

comment:7 in reply to: ↑ 6 Changed 10 months ago by d.jarosch.syseleven.de@…

Replying to arut:

I don't know what you mean by "always"? Can you give a configuration example?

add_header            X-CACHE $upstream_cache_status always;

Thank you! I've been looking for upstream_cache_status and not add_header -.-

I don't see any parameters for that on ​https://nginx.org/en/docs.

http://nginx.org/en/docs/http/ngx_http_headers_module.html#add_header

Since you have background cache update enabled, it's normal that you receive the stale response. It's just not normal that it's not updated after that. Maybe for some reason the background update takes too long.

Is there any chance that you provide a debug log? Since you said that the issue is not related to proxy_cache_background_update, it's better to turn it off for simplicity.

Request:

curl -I https://invaliddomain-de.stage.invaliddomain.info/test2.html

Response:

HTTP/1.1 404
Server: nginx
Date: Fri, 24 Aug 2018 14:22:36 GMT
Content-Type: text/html;charset=UTF-8
Connection: keep-alive
Keep-Alive: timeout=5
X-Magnolia-Registration: Registered
Cache-Control: max-age=60
Expires: Fri, 24 Aug 2018 14:23:35 GMT
X-CACHE: MISS

HTTP/1.1 404
Server: nginx
Date: Fri, 24 Aug 2018 14:22:46 GMT
Content-Type: text/html;charset=UTF-8
Connection: keep-alive
Keep-Alive: timeout=5
X-Magnolia-Registration: Registered
Cache-Control: max-age=60
Expires: Fri, 24 Aug 2018 14:23:35 GMT
X-CACHE: HIT

HTTP/1.1 404
Server: nginx
Date: Fri, 24 Aug 2018 14:23:37 GMT
Content-Type: text/html;charset=UTF-8
Connection: keep-alive
Keep-Alive: timeout=5
X-Magnolia-Registration: Registered
Cache-Control: max-age=60
Expires: Fri, 24 Aug 2018 14:23:35 GMT
X-CACHE: STALE

NGINX loadbalancer access and error debug logs:

==> /var/log/nginx/stage.invaliddomain.info.access_log <==
[24/Aug/2018:16:22:36 +0200] Cache: MISS 10.6.198.15:8080 0.609 404 274 37.44.x.x invaliddomain-de.stage.invaliddomain.info /test2.html

==> /var/log/nginx/stage.invaliddomain.info.error_log <==
2018/08/24 16:22:36 [info] 10882#10882: *159234 client 37.44.x.x closed keepalive connection

==> /var/log/nginx/stage.invaliddomain.info.access_log <==
[24/Aug/2018:16:22:46 +0200] Cache: HIT - - 404 273 37.44.x.x invaliddomain-de.stage.invaliddomain.info /test2.html

==> /var/log/nginx/stage.invaliddomain.info.access_log <==
[24/Aug/2018:16:23:37 +0200] Cache: STALE 10.6.198.15:8080 0.607 404 275 37.44.x.x invaliddomain-de.stage.invaliddomain.info /test2.html

==> /var/log/nginx/stage.invaliddomain.info.error_log <==
2018/08/24 16:23:37 [info] 10882#10882: *159656 client 37.44.x.x closed keepalive connection

comment:8 follow-up: Changed 10 months ago by arut

This is not a debug log. It should be much more verbose. You either compiled nginx without --with-debug or didn't set debug error level in error_log directive.

comment:9 in reply to: ↑ 8 Changed 10 months ago by d.jarosch.syseleven.de@…

Replying to arut:

This is not a debug log. It should be much more verbose. You either compiled nginx without --with-debug or didn't set debug error level in error_log directive.

You're absolutely right. Please check following pastebin: https://nopaste.xyz/?28c1b37bcce1a32d#AONgxF5lgKOvgArnXAsq1d0rFOGLjZCAe3A2HJpzaQk=

comment:10 follow-up: Changed 10 months ago by arut

The log says this:

  • 12:09:41 - client connects
    • sees STALE response in cache
    • schedules a background cache update subrequest (proxy_cache_background_update on)
    • returns stale 404 response from cache
  • 12:09:41 - background cache update subrequest
    • sees STALE response in cache
    • connects to the upstream server
    • sends the request
    • in 3 seconds receives 404 (!) response from the upstream server
    • because of proxy_cache_use_stale http_404 ignores this response and keeps the old stale 404 response in cache
  • 12:09:43 (while the upstream server is processing the request) - another client connects
    • sees STALE response in cache, but knows the response is being updated
    • returns stale response because of proxy_cache_use_stale updating

This all looks perfectly normal considering the fact that the upstream server for whatever reason returns 404 when trying to update cache.

comment:11 in reply to: ↑ 10 Changed 10 months ago by d.jarosch.syseleven.de@…

Replying to arut:

The log says this:

  • 12:09:41 - client connects
    • sees STALE response in cache
    • schedules a background cache update subrequest (proxy_cache_background_update on)
    • returns stale 404 response from cache
  • 12:09:41 - background cache update subrequest
    • sees STALE response in cache
    • connects to the upstream server
    • sends the request
    • in 3 seconds receives 404 (!) response from the upstream server
    • because of proxy_cache_use_stale http_404 ignores this response and keeps the old stale 404 response in cache
  • 12:09:43 (while the upstream server is processing the request) - another client connects
    • sees STALE response in cache, but knows the response is being updated
    • returns stale response because of proxy_cache_use_stale updating

This all looks perfectly normal considering the fact that the upstream server for whatever reason returns 404 when trying to update cache.

Thanks for the analysis. I'll need some days to double check everything.

Note: See TracTickets for help on using tickets.