Opened 12 days ago

Closed 9 days ago

#2153 closed defect (invalid)

Race in proxy_cache_background_update with proxy_cache_use_stale = updating

Reported by: koshikov@… Owned by:
Priority: minor Milestone:
Component: documentation Version: 1.14.x
Keywords: proxy_cache_background_update Cc:
uname -a: Linux devvm4787 5.2.9-229_ #1 SMP Mon Aug 10 09:34:07 PDT 2020 x86_64 x86_64 x86_64 GNU/Linux
nginx -V: nginx version: nginx/1.14.1
built by gcc 8.2.1 20180905 (Red Hat 8.2.1-3) (GCC)
built with OpenSSL 1.1.1 FIPS 11 Sep 2018 (running with OpenSSL 1.1.1g FIPS 21 Apr 2020)
TLS SNI support enabled
configure arguments: --prefix=/usr/share/nginx --sbin-path=/usr/sbin/nginx --modules-path=/usr/lib64/nginx/modules --conf-path=/etc/nginx/nginx.conf --error-log-path=/var/log/nginx/error.log --http-log-path=/var/log/nginx/access.log --http-client-body-temp-path=/var/lib/nginx/tmp/client_body --http-proxy-temp-path=/var/lib/nginx/tmp/proxy --http-fastcgi-temp-path=/var/lib/nginx/tmp/fastcgi --http-uwsgi-temp-path=/var/lib/nginx/tmp/uwsgi --http-scgi-temp-path=/var/lib/nginx/tmp/scgi --pid-path=/run/nginx.pid --lock-path=/run/lock/subsys/nginx --user=nginx --group=nginx --with-file-aio --with-ipv6 --with-http_ssl_module --with-http_v2_module --with-http_realip_module --with-http_addition_module --with-http_xslt_module=dynamic --with-http_image_filter_module=dynamic --with-http_sub_module --with-http_dav_module --with-http_flv_module --with-http_mp4_module --with-http_gunzip_module --with-http_gzip_static_module --with-http_random_index_module --with-http_secure_link_module --with-http_degradation_module --with-http_slice_module --with-http_stub_status_module --with-http_perl_module=dynamic --with-http_auth_request_module --with-mail=dynamic --with-mail_ssl_module --with-pcre --with-pcre-jit --with-stream=dynamic --with-stream_ssl_module --with-debug --with-cc-opt='-O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions -fstack-protector-strong -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection' --with-ld-opt='-Wl,-z,relro -Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld -Wl,-E'

Description

Hello nginx developers,

the main issue - nginx server stale content from cache forever
setup

http_host:

proxy_to_main_backend ->

in_case_50x_error - error_page_to_fallback ->

fallback:

proxy_to_fallback_backend

Attached nginx config could repro issue with nginx serving stale cache "forever", even update process in the background is allowed. The issue present when all servers from 'main_backend' return error( in this case 502), but fallback servers are good and could serve requests.

Current behavior observed with debug logs:

when proxy_cache_use_stale includes 'updating' nginx replies to client with stale cache, as it should. Then it started connection to "backend_main", that returns 502 -> then fallbacks (second request) to "backend_fallback" and got reply. And that reply doesn't update cache.
Next request from the client would repeat above process and got stale reply + 2 connections without cache update.

when proxy_cache_use_stale doesn't have 'updating'. Nginx on client request starts 1 req to "backend_main" - got 502, then fallback (2 req) - got reply - then replies to client and save response to disk cache.
Next request from the client would be server from disk cache until proxy_cache_valid allows.

Repro:
curl -v http://localhost:88/url
touch 81 502 from main backend
curl -v http://localhost:88/url

Attachments (1)

nginx.conf (3.0 KB ) - added by koshikov@… 12 days ago.
config to repro

Download all attachments as: .zip

Change History (4)

by koshikov@…, 12 days ago

Attachment: nginx.conf added

config to repro

comment:1 by Maxim Dounin, 11 days ago

I don't observe behaviour you describe with the configuration provided neither in current nginx versions (such as nginx 1.19.9), nor in nginx 1.14.1 (note this is a legacy version, not supported for more than two years now).

In contrast, the following happens:

  1. First curl -v ​http://localhost:88/url request caches the response from main_backend:
    $ curl -v http://localhost:88/url
    *   Trying 127.0.0.1:88...
    * Immediate connect fail for 127.0.0.1: Connection refused
    *   Trying ::1:88...
    * Connected to localhost (::1) port 88 (#0)
    > GET /url HTTP/1.1
    > Host: localhost:88
    > User-Agent: curl/7.74.0
    > Accept: */*
    > 
    * Mark bundle as not supporting multiuse
    < HTTP/1.1 200 OK
    < Server: nginx/1.14.1
    < Date: Tue, 30 Mar 2021 19:00:52 GMT
    < Content-Type: application/octet-stream
    < Content-Length: 8
    < Connection: keep-alive
    < Cache-Control: max-age=81
    < X-Header-Root: 88-/url
    < Cache-Control: max-age=888
    < 
    Good 81
    * Connection #0 to host localhost left intact
    
    Relevant access log entries:
    ::1 - - [30/Mar/2021:22:00:52 +0300] "GET /url HTTP/1.0" 200 8 "-" "curl/7.74.0" "-" "-" "-" "-" "-"
    ::1 - - [30/Mar/2021:22:00:52 +0300] "GET /url HTTP/1.1" 200 8 "-" "curl/7.74.0" "-" "[::1]:81" "0.000" "0.000" "200"
    
  2. Additional request, to make sure the response is properly cached:
    $ curl -v http://localhost:88/url
    *   Trying 127.0.0.1:88...
    * Immediate connect fail for 127.0.0.1: Connection refused
    *   Trying ::1:88...
    * Connected to localhost (::1) port 88 (#0)
    > GET /url HTTP/1.1
    > Host: localhost:88
    > User-Agent: curl/7.74.0
    > Accept: */*
    > 
    * Mark bundle as not supporting multiuse
    < HTTP/1.1 200 OK
    < Server: nginx/1.14.1
    < Date: Tue, 30 Mar 2021 19:01:03 GMT
    < Content-Type: application/octet-stream
    < Content-Length: 8
    < Connection: keep-alive
    < Cache-Control: max-age=81
    < X-Header-Root: 88-/url
    < Cache-Control: max-age=888
    < 
    Good 81
    * Connection #0 to host localhost left intact
    
    Relevant access log entry:
    ::1 - - [30/Mar/2021:22:01:03 +0300] "GET /url HTTP/1.1" 200 8 "-" "curl/7.74.0" "-" "-" "-" "-" "-"
    
  3. Third request, once the main backend stops responding due to touch 81 and the cached item becomes stale, returns stale response from the cache, as expected:
    $ curl -v http://localhost:88/url
    *   Trying 127.0.0.1:88...
    * Immediate connect fail for 127.0.0.1: Connection refused
    *   Trying ::1:88...
    * Connected to localhost (::1) port 88 (#0)
    > GET /url HTTP/1.1
    > Host: localhost:88
    > User-Agent: curl/7.74.0
    > Accept: */*
    > 
    * Mark bundle as not supporting multiuse
    < HTTP/1.1 200 OK
    < Server: nginx/1.14.1
    < Date: Tue, 30 Mar 2021 19:03:05 GMT
    < Content-Type: application/octet-stream
    < Content-Length: 8
    < Connection: keep-alive
    < Cache-Control: max-age=81
    < X-Header-Root: 88-/url
    < Cache-Control: max-age=888
    < 
    Good 81
    * Connection #0 to host localhost left intact
    
    It also tries to update the cache in the background, though immediate request to the main backend returns 502. As per the configuration, this response is cached, and the request is proxied to port 89, where is it is proxied to fallback_backend and also cached. Relevant access log entries are:
    ::1 - - [30/Mar/2021:22:03:05 +0300] "GET /url HTTP/1.0" 502 6 "-" "curl/7.74.0" "-" "-" "-" "-" "-"
    ::1 - - [30/Mar/2021:22:03:05 +0300] "GET /url HTTP/1.0" 200 8 "-" "curl/7.74.0" "-" "-" "-" "-" "-"
    ::1 - - [30/Mar/2021:22:03:05 +0300] "GET /url HTTP/1.0" 200 8 "-" "curl/7.74.0" "-" "[::1]:82" "0.000" "0.000" "200"
    ::1 - - [30/Mar/2021:22:03:05 +0300] "GET /url HTTP/1.1" 200 8 "-" "curl/7.74.0" "-" "-" "-" "-" "-"
    
  4. Forth request shows the 502 response is cached, so the request immediately goes to the backend at port 89, which returns the response from its cache:
    $ curl -v http://localhost:88/url
    *   Trying 127.0.0.1:88...
    * connect to 127.0.0.1 port 88 failed: Connection refused
    *   Trying ::1:88...
    * Connected to localhost (::1) port 88 (#0)
    > GET /url HTTP/1.1
    > Host: localhost:88
    > User-Agent: curl/7.74.0
    > Accept: */*
    > 
    * Mark bundle as not supporting multiuse
    < HTTP/1.1 200 OK
    < Server: nginx/1.14.1
    < Date: Tue, 30 Mar 2021 19:03:11 GMT
    < Content-Type: application/octet-stream
    < Content-Length: 8
    < Connection: keep-alive
    < Cache-Control: max-age=82
    < X-Header-Root: 89-/url
    < Cache-Control: max-age=899
    < X-Header-Fall: /fallback
    < 
    Good 82
    * Connection #0 to host localhost left intact
    
    Relevant access log entries are:
    ::1 - - [30/Mar/2021:22:03:11 +0300] "GET /url HTTP/1.0" 200 8 "-" "curl/7.74.0" "-" "-" "-" "-" "-"
    ::1 - - [30/Mar/2021:22:03:11 +0300] "GET /url HTTP/1.1" 200 8 "-" "curl/7.74.0" "-" "[::]:89" "0.000" "0.000" "200"
    

This behaviour looks fully consistent with the configuration. If you think that something else should happen, please elaborate what exactly you think nginx should do, at which step, and why.

comment:2 by koshikov@…, 9 days ago

I appreciate your time, Maxim, trying to repro this - thank you. It helped to find the misconfiguration, that was the root cause of such behavior.
Prod machine didn't have proxy_intercept_errors - and that blocked subsequent request to reach fallback, and looked like cache stuck forever

::1 - - [30/Mar/2021:17:25:32 -0700] "GET /url HTTP/1.0" 502 6 "-" "curl/7.61.1" "-" "-" "-" "-" "-"
::1 - - [30/Mar/2021:17:25:32 -0700] "GET /url HTTP/1.1" 200 7 "-" "curl/7.61.1" "-" "-" "-" "-" "-"

Apologize for false report, and thanks again

comment:3 by Maxim Dounin, 9 days ago

Resolution: invalid
Status: newclosed

Thanks for the feedback, closing this.

Note: See TracTickets for help on using tickets.