#890 closed defect (invalid)
with caching enabled, nginx returns 502 bad gateway error long after upstream server comes back up
Reported by: | | Owned by: |
---|---|---|---
Priority: | major | Milestone: |
Component: | nginx-core | Version: | 1.8.x
Keywords: | | Cc: |
uname -a: | Linux hostedited.example.com 3.10.0-327.4.5.el7.x86_64 #1 SMP Mon Jan 25 22:07:14 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | |

nginx -V:

```
nginx version: nginx/1.8.1
built by gcc 4.8.3 20140911 (Red Hat 4.8.3-9) (GCC)
built with OpenSSL 1.0.1e-fips 11 Feb 2013
TLS SNI support enabled
configure arguments: --prefix=/etc/nginx --sbin-path=/usr/sbin/nginx --conf-path=/etc/nginx/nginx.conf --error-log-path=/var/log/nginx/error.log --http-log-path=/var/log/nginx/access.log --pid-path=/var/run/nginx.pid --lock-path=/var/run/nginx.lock --http-client-body-temp-path=/var/cache/nginx/client_temp --http-proxy-temp-path=/var/cache/nginx/proxy_temp --http-fastcgi-temp-path=/var/cache/nginx/fastcgi_temp --http-uwsgi-temp-path=/var/cache/nginx/uwsgi_temp --http-scgi-temp-path=/var/cache/nginx/scgi_temp --user=nginx --group=nginx --with-http_ssl_module --with-http_realip_module --with-http_addition_module --with-http_sub_module --with-http_dav_module --with-http_flv_module --with-http_mp4_module --with-http_gunzip_module --with-http_gzip_static_module --with-http_random_index_module --with-http_secure_link_module --with-http_stub_status_module --with-http_auth_request_module --with-mail --with-mail_ssl_module --with-file-aio --with-ipv6 --with-http_spdy_module --with-cc-opt='-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic'
```
Description
Using nginx as a front-end proxy for an upstream apache server, I enabled caching of stale content during upstream failures:
```
proxy_cache_use_stale error timeout invalid_header updating http_500 http_502 http_503 http_504;
proxy_cache_valid 200 20m;
proxy_cache_valid 302 20m;
proxy_cache_valid 404 20m;
proxy_cache_valid any 20m;
```
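For context, here is a simplified sketch of how these directives sit inside the proxy-cache setup; the cache path, the zone name `app_cache`, and the upstream address below are placeholders for illustration rather than the exact values from my configuration:

```
# Simplified sketch; paths, zone name and upstream address are placeholders.
proxy_cache_path /var/cache/nginx/proxy_cache levels=1:2 keys_zone=app_cache:10m
                 max_size=1g inactive=60m;

server {
    listen 80;

    location / {
        proxy_pass  http://127.0.0.1:8080;   # the Apache back end
        proxy_cache app_cache;

        # Directives quoted above:
        proxy_cache_use_stale error timeout invalid_header updating
                              http_500 http_502 http_503 http_504;
        proxy_cache_valid 200 20m;
        proxy_cache_valid 302 20m;
        proxy_cache_valid 404 20m;
        proxy_cache_valid any 20m;
    }
}
```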
Now consider the following sequence of events:
- The upstream aka back-end server (apache in this case) is down.
- The front-end server `nginx` is down.
- We `rm -rf` the contents of the `nginx` cache directory.
- We start `nginx`.
At this point, since the upstream server is down and the cache is empty, anybody accessing our website gets a `502 Bad Gateway nginx/1.8.1` error from `nginx`, as expected.
Now the upstream server comes back up. There is no longer any need to report a bad gateway error.
But even though the upstream server is up, and our website is active on the upstream server, visitors reaching our `nginx` front end continue to see the `502 Bad Gateway nginx/1.8.1` error for at least the amount of time for which caching has been enabled (20 minutes in the above example).
So not only is `nginx` caching content that it gets from the upstream server, it is apparently also caching its own `502 Bad Gateway nginx/1.8.1` message. For the next 20 minutes in this case, visitors will see the 502 error, and according to the `nginx` error log it makes no attempt to contact the upstream during those 20 minutes, even though requests keep coming in.
If my experiments have not misled me, any time `nginx` tries to access a page that is not in its cache while the upstream server is down, it will cache its 502 response. So even if the upstream server was down only briefly, website visitors may see the 502 gateway error for much longer; in the specific case above, they will see it for 20 minutes.
I have confirmed the above behavior with `wget`, with `curl`, and with the Chrome browser.
Also, incidentally, I have enabled transmission of the `X-Cache-Status:` header, and nginx does send it in the normal case. But in the above situation, when it is sending a cached 502 gateway error, it does not send the `X-Cache-Status:` header, so end users cannot examine the headers and tell that they are seeing a stale cached message.
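For completeness, the header is enabled with an `add_header` line along these lines; the variable shown is the conventional one for this purpose and is given as an illustration rather than a verbatim quote of my configuration:

```
# Typical way to expose the cache status; $upstream_cache_status reports
# HIT, MISS, STALE, etc. for the current request.
add_header X-Cache-Status $upstream_cache_status;
```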
(I am using a stable `nginx` package installed from the `nginx` repo at http://nginx.org/packages/centos/7/x86_64/.)
Please let me know if more information is needed.
Change History (5)
comment:1 by , 9 years ago
Resolution: | → invalid |
---|---|
Status: | new → closed |
comment:2 by , 9 years ago
I can't refute your logic. However, the intent of the user here is not caching of errors, but rather caching of stale content during upstream failures; the use of the `proxy_cache_use_stale` directive was intended to achieve that end. Once the upstream server comes back up, website visitors should then see the website content.
So this may be considered a feature request: that only content, not gateway errors, should be cached. When the cache does not have the requested data, the upstream server should be contacted at intervals and the cache replenished as soon as the upstream server is available again.
comment:3 by , 9 years ago (follow-up: 4)
There is no real difference between "content" and "gateway errors" as long as the status code is 502 in both cases. Moreover, `proxy_cache_use_stale` in the above configuration is set to use a stale response (if any) in case of both errors and 502s from upstream servers, which suggests the same. I would recommend rethinking the configuration instead: what you want it to do is most likely already available, you just need to configure nginx properly to do it.
If you have any further questions about how to configure nginx, please use the available support options.
comment:4 by , 9 years ago
I am hitting this problem too, and mdounin's reply in comment:3 does not address it, so I would like to reword the initial problem in the hope that it is understood.
The intention of the configuration above is to cache only 200, 302 and 404 responses from the upstream, and NEVER to cache 5XX responses. If a request to the upstream results in a 5XX, the previously cached 200/302/404 response should be sent to the client.
The configuration in the original ticket seems correct based on the available documentation. If it is not correct, then what is the correct config to achieve the goal of increasing website uptime from the end user's perspective?
comment:5 by , 9 years ago
The configuration above contains `proxy_cache_valid any 20m;`, which instructs nginx to cache all errors for 20 minutes. If you don't want nginx to cache errors, remove this line.

In other words, according to your configuration, all errors are cached for 20 minutes. Caching makes no distinction between errors explicitly returned by upstream servers and errors that happen while connecting to them.
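For illustration, the cache-validity block without the catch-all line might look like this; it is a sketch of the advice above rather than a tested configuration:

```
# Without a "proxy_cache_valid any ..." line, 5xx responses are not cached;
# proxy_cache_use_stale still allows stale 200/302/404 entries to be served
# while the upstream is unreachable or returning errors.
proxy_cache_use_stale error timeout invalid_header updating
                      http_500 http_502 http_503 http_504;
proxy_cache_valid 200 20m;
proxy_cache_valid 302 20m;
proxy_cache_valid 404 20m;
```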
As for the missing header, it's likely because you've used the `add_header` directive without the `always` parameter. This directive is not expected to add anything to errors by default, see http://nginx.org/r/add_header.
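As a sketch, the header line with the `always` parameter could look like this; the `$upstream_cache_status` variable is assumed to be what the original `add_header` line used:

```
# "always" makes nginx add the header to all responses, including error pages.
add_header X-Cache-Status $upstream_cache_status always;
```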