Opened 8 years ago

Closed 8 years ago

Last modified 8 years ago

#890 closed defect (invalid)

with caching enabled, nginx returns 502 bad gateway error long after upstream server comes back up

Reported by: fortran77@… Owned by:
Priority: major Milestone:
Component: nginx-core Version: 1.8.x
Keywords: Cc:
uname -a: Linux hostedited.example.com 3.10.0-327.4.5.el7.x86_64 #1 SMP Mon Jan 25 22:07:14 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
nginx -V: nginx version: nginx/1.8.1
built by gcc 4.8.3 20140911 (Red Hat 4.8.3-9) (GCC)
built with OpenSSL 1.0.1e-fips 11 Feb 2013
TLS SNI support enabled
configure arguments: --prefix=/etc/nginx --sbin-path=/usr/sbin/nginx --conf-path=/etc/nginx/nginx.conf --error-log-path=/var/log/nginx/error.log --http-log-path=/var/log/nginx/access.log --pid-path=/var/run/nginx.pid --lock-path=/var/run/nginx.lock --http-client-body-temp-path=/var/cache/nginx/client_temp --http-proxy-temp-path=/var/cache/nginx/proxy_temp --http-fastcgi-temp-path=/var/cache/nginx/fastcgi_temp --http-uwsgi-temp-path=/var/cache/nginx/uwsgi_temp --http-scgi-temp-path=/var/cache/nginx/scgi_temp --user=nginx --group=nginx --with-http_ssl_module --with-http_realip_module --with-http_addition_module --with-http_sub_module --with-http_dav_module --with-http_flv_module --with-http_mp4_module --with-http_gunzip_module --with-http_gzip_static_module --with-http_random_index_module --with-http_secure_link_module --with-http_stub_status_module --with-http_auth_request_module --with-mail --with-mail_ssl_module --with-file-aio --with-ipv6 --with-http_spdy_module --with-cc-opt='-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic'

Description

Using nginx as a front-end proxy for an upstream apache server, I enabled caching of stale content during upstream failures:

  proxy_cache_use_stale         error timeout invalid_header updating http_500 http_502 http_503 http_504;
  proxy_cache_valid             200 20m;
  proxy_cache_valid             302 20m;
  proxy_cache_valid             404 20m;
  proxy_cache_valid             any 20m;
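
For context, a minimal sketch of the surrounding proxy configuration (the cache path, zone name, hostname and apache backend address below are assumptions, not taken from the ticket):

  # in the http{} block
  proxy_cache_path  /var/cache/nginx/proxy_cache  levels=1:2  keys_zone=mycache:10m  inactive=24h  max_size=1g;

  server {
      listen       80;
      server_name  www.example.com;

      location / {
          proxy_pass             http://127.0.0.1:8080;   # the apache back end (assumed address)
          proxy_cache            mycache;
          proxy_cache_use_stale  error timeout invalid_header updating http_500 http_502 http_503 http_504;
          proxy_cache_valid      200 20m;
          proxy_cache_valid      302 20m;
          proxy_cache_valid      404 20m;
          proxy_cache_valid      any 20m;
          add_header             X-Cache-Status $upstream_cache_status;
      }
  }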

Now consider the following sequence of events:

  1. The upstream aka back-end server (apache in this case) is down.
  2. The front-end server nginx is down.
  3. We rm -rf the contents of the nginx cache directory.
  4. We start nginx.
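
Something along these lines reproduces that sequence on a CentOS 7 host (the service names, the assumption that apache and nginx run on the same machine, and the cache directory path are all mine, not from the ticket):

  systemctl stop httpd                      # 1. take the apache back end down
  systemctl stop nginx                      # 2. take the nginx front end down
  rm -rf /var/cache/nginx/proxy_cache/*     # 3. empty the nginx cache directory (assumed path)
  systemctl start nginx                     # 4. start nginx again with an empty cache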

At this point, since the upstream server is down and the cache is empty, anybody accessing our website gets a 502 Bad Gateway nginx/1.8.1 error from nginx as expected.

Now the upstream server comes back up. There is no longer any need to report a bad gateway error.

But even though the upstream server is up, and our website is active on the upstream server, visitors reaching our nginx front-end continue to see the "502 Bad Gateway nginx/1.8.1" error for at least the configured cache duration, 20 minutes in the above example.

So nginx is not only caching content that it gets from the upstream server; it is apparently also caching its own "502 Bad Gateway nginx/1.8.1" message. For the next 20 minutes in this case, visitors will see the 502 error, and according to the nginx error log, nginx makes no attempt to contact the upstream during those 20 minutes even though incoming hits keep arriving.

If my experiments have not misled me, any time nginx tries to serve a page that is not in its cache while the upstream server is down, it will cache its own 502 response. So even if the upstream server was down only briefly, website visitors may see the 502 gateway error for much longer; in the specific case above, they will see it for 20 minutes.

I have confirmed the above behavior with wget and curl and with the Chrome browser.

Incidentally, I have also enabled sending the X-Cache-Status header, and nginx does send it in the normal case. But in the situation above, when nginx serves the cached 502 gateway error, it does not send the X-Cache-Status header, so an end user examining the headers cannot tell that they are seeing a stale cached message.
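
For illustration, the headers can be inspected with something like the following (the hostname and path are placeholders); in the failure case described above, the X-Cache-Status line is simply absent:

  curl -sI http://www.example.com/some-page | grep -i -E '^(HTTP|X-Cache-Status)'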

I am using a stable nginx package installed from the nginx repo at http://nginx.org/packages/centos/7/x86_64/.

Please let me know if more information is needed.

Change History (5)

comment:1 by Maxim Dounin, 8 years ago

Resolution: invalid
Status: new → closed

According to your configuration, all errors are cached for 20 minutes. Caching makes no distinction between errors explicitly returned by upstream servers and errors that occur while connecting to them.

As for the missing header, it's likely because you've used the add_header directive without the always parameter. This directive is not expected to add anything to errors by default, see http://nginx.org/r/add_header.
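
For example, a minimal sketch (assuming the header is populated from the $upstream_cache_status variable, which the ticket does not show):

  add_header  X-Cache-Status  $upstream_cache_status  always;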

comment:2 by fortran77@…, 8 years ago

I can't refute your logic. However, the intent of the user here is not caching of errors, but rather caching of stale content during upstream failures. The use of the proxy_cache_use_stale directive was intended to achieve that end. Once the upstream server comes back up, website visitors should then see the website content.

So this may be considered a feature request: that only content, not gateway errors, should be cached. When the cache does not have the requested data, the upstream server should be contacted at intervals and the cache replenished as soon as the upstream server is available again.

comment:3 by Maxim Dounin, 8 years ago

There is no real difference between "content" and "gateway errors" as long as the status code is 502 in both cases. Moreover, proxy_cache_use_stale in the above configuration is set to use a stale response (if any) in case of both errors and 502 responses from upstream servers, which suggests the same. I would recommend rethinking the configuration instead: what you want is most likely already possible, you just need to configure nginx properly to do it.

If you have any further questions on how to configure nginx, please use the available support options.

in reply to: comment:3 | comment:4 by drok@…, 8 years ago

I am hitting this problem too, and mdounin's reply in comment:3 does not address it, so I would like to restate the initial problem in the hope that it is understood.

The intention of the configuration above is to cache only 200, 302 and 404 responses from the upstream, and NEVER to cache 5xx responses. If a request to the upstream results in a 5xx, the previously cached 200/302/404 response should be sent to the client instead.

The configuration in the original ticket seems correct based on the available documentation. If it is not correct, then what is the correct config to achieve the goal of increasing website uptime from the end-user's perspective?

comment:5 by Maxim Dounin, 8 years ago

The configuration above contains

proxy_cache_valid             any 20m;

which instructs nginx to cache all errors for 20 minutes. If you don't want nginx to cache errors, remove this line.
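
In other words, something along these lines (a sketch built only from the directives quoted in the ticket) keeps stale 200/302/404 responses available during outages while no longer caching errors:

  proxy_cache_use_stale  error timeout invalid_header updating http_500 http_502 http_503 http_504;
  proxy_cache_valid      200 20m;
  proxy_cache_valid      302 20m;
  proxy_cache_valid      404 20m;
  # proxy_cache_valid    any 20m;   # removed: this is what caused 502 responses to be cached for 20 minutes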

Note: See TracTickets for help on using tickets.