#955 closed enhancement (wontfix)
Changing the Vary response header based on varied-upon request headers causes cache misses
Reported by: | Neil Craig | Owned by: | |
---|---|---|---|
Priority: | major | Milestone: | |
Component: | nginx-core | Version: | 1.9.x |
Keywords: | cache vary | Cc: | |
uname -a: | Linux ip-10-13-149-100.eu-west-1.compute.internal 3.10.0-327.10.1.el7.x86_64 #1 SMP Tue Feb 16 17:03:50 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | ||
nginx -V: |
nginx version: nginx/1.9.14 (BBC GTM)
built with OpenSSL 1.1.0-pre4 (beta) 16 Mar 2016 TLS SNI support enabled configure arguments: --build='BBC GTM' --prefix=/usr/local/nginx --sbin-path=/usr/sbin/nginx --conf-path=/etc/nginx/current/nginx.conf --pid-path=/var/run/nginx.pid --error-log-path=/var/log/nginx/default-error.log --http-log-path=/var/log/nginx/default-access.log --lock-path=/var/run/nginx.lock --http-client-body-temp-path=/var/cache/nginx/client_temp --http-proxy-temp-path=/var/cache/nginx/proxy_temp --http-fastcgi-temp-path=/var/cache/nginx/fastcgi_temp --http-uwsgi-temp-path=/var/cache/nginx/uwsgi_temp --http-scgi-temp-path=/var/cache/nginx/scgi_temp --user=gtmdaemon --group=gtmdaemon --with-http_realip_module --with-http_v2_module --with-http_ssl_module --with-http_geoip_module --with-pcre-jit --with-ipv6 --with-file-aio --with-cc-opt='-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic' --add-module=/tmp/tmprOIlsh/BUILD/nginx-1.9.14/headers-more-nginx-module --add-module=/tmp/tmprOIlsh/BUILD/nginx-1.9.14/naxsi/naxsi_src --add-module=/tmp/tmprOIlsh/BUILD/nginx-1.9.14/nginx-module-vts --add-module=/tmp/tmprOIlsh/BUILD/nginx-1.9.14/nginx-upstream-dynamic-servers --with-openssl=/tmp/tmprOIlsh/BUILD/nginx-1.9.14/openssl-1.1.0-pre4 |
Description
Hi
I'm building a traffic management platform based on NGINX and, as part of this, am doing some due diligence on NGINX features in our context to make sure we understand the functionality. I have been testing Vary support and have found an issue which I wanted to raise with you to see if there's any scope for optimisation.
Test configuration:
NGINX configured as an HTTP/S/2 caching reverse proxy, using proxy_pass
NGINX patched to increase the Vary header max length to 8192 bytes (the default 42 bytes is way too short for us)
Origin (proxied-to) server returns a Vary response header which varies based on some request header values
Origin issues cacheable objects (e.g. HTML) with standards-compliant response headers, e.g. Cache-Control: public,max-age=600 etc.
Origin issues e.g. 2 different Vary headers, depending on the value of a varied-upon request header, e.g. h1:
when h1 == "a", the Vary response header is e.g. h1,h2
when h1 == "b", the Vary response header is e.g. h1,h2,h3
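To make the setup concrete, a stripped-down sketch of this kind of configuration looks roughly like the below (hostnames, cache zone name and paths are placeholders rather than our real config):

```nginx
# Hypothetical sketch of the test setup (names and paths are placeholders).
# proxy_cache_path goes in the http{} context.
proxy_cache_path /var/cache/nginx/test levels=1:2 keys_zone=test_cache:10m
                 max_size=1g inactive=10m;

server {
    listen 443 ssl http2;
    server_name example.test;

    # ssl_certificate / ssl_certificate_key omitted for brevity

    location / {
        proxy_pass https://origin.example.test;
        proxy_cache test_cache;
        # The origin controls freshness via Cache-Control: public,max-age=600
        # and selects the Vary header based on the incoming h1 value.
    }
}
```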
Test sequence:
Make requests via NGINX with e.g. h1 == "a", h2 == "b" to cache the response - the Vary header issued will be "vary: h1,h2" and the response will be cached
Make requests via NGINX with e.g. h1 == "b", h2 == "b" to cache the response - the Vary header issued will be "vary: h1,h2,h3" and the response will be cached
Again make requests via NGINX with e.g. h1 == "a", h2 == "b" - the Vary header issued will be "vary: h1,h2" - the response will not be taken from the cache, even though the max-age issued means that it could have been
i.e. if we change the Vary response header based on the value of a varied-upon request header, NGINX will not serve previously cached items - thus the cache is partially invalidated.
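(For reference, the cache result at each step can be confirmed by exposing nginx's cache status on responses; in our tests the third request comes back as a MISS rather than a HIT:)

```nginx
# Expose the cache result (MISS / HIT / EXPIRED ...) on every response.
# X-Cache-Status is just a conventional name, not a standard header.
add_header X-Cache-Status $upstream_cache_status always;
```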
From doing some reading and examining the cache files on disk, I believe NGINX sees a Vary mismatch when the Vary response header changes and overwrites the master cache object on disk. This means that we see a partially/completely invalidated cache whenever the Vary response header changes, and thus a much lower cache hit ratio/efficiency.
We change the Vary response header based on the value of several of our common request headers, e.g. the user's geographical location (due to rights issues or personalisation, e.g. weather forecasts, news, travel info etc.), an "are we behind a CDN" flag, and several more scenarios. This means that we are going to see a much lower, potentially zero, cache hit ratio - this is quite a problem and I don't think we're alone.
I have checked the most recent RFC which describes Vary handling and, as you're probably aware, it doesn't really cover this scenario - so I don't see this as a problem of not adhering to standards; it's a suggestion to help improve the cache hit ratio for users who operate similarly to how we do.
I hope I have explained this clearly, but please let me know if not, or indeed if you spot any error in my logic - it has been peer reviewed by my colleague so I believe it's sound. I can supply example config if required, though I'd prefer to keep that private for the moment as it's somewhat sensitive currently.
Many thanks
Neil
Change History (3)
comment:1 by , 9 years ago
Ah, sorry, I just realised I accidentally assigned this to docs - it should be in core. Also I have typos in the description - sorry, not enough coffee!
comment:2 by , 9 years ago
Component: | documentation → nginx-core |
---|---|
Resolution: | → wontfix |
Status: | new → closed |
Caching of multiple response variants in nginx works as follows:
- it looks up a cached response stored for a given key ("main variant");
- if this response contains Vary, a secondary key is calculated based on the headers of the request as listed in Vary;
- then it looks up a cached response for the secondary key.
This approach makes it possible to effectively cache resources with a "flat" set of variants, using the same Vary response header in all variants. At most one extra cache lookup is required to find the response variant to return.
It also has some obvious limitations though - it won't work for the "tree-like" set of variants you are trying to use, as it is critical for the algorithm that Vary is the same for all variants. Therefore, if nginx detects that a response is returned with a different Vary header, it assumes that the variant selection logic was changed on the backend (e.g., gzip compression was switched on or off) and replaces the main variant with the response returned.
There are no plans to change the caching logic - it is believed to be good enough for most practical cases, and changing it to allow arbitrary sets with different Vary headers in different variants would require inspecting all cached variants, thus reducing cache performance. You may want to consider returning identical Vary headers in all variants instead, or using distinct keys instead of variants.
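As a rough, untested sketch of the "distinct keys" approach (using the h1/h2/h3 placeholder names from the report), the varied-upon request headers can be folded into the cache key directly:

```nginx
# Sketch: put every potentially varied-upon request header into the cache key,
# so each combination is stored and looked up under its own key.
proxy_cache_key "$scheme$proxy_host$request_uri h1=$http_h1 h2=$http_h2 h3=$http_h3";

# Optionally, Vary-based variant handling can be switched off in the cache so
# only the explicit key is used (the Vary header is still sent to clients):
#proxy_ignore_headers Vary;
```

The trade-off is that every distinct combination of header values gets its own cache entry, so the key should be limited to headers that genuinely affect the response.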
comment:3 by , 9 years ago
Thanks Max. I understand your position on this - maybe our use case is unusual. I am hoping we can get around the issue via cache keys; I was considering that already.
I might write a blog post explaining this, assuming you don't mind - just to get the info out there, as there isn't much currently and I think it's useful to know for planning.
Cheers