Changing vary response header based on varied-upon request headers causes cache misses
|Reported by:||Neil Craig||Owned by:|
|uname -a:||Linux ip-10-13-149-100.eu-west-1.compute.internal 3.10.0-327.10.1.el7.x86_64 #1 SMP Tue Feb 16 17:03:50 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux|
nginx version: nginx/1.9.14 (BBC GTM)
built with OpenSSL 1.1.0-pre4 (beta) 16 Mar 2016
TLS SNI support enabled
configure arguments: --build='BBC GTM' --prefix=/usr/local/nginx --sbin-path=/usr/sbin/nginx --conf-path=/etc/nginx/current/nginx.conf --pid-path=/var/run/nginx.pid --error-log-path=/var/log/nginx/default-error.log --http-log-path=/var/log/nginx/default-access.log --lock-path=/var/run/nginx.lock --http-client-body-temp-path=/var/cache/nginx/client_temp --http-proxy-temp-path=/var/cache/nginx/proxy_temp --http-fastcgi-temp-path=/var/cache/nginx/fastcgi_temp --http-uwsgi-temp-path=/var/cache/nginx/uwsgi_temp --http-scgi-temp-path=/var/cache/nginx/scgi_temp --user=gtmdaemon --group=gtmdaemon --with-http_realip_module --with-http_v2_module --with-http_ssl_module --with-http_geoip_module --with-pcre-jit --with-ipv6 --with-file-aio --with-cc-opt='-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic' --add-module=/tmp/tmprOIlsh/BUILD/nginx-1.9.14/headers-more-nginx-module --add-module=/tmp/tmprOIlsh/BUILD/nginx-1.9.14/naxsi/naxsi_src --add-module=/tmp/tmprOIlsh/BUILD/nginx-1.9.14/nginx-module-vts --add-module=/tmp/tmprOIlsh/BUILD/nginx-1.9.14/nginx-upstream-dynamic-servers --with-openssl=/tmp/tmprOIlsh/BUILD/nginx-1.9.14/openssl-1.1.0-pre4
I'm building a traffic management platform based on NGINX and, as part of this, am doing some due diligence on NGINX features in our contexts to make sure we understand the functionality. While testing vary support I found an issue which I wanted to raise with you to see if there's any scope for optimisation.
NGINX configured as an HTTP/S/2 caching reverse proxy, using proxy_pass
NGINX patched to increase vary header max length to 8192 bytes (42 bytes is way too short for us)
Origin (proxied-to) server returns a vary response header which varies based on some request header values
Origin issues cacheable objects (e.g. HTML) with standards-compliant response headers e.g. Cache-Control: public, max-age=600 etc.
Origin issues e.g. 2 different vary headers, depending on the value of a varied-upon request header e.g. h1:
when h1 == "a", vary response header is e.g. h1,h2
when h1 == "b", vary response header is e.g. h1,h2,h3
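To make the origin behaviour concrete, here is a minimal sketch of the header logic described above. It is illustrative only: the function name `origin_response_headers` is hypothetical, the choice is keyed on h1 as in the reproduction steps, and the fallback for any other h1 value is an assumption, since the report only covers "a" and "b".

```python
def origin_response_headers(request_headers):
    """Return the caching-related response headers of a hypothetical origin.

    The Vary value depends on the value of h1, which is itself a
    varied-upon request header.
    """
    h1 = request_headers.get("h1")
    if h1 == "a":
        vary = "h1,h2"
    elif h1 == "b":
        vary = "h1,h2,h3"
    else:
        # Assumption: the report does not say what happens for other values.
        vary = "h1"
    return {
        "Cache-Control": "public,max-age=600",
        "Vary": vary,
    }
```

For example, `origin_response_headers({"h1": "b", "h2": "b"})` yields a Vary value of `h1,h2,h3`, while the same call with h1 set to "a" yields `h1,h2`.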
Make requests via NGINX with e.g. h1 == "a", h2 == "b" to cache the response - vary header issued will be "vary: h1,h2" and response will be cached
Make requests via NGINX with e.g. h1 == "b", h2 == "b" to cache the response - vary header issued will be "vary: h1,h2,h3" and response will be cached
Again make requests via NGINX with e.g. h1 == "a", h2 == "b" - vary header issued will be "vary: h1,h2" but the response will not be taken from cache, even though the max-age issued means that it could have been
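The three requests above could be scripted roughly as follows. This is a sketch under assumptions, not our actual test harness: `PROXY_URL` is a placeholder, and the `X-Cache-Status` response header is assumed to be exposed by the proxy configuration (e.g. via `add_header X-Cache-Status $upstream_cache_status;`), which NGINX does not do by default.

```python
import urllib.request

# Placeholder: point this at the NGINX instance under test.
PROXY_URL = "http://localhost:8080/page.html"

# The request header sets from the steps above, in order.
REQUESTS = [
    {"h1": "a", "h2": "b"},  # cached with "vary: h1,h2"
    {"h1": "b", "h2": "b"},  # cached with "vary: h1,h2,h3"
    {"h1": "a", "h2": "b"},  # repeat of the first request
]


def fetch(url, headers):
    """Send one GET and return (Vary, X-Cache-Status) from the response."""
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request, timeout=10) as response:
        # X-Cache-Status is an assumed, config-dependent header name.
        return response.headers.get("Vary"), response.headers.get("X-Cache-Status")


def run(url=PROXY_URL):
    """Replay the sequence and print what the proxy reports for each request."""
    for headers in REQUESTS:
        vary, status = fetch(url, headers)
        print(headers, "-> vary:", vary, "cache status:", status)
```

Calling `run()` against a test instance prints one line per request; the point of interest is the cache status reported for the third request, which repeats the first.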
i.e. if we change the vary response header based on the value of a varied-upon request header, NGINX will not serve previously cached items - thus the cache is partially invalidated.
From doing some reading and examining the cache files on disk, I believe NGINX sees a vary mismatch when the vary response header changes and overwrites the master cache object on disk. This means that we see a partially/completely invalidated cache when the vary response header changes and thus we see a much lower cache hit ratio/efficiency.
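The following toy model illustrates the behaviour hypothesised above. It is a simplification written for this report, not NGINX's actual implementation: the names `ToyVaryCache` and `variant_hash` are hypothetical, the hash function is arbitrary, and expiry is ignored. The main entry remembers the last vary header seen and a hash of the varied-upon request header values; a response whose vary header differs from the one used for lookup replaces the main entry.

```python
import hashlib


def variant_hash(vary, request_headers):
    """Hash the values of the request headers named in a vary header."""
    names = [name.strip().lower() for name in vary.split(",")]
    material = "|".join(f"{n}={request_headers.get(n, '')}" for n in names)
    return hashlib.sha256(material.encode()).hexdigest()


class ToyVaryCache:
    def __init__(self, origin_vary):
        self.origin_vary = origin_vary  # callable: request headers -> vary value
        self.main = None                # (vary, variant hash) of the main entry
        self.secondary = set()          # variant hashes stored beside the main entry

    def request(self, request_headers):
        wanted = None
        if self.main is not None:
            vary, stored = self.main
            wanted = variant_hash(vary, request_headers)
            if wanted == stored or wanted in self.secondary:
                return "HIT"
        # Miss: fetch from the origin and look at the vary header it returns.
        response_vary = self.origin_vary(request_headers)
        actual = variant_hash(response_vary, request_headers)
        if wanted is not None and actual == wanted:
            # Same vary header as the main entry: store as an extra variant.
            self.secondary.add(actual)
        else:
            # Vary header changed (or cache empty): overwrite the main entry.
            self.main = (response_vary, actual)
        return "MISS"


def replay(origin_vary):
    """Run the three requests from the reproduction steps through the model."""
    cache = ToyVaryCache(origin_vary)
    sequence = [{"h1": "a", "h2": "b"}, {"h1": "b", "h2": "b"}, {"h1": "a", "h2": "b"}]
    return [cache.request(headers) for headers in sequence]
```

With an origin whose vary header changes with h1, `replay(lambda h: "h1,h2" if h.get("h1") == "a" else "h1,h2,h3")` gives three misses in this model, whereas an origin that always returns `h1,h2` gives a hit on the third request. That contrast is the effect described in this report.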
We change the vary response header based on the value of several of our common request headers e.g. the user's geographical location (due to rights issues or personalisation e.g. weather forecasts, news, travel info etc.), an "are we behind a CDN" flag and several more scenarios. This means that we are going to see a much lower, potentially zero, cache hit ratio - this is quite a problem and I don't think we're alone.
I have checked the most recent RFC which describes vary handling and it doesn't really cover this scenario, as you're probably aware - so I don't see this as a failure to adhere to standards; it's a suggestion to help improve cache hit ratio for users who operate similarly to how we do.
I hope I have explained this clearly but please let me know if not, or indeed if you spot any error in my logic - it has been peer reviewed by my colleague so I believe it's sound. I can supply example config if required, though I'd prefer to keep that private for the moment as it's somewhat sensitive currently.