Opened 6 years ago
Closed 6 years ago
Last modified 5 years ago
#1362 closed defect (wontfix)
|Reported by:||Owned by:|
|nginx -V:||n/a (from docs)|
gzip_proxied defaults to "off" in ngx_http_gzip_module. This means that the Via request header will turn gzip off, which causes reverse proxies and CDNs to not receive (and cache) gzipped responses.
That, in turn, gives an incentive for these boxes to turn Via off, which means their loop detection isn't as good.
Is there a particular reason this is a default? Other servers don't do this (effectively "any").
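To make the reported behaviour concrete, here is a minimal sketch of the decision as described above. It is a simplified model, not nginx's implementation: the function name `should_gzip` is hypothetical, only the "off" and "any" settings are modelled, and Accept-Encoding parsing is reduced to a substring check.

```python
def should_gzip(request_headers, gzip_proxied="off"):
    """Simplified model of the behaviour described in this ticket.

    Assumptions (not taken from nginx source): only the "off" and "any"
    settings are handled, and a request counts as proxied when it
    carries a Via header.
    """
    if gzip_proxied not in ("off", "any"):
        raise ValueError("only 'off' and 'any' are modelled in this sketch")

    # Header names are case-insensitive, so compare in lower case.
    headers = {name.lower(): value for name, value in request_headers.items()}

    # Crude check; real negotiation also has to honour q-values.
    accepts_gzip = "gzip" in headers.get("accept-encoding", "")
    proxied = "via" in headers

    if not accepts_gzip:
        return False
    if proxied and gzip_proxied == "off":
        return False
    return True


direct = {"Accept-Encoding": "gzip, deflate"}
via_cdn = {"Accept-Encoding": "gzip, deflate", "Via": "1.1 example-cdn"}

print(should_gzip(direct))                       # direct client, default setting
print(should_gzip(via_cdn))                      # same request through a proxy
print(should_gzip(via_cdn, gzip_proxied="any"))  # proxy, with "any"
```

Under this model, the only difference between the second and third call is the gzip_proxied setting, which is the default being questioned here.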
Change History (4)
comment:1 by , 6 years ago
|Status:||new → closed|
comment:2 by , 6 years ago
HTTP/1.0 proxies can (and do) support Vary. This is a common misunderstanding about HTTP versioning; see RFC2145.
I know of no HTTP/1.0 cache that doesn't at least consider a response with Vary as uncacheable, and I've tested many -- see https://www.mnot.net/blog/2007/06/20/proxy_caching
If you're really concerned about some HTTP/1.0 device that doesn't understand Vary, you can omit Expires and set Cache-Control: max-age (since *it* wasn't defined in HTTP/1.0).
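As an illustration of that suggestion, the sketch below builds response headers that carry freshness information only in Cache-Control: max-age and leave Expires out. The helper name `cache_headers` and the one-hour lifetime are assumptions made for the example.

```python
def cache_headers(max_age_seconds):
    """Return caching headers without Expires (hypothetical helper).

    Freshness is expressed only through Cache-Control: max-age, which
    was not defined in HTTP/1.0, and Vary names the negotiated header.
    """
    return {
        "Cache-Control": f"max-age={max_age_seconds}",
        "Vary": "Accept-Encoding",
    }


headers = cache_headers(3600)
for name, value in headers.items():
    print(f"{name}: {value}")
```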
WRT cache duplication -- caches can and do perform normalisation of Accept-Encoding to avoid unnecessary duplication. RFC7234 allows this. This is common practice.
Please reconsider; you're hurting performance in other parts of the Web, and forcing people to do odd workarounds.
comment:3 by , 6 years ago
While HTTP/1.0 proxies certainly can do anything they want, they are not required to support Vary, and nginx has no way to know whether they support it or not. In practice, nginx itself did not until nginx 1.7.7, released in 2014, and this implies that there is no Vary support in at least 5% of all web servers known to w3techs. Testing eight modern implementations, as you did in the blog post referenced, hardly counts, sorry. Not to mention that nginx can also use User-Agent-based negotiation, and this does not work even according to your blog post.
As for duplication, normalization makes things slightly less severe, but it doesn't make it possible to avoid cache duplication. Quick example with a small real-world log:
    $ perl -nle 'm/"([^"]*)"$/ && print $1' /var/log/nginx-access.log | sort | uniq -c | sort -rn
    10271 gzip,deflate
     9535 gzip, deflate
     8185 -
     6139 deflate, gzip
     5248 gzip
      947 gzip,deflate,br
      450 gzip, deflate, sdch
       98 identity
       59 gzip,deflate,sdch
       44
       10 deflate
        5 gzip;q=1.0, deflate;q=0.8, chunked;q=0.6, identity;q=0.4, *;q=0
        3 gzip, deflate, br
        2 x-gzip, gzip, deflate
        2 deflate;q=1.0, compress;q=0.5, gzip;q=0.5
        1 gzip,deflate,bzip2,lzma,lzma2
        1 gzip, deflate, x-gzip, x-deflate
        1 gzip, deflate, peerdist
        1 gzip, deflate, lzma, sdch
Normalizing this can reduce things to something like
    $ perl -nle 'm/"([^"]*)"$/ && print join(", ", sort split /, */, $1)' /var/log/nginx-access.log | sort | uniq -c | sort -rn
    26869 deflate, gzip
     8452 -
     5251 gzip
      967 br, deflate, gzip
      509 deflate, gzip, sdch
       99 identity
       44
       10 deflate
        5 *;q=0, chunked;q=0.6, deflate;q=0.8, gzip;q=1.0, identity;q=0.4
        2 deflate, gzip, x-gzip
        2 compress;q=0.5, deflate;q=1.0, gzip;q=0.5
        1 deflate, gzip, x-deflate, x-gzip
        1 deflate, gzip, peerdist
        1 deflate, gzip, lzma, sdch
        1 bzip2, deflate, gzip, lzma, lzma2
but hardly any further (well, a couple of lines like "identity" and "" can be avoided too). In many cases this is acceptable, as the first 3 lines account for 95%+ of requests, yet it is still cache duplication, and it's unavoidable with how Vary works.
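For readers who do not use Perl, the normalization step can be sketched in Python as below. It mirrors the one-liner's split on a comma followed by optional spaces, then sorts and rejoins the tokens. The function name `normalize_accept_encoding` and the short sample list are illustrative assumptions; the sample values are taken from the log output in this comment.

```python
import re
from collections import Counter


def normalize_accept_encoding(value):
    """Sort the comma-separated tokens of an Accept-Encoding value.

    Same idea as the Perl one-liner: split on a comma followed by
    optional spaces, sort the pieces as strings, and rejoin with ", ".
    """
    return ", ".join(sorted(re.split(r", *", value)))


# A few raw header values as they appear in the log excerpt.
samples = [
    "gzip,deflate",
    "gzip, deflate",
    "deflate, gzip",
    "gzip",
    "gzip,deflate,br",
    "gzip, deflate, br",
]

raw_variants = Counter(samples)
normalized_variants = Counter(normalize_accept_encoding(s) for s in samples)

print(len(raw_variants), "distinct raw values")
print(len(normalized_variants), "distinct normalized values")
for value, count in normalized_variants.most_common():
    print(count, value)
```

Sorting merges spelling and ordering differences, but values that list different codings still remain distinct cache keys.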
Given the proportion of affected requests, we would rather preserve the current default. If needed, it is always possible to change the settings to whatever is needed on a particular installation; that is why directives such as gzip_vary exist. Thank you for your attention.
comment:4 by , 5 years ago
Server support for Vary is irrelevant to this issue, so I'm not sure why you bring it up. Caches that are worried about duplication can (and do) take steps to mitigate it. No other major server does what nginx does, so how can the issue that you talk about be so severe?
However, back to the main issue here -- nginx is causing CDNs and other proxies to turn off the Via header in requests. We (CDNs) are starting to talk about emitting another header to avoid loops (which happens when there's a misconfiguration).
It would be nice to use Via as intended, so we don't have to do that. Failing that, it would be nice if nginx didn't decide to misuse yet another header like it has Via.
You can find a detailed explanation in this thread:
TL;DR: there is no safe way to provide compressed content for proxies unless you don't care about HTTP/1.0 clients at all. And even if you don't care about HTTP/1.0 clients, the HTTP/1.1-safe way requires using Vary and implies severe cache duplication. As such, nginx choice is to use