Opened 7 years ago

Closed 7 years ago

Last modified 6 years ago

#1362 closed defect (wontfix)

gzip_proxied default

Reported by: mnot@… Owned by:
Priority: minor Milestone:
Component: nginx-module Version: 1.13.x
Keywords: gzip Cc:
uname -a:
nginx -V: n/a (from docs)

Description

gzip_proxied defaults to "off" in ngx_http_gzip_module. This means that the presence of a Via request header turns gzip off, so reverse proxies and CDNs do not receive (and cache) gzipped responses.

That, in turn, gives an incentive for these boxes to turn Via off, which means their loop detection isn't as good.

Is there a particular reason this is the default? Other servers don't do this (their behaviour is effectively "any").

Thanks,

Change History (4)

comment:1 by Maxim Dounin, 7 years ago

Resolution: wontfix
Status: new → closed

You can find detailed explanation in this thread:

http://mailman.nginx.org/pipermail/nginx/2015-March/046996.html

TL;DR: there is no safe way to provide compressed content to proxies unless you don't care about HTTP/1.0 clients at all. And even if you don't care about HTTP/1.0 clients, the HTTP/1.1-safe way requires using Vary and implies severe cache duplication. As such, nginx's choice is to use

gzip_proxied off;
gzip_vary off;

by default.

comment:2 by mnot@…, 7 years ago

HTTP/1.0 proxies can (and do) support Vary. This is a common misunderstanding about HTTP versioning; see RFC2145.

I know of no HTTP/1.0 cache that doesn't at least consider a response with Vary as uncacheable, and I've tested many -- see https://www.mnot.net/blog/2007/06/20/proxy_caching

If you're really concerned about some HTTP/1.0 device that doesn't understand Vary, you can omit Expires and set Cache-Control: max-age (since *it* wasn't defined in HTTP/1.0).

WRT cache duplication -- caches can and do perform normalisation of Accept-Encoding to avoid unnecessary duplication. RFC7234 allows this. This is common practice.
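
As an illustration of the kind of normalisation meant here, a minimal sketch in Python follows. It is not taken from any particular cache: the function name `normalize_accept_encoding` is hypothetical, and the policy (collapse every request to either "gzip" or "identity", ignoring the `*` wildcard and all other codings) is a simplifying assumption.

```python
def normalize_accept_encoding(header):
    """Collapse an Accept-Encoding value to 'gzip' or 'identity'.

    Hypothetical helper: a cache could use the result as part of its
    cache key instead of the raw header, so that all gzip-capable
    clients share one stored variant.
    """
    if not header:
        return "identity"
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        coding = parts[0].lower()
        q = 1.0  # a coding without an explicit q-value is fully acceptable
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        if coding in ("gzip", "x-gzip") and q > 0:
            return "gzip"
    # Simplification: "*" and every non-gzip coding fall through to identity.
    return "identity"
```

With this policy, "gzip,deflate", "deflate, gzip" and "gzip, deflate, br" all map to the single key "gzip", while "deflate", "identity" and an absent header map to "identity".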

Please reconsider; you're hurting performance in other parts of the Web, and forcing people to do odd workarounds.

comment:3 by Maxim Dounin, 7 years ago

While HTTP/1.0 proxies certainly can do anything they want, they are not required to support Vary, and nginx has no way to know whether they support it or not. In practice, nginx itself did not until nginx 1.7.7, released in 2014, and this implies that there is no Vary support in at least 5% of all web servers known to w3techs. Testing eight modern implementations, as you did in the referenced blog post, hardly counts, sorry. Not to mention that nginx can also use User-Agent-based negotiation, and this does not work even according to your blog post.

As for duplication, normalization makes things slightly less severe, but it does not eliminate cache duplication. Quick example with a small real-world log:

$ perl -nle 'm/"([^"]*)"$/ && print $1' /var/log/nginx-access.log | sort | uniq -c | sort -rn
10271 gzip,deflate
9535 gzip, deflate
8185 -
6139 deflate, gzip
5248 gzip
 947 gzip,deflate,br
 450 gzip, deflate, sdch
  98 identity
  59 gzip,deflate,sdch
  44
  10 deflate
   5 gzip;q=1.0, deflate;q=0.8, chunked;q=0.6, identity;q=0.4, *;q=0
   3 gzip, deflate, br
   2 x-gzip, gzip, deflate
   2 deflate;q=1.0, compress;q=0.5, gzip;q=0.5
   1 gzip,deflate,bzip2,lzma,lzma2
   1 gzip, deflate, x-gzip, x-deflate
   1 gzip, deflate, peerdist
   1 gzip, deflate, lzma, sdch

Normalizing this can reduce things to something like

$ perl -nle 'm/"([^"]*)"$/ && print join(", ", sort split /, */, $1)' /var/log/nginx-access.log | sort | uniq -c | sort -rn 
26869 deflate, gzip
8452 -
5251 gzip
 967 br, deflate, gzip
 509 deflate, gzip, sdch
  99 identity
  44
  10 deflate
   5 *;q=0, chunked;q=0.6, deflate;q=0.8, gzip;q=1.0, identity;q=0.4
   2 deflate, gzip, x-gzip
   2 compress;q=0.5, deflate;q=1.0, gzip;q=0.5
   1 deflate, gzip, x-deflate, x-gzip
   1 deflate, gzip, peerdist
   1 deflate, gzip, lzma, sdch
   1 bzip2, deflate, gzip, lzma, lzma2

but hardly any further (well, a couple of lines like "identity" and "" can be avoided too). In many cases this is acceptable, as the first 3 lines account for 95%+ of requests, yet it is still cache duplication, and it is unavoidable with how Vary works.
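
The 95%+ figure can be checked against the normalized listing above. A minimal sketch in Python, with the counts copied by hand from that listing (nothing here reads the actual log):

```python
# Request counts per normalized Accept-Encoding value,
# copied from the normalized listing above, largest first.
counts = [26869, 8452, 5251, 967, 509, 99, 44, 10, 5, 2, 2, 1, 1, 1, 1]

total = sum(counts)
top3 = sum(sorted(counts, reverse=True)[:3])
share = top3 / total

print(f"{top3} of {total} requests ({share:.1%})")
# prints "40572 of 42214 requests (96.1%)"
```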

Given the proportion of affected requests, we would rather preserve the current default. If needed, it is always possible to change the settings to whatever is needed on a particular installation; that's why the gzip_proxied and gzip_vary directives exist. Thank you for your attention.

comment:4 by mnot@…, 6 years ago

Server support for Vary is irrelevant to this issue, so I'm not sure why you bring it up. Caches that are worried about duplication can (and do) take steps to mitigate it. No other major server does what nginx does, so how can the issue that you talk about be so severe?

However, back to the main issue here -- nginx is causing CDNs and other proxies to turn off the Via header in requests. We (CDNs) are starting to talk about emitting another header to avoid loops (which happen when there's a misconfiguration).

It would be nice to use Via as intended, so we don't have to do that. Failing that, it would be nice if nginx didn't decide to misuse yet another header like it has Via.
