Opened 6 years ago

Closed 5 years ago

#377 closed defect (fixed)

ETag is not sent with gzip

Reported by: www.google.com/accounts/o8/id?id=AItOawnDoXTWklt8FDnqYvidSu3SujhMolefNFE
Owned by:
Priority: trivial
Milestone:
Component: nginx-module
Version:
Keywords: etag, gzip
Cc:
uname -a: Linux host-1 3.2.0-23-generic #36-Ubuntu SMP Tue Apr 10 20:39:51 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
nginx -V: nginx version: nginx/1.4.1 TLS SNI support enabled configure arguments: --prefix=/etc/nginx --sbin-path=/usr/sbin/nginx --conf-path=/etc/nginx/nginx.conf --error-log-path=/var/log/nginx/error.log --http-log-path=/var/log/nginx/access.log --pid-path=/var/run/nginx.pid --lock-path=/var/run/nginx.lock --http-client-body-temp-path=/var/cache/nginx/client_temp --http-proxy-temp-path=/var/cache/nginx/proxy_temp --http-fastcgi-temp-path=/var/cache/nginx/fastcgi_temp --http-uwsgi-temp-path=/var/cache/nginx/uwsgi_temp --http-scgi-temp-path=/var/cache/nginx/scgi_temp --user=nginx --group=nginx --with-http_ssl_module --with-http_realip_module --with-http_addition_module --with-http_sub_module --with-http_dav_module --with-http_flv_module --with-http_mp4_module --with-http_gunzip_module --with-http_gzip_static_module --with-http_random_index_module --with-http_secure_link_module --with-http_stub_status_module --with-mail --with-mail_ssl_module --with-file-aio --with-http_spdy_module --with-ipv6

Description

I found strange behavior when the gzip and etag directives are both enabled for serving static files.

Suppose we have already requested a resource and it is cached in the browser. We then redeploy changed files to the server and request the resource again:

GET /css/reset.css HTTP/1.1
Host: example.com
Connection: keep-alive
Cache-Control: max-age=0
Accept: text/css,*/*;q=0.1
If-None-Match: "51bca012-41b"
If-Modified-Since: Sat, 15 Jun 2013 17:10:42 GMT
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.110 Safari/537.36
Referer: http://example.com/
Accept-Encoding: gzip,deflate,sdch
Accept-Language: ru-RU,ru;q=0.8,en-US;q=0.6,en;q=0.4
Cookie: _ym_visorc=w

HTTP/1.1 200 OK
Server: nginx/1.4.1
Date: Sat, 15 Jun 2013 17:11:37 GMT
Content-Type: text/css
Last-Modified: Sat, 15 Jun 2013 17:11:34 GMT
Transfer-Encoding: chunked
Connection: keep-alive
Content-Encoding: gzip

As you can see, there is no ETag header in this response. We then refresh the page once more and see this:

GET /css/reset.css HTTP/1.1
Host: example.com
Connection: keep-alive
Cache-Control: max-age=0
Accept: text/css,*/*;q=0.1
If-None-Match: "51bca012-41b"
If-Modified-Since: Sat, 15 Jun 2013 17:11:34 GMT
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.110 Safari/537.36
Referer: http://example.com/
Accept-Encoding: gzip,deflate,sdch
Accept-Language: ru-RU,ru;q=0.8,en-US;q=0.6,en;q=0.4
Cookie: _ym_visorc=w

HTTP/1.1 200 OK
Server: nginx/1.4.1
Date: Sat, 15 Jun 2013 17:11:53 GMT
Content-Type: text/css
Last-Modified: Sat, 15 Jun 2013 17:11:34 GMT
Transfer-Encoding: chunked
Connection: keep-alive
Content-Encoding: gzip

That is, the browser sends the modification date it received in response to the first request, together with the etag left over from the previous version of the file (before the redeployment). NGINX sees that the first condition holds and the second does not, and sends the whole file again. Refreshing the page several more times gives exactly the same result.

I don't know whether this can be called a bug, but the example is fairly typical, and returning 200 with the whole file is not good.

Why not send ETag when compression is enabled? That would eliminate this problem.

P.S. If nothing is redeployed on the server, NGINX correctly returns 304, and does so with the ETag header.

Change History (14)

comment:1 Changed 6 years ago by mdounin

When gzip is used, the response body changes, and the strong entity tag of the original response can no longer be used; otherwise byte-range requests would break. So at present the ETag header is simply removed whenever the response is modified (by the gzip filter as well as by other filters that modify the response, e.g. ssi).

I wonder what Chrome is counting on when it uses the ETag of a response that is guaranteed to be stale (it has already been given a new response). RFC 2616 tells us:

If none of the entity tags match, then the server MAY perform the
requested method as if the If-None-Match header field did not exist,
but MUST also ignore any If-Modified-Since header field(s) in the
request. That is, if no entity tags match, then the server MUST NOT
return a 304 (Not Modified) response.

That is, a 304 can never be returned in the situation described. It may be worth reporting this problem to the Chrome developers.
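The rule quoted above can be sketched as a small decision function. This is an illustration only, not nginx code: `conditional_get_status` and `ims_result` are hypothetical names, a single entity tag is compared byte for byte, and `*` and tag lists are not handled.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result of evaluating If-Modified-Since on its own. */
typedef enum { IMS_ABSENT, IMS_NOT_MODIFIED, IMS_MODIFIED } ims_result;

/*
 * Returns the status code for a conditional GET.
 * if_none_match: request header value, or NULL if the header is absent.
 * current_etag:  ETag of the response, or NULL if the response has none
 *                (as happens here when the gzip filter clears it).
 */
int conditional_get_status(const char *if_none_match,
                           const char *current_etag,
                           ims_result ims)
{
    if (if_none_match != NULL) {
        bool match = current_etag != NULL
                     && strcmp(if_none_match, current_etag) == 0;

        if (!match) {
            /* No entity tag matched: If-Modified-Since is ignored
               and 304 must not be returned. */
            return 200;
        }

        /* A tag matched: 304, unless the modification date disagrees. */
        return ims == IMS_MODIFIED ? 200 : 304;
    }

    /* No If-None-Match: If-Modified-Since alone decides. */
    return ims == IMS_NOT_MODIFIED ? 304 : 200;
}
```

With the values from this ticket (an If-None-Match left over from the old file and a gzipped response that carries no ETag), the function returns 200 whatever If-Modified-Since says.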

What does look like a bug is returning an ETag in a 304 response when gzip is enabled. We need to think about what can be done about this...

comment:2 Changed 6 years ago by www.google.com/accounts/o8/id?id=AItOawnLc8uS4sXh28Ic70rMEO65pRSIjGV3Sqc

Hello, this is a major issue for us. It removes all of our Etags and causes significant increases to page load times for our Rails app, and is probably affecting other Rails users as well. Weak ETags would resolve these issues for us. Please implement weak ETags!

comment:3 Changed 6 years ago by mdounin

What stops you from using Last-Modified as a cache validator instead?

comment:4 Changed 6 years ago by www.google.com/accounts/o8/id?id=AItOawnLc8uS4sXh28Ic70rMEO65pRSIjGV3Sqc

Last-Modified is not universal enough to be used for automated caching (we generate an ETag for every page we serve.) Our data has non-trivial caching logic that cannot be reduced to a simple date stamp.

comment:5 Changed 6 years ago by www.google.com/accounts/o8/id?id=AItOawmaYGT8OfcAhiRD5ywUfaSC5pswJp2oOGM

Here is a patch for it. Can you apply it?

--- nginx-1.3.8/src/http/modules/ngx_http_gzip_filter_module.c 2012-07-07 17:22:27.000000000 -0400
+++ nginx-1.3.8-weak-etags-shorter/src/http/modules/ngx_http_gzip_filter_module.c 2012-11-21 17:05:12.758389000 -0500
@@ -306,7 +306,15 @@
 
     ngx_http_clear_content_length(r);
     ngx_http_clear_accept_ranges(r);
-    ngx_http_clear_etag(r);
+
+    /* Clear etags unless they're marked as weak (prefixed with 'W/') */
+    h = r->headers_out.etag;
+    if (h && !(h->value.len >= 3 &&
+               h->value.data[0] == 'W' &&
+               h->value.data[1] == '/' &&
+               h->value.data[2] == '"')) {
+        ngx_http_clear_etag(r);
+    }
 
     return ngx_http_next_header_filter(r);
 }

comment:6 Changed 6 years ago by mdounin

This patch looks like a dirty hack. And please see Contributing Changes for a recommended way to submit patches.

comment:7 Changed 6 years ago by www.google.com/accounts/o8/id?id=AItOawmaYGT8OfcAhiRD5ywUfaSC5pswJp2oOGM

I'm not good at C, and it's really simple logic. Could you clean it up and do the patch?

comment:8 Changed 6 years ago by www.google.com/accounts/o8/id?id=AItOawmaYGT8OfcAhiRD5ywUfaSC5pswJp2oOGM

How about:

--- a/src/http/modules/ngx_http_gzip_filter_module.c Mon Oct 21 18:20:32 2013 +0800
+++ b/src/http/modules/ngx_http_gzip_filter_module.c Mon Oct 21 10:18:00 2013 -0700
@@ -306,7 +306,11 @@
 
     ngx_http_clear_content_length(r);
     ngx_http_clear_accept_ranges(r);
-    ngx_http_clear_etag(r);
+
+    h = r->headers_out.etag;
+    if(h && ngx_strncmp(h->value.data, "W/", 2) != 0) {
+        ngx_http_clear_etag(r);
+    }
 
     return ngx_http_next_header_filter(r);
 }

comment:9 Changed 6 years ago by grosser.michael@…

It looks like Apache just appends -gzip to the original etag to make this process transparent and simple. Can we do something similar?

comment:10 Changed 6 years ago by mdounin

No. As far as I understand, Apache's approach is actually wrong: the resulting entities can differ (e.g., with a different compression level) but will be served with identical strong etags, and this directly contradicts RFC 2616. The correct approach would probably be to downgrade strong entity tags to weak ones.

(See also http://mailman.nginx.org/pipermail/nginx-devel/2013-November/004523.html for a patch which prevents clearing of weak etags by gzip module and friends. It's unlikely to be committed though.)
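A minimal sketch of what downgrading a strong entity tag to a weak one could look like, assuming the tag is a NUL-terminated string. `etag_downgrade_to_weak` is a hypothetical helper written for illustration; it is not taken from nginx or from the patch linked above.

```c
#include <stdio.h>
#include <string.h>

/*
 * Writes the weak form of etag into out.
 * A tag that already starts with "W/" is copied unchanged;
 * a strong tag gets the "W/" prefix.
 * Returns 0 on success, -1 if out is too small.
 */
int etag_downgrade_to_weak(const char *etag, char *out, size_t out_size)
{
    int n;

    if (strncmp(etag, "W/", 2) == 0) {
        n = snprintf(out, out_size, "%s", etag);
    } else {
        n = snprintf(out, out_size, "W/%s", etag);
    }

    return (n >= 0 && (size_t) n < out_size) ? 0 : -1;
}
```

The opaque part of the tag is kept as is, so a client that later sends the weak tag back can still be matched against the original resource by a weak comparison.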

comment:11 Changed 6 years ago by www.google.com/accounts/o8/id?id=AItOawm13Lv3zdS8a1w3BqHSX_tbZQ3xhO0NLig


comment:12 Changed 6 years ago by grosser.michael@…

It would be great if we could actually do something about this instead of worrying about being RFC blah blah compliant. If we cannot use ETags, we will simply use Apache, because they actually fixed it instead of burying their heads in the sand. We saw a 40% drop in response time after implementing weak ETags ourselves, something that cannot be done with Last-Modified.

Here is the patch for weak etag support:
https://github.com/grosser/puppet-nginx/blob/grosser/weak-etag/files/brews/nginx_weak_etag.patch

comment:13 Changed 5 years ago by openid.yandex.ru/guseynov-alexey

I'd like to see this problem solved too. Sometimes it is hard to implement the "Last-Modified" header for dynamic content because you have to remember when that content was modified. In such situations etags are very handy, and weak etags would work quite well.

comment:14 Changed 5 years ago by mdounin

  • Resolution set to fixed
  • Status changed from new to closed

The e491b26fa5a1 change introduces downgrade of strong etags to weak ones, and af229f8cf987 implements weak comparison for If-None-Match.
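For illustration, a weak comparison of two entity tags can be sketched as follows: the tags match if their opaque parts are identical, whether or not either carries the `W/` prefix. `etag_opaque` and `etag_weak_match` are hypothetical names, not the functions from the changesets above, and the sketch compares a single pair of tags without handling `*` or comma-separated lists.

```c
#include <stdbool.h>
#include <string.h>

/* Skips the "W/" weakness prefix, if present. */
static const char *etag_opaque(const char *etag)
{
    return strncmp(etag, "W/", 2) == 0 ? etag + 2 : etag;
}

/* Weak comparison: only the opaque quoted parts have to be identical. */
bool etag_weak_match(const char *a, const char *b)
{
    return strcmp(etag_opaque(a), etag_opaque(b)) == 0;
}
```

Under this comparison a client that cached `W/"51bca012-41b"` from a gzipped response still matches a resource whose strong tag is `"51bca012-41b"`, so a 304 becomes possible.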
