Opened 11 years ago

Closed 10 years ago

#377 closed defect (fixed)

ETag is not sent with gzip

Reported by: Владимир Андреев Owned by:
Priority: trivial Milestone:
Component: nginx-module Version:
Keywords: etag, gzip Cc:
uname -a: Linux host-1 3.2.0-23-generic #36-Ubuntu SMP Tue Apr 10 20:39:51 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
nginx -V: nginx version: nginx/1.4.1
TLS SNI support enabled
configure arguments: --prefix=/etc/nginx --sbin-path=/usr/sbin/nginx --conf-path=/etc/nginx/nginx.conf --error-log-path=/var/log/nginx/error.log --http-log-path=/var/log/nginx/access.log --pid-path=/var/run/nginx.pid --lock-path=/var/run/nginx.lock --http-client-body-temp-path=/var/cache/nginx/client_temp --http-proxy-temp-path=/var/cache/nginx/proxy_temp --http-fastcgi-temp-path=/var/cache/nginx/fastcgi_temp --http-uwsgi-temp-path=/var/cache/nginx/uwsgi_temp --http-scgi-temp-path=/var/cache/nginx/scgi_temp --user=nginx --group=nginx --with-http_ssl_module --with-http_realip_module --with-http_addition_module --with-http_sub_module --with-http_dav_module --with-http_flv_module --with-http_mp4_module --with-http_gunzip_module --with-http_gzip_static_module --with-http_random_index_module --with-http_secure_link_module --with-http_stub_status_module --with-mail --with-mail_ssl_module --with-file-aio --with-http_spdy_module --with-ipv6

Description

I found strange behavior when the gzip and etag directives are both enabled for serving static files.

Suppose we have already requested a resource and it is cached in the browser. We then redeploy changed files to the server and request the resource again:

GET /css/reset.css HTTP/1.1
Host: example.com
Connection: keep-alive
Cache-Control: max-age=0
Accept: text/css,*/*;q=0.1
If-None-Match: "51bca012-41b"
If-Modified-Since: Sat, 15 Jun 2013 17:10:42 GMT
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.110 Safari/537.36
Referer: http://example.com/
Accept-Encoding: gzip,deflate,sdch
Accept-Language: ru-RU,ru;q=0.8,en-US;q=0.6,en;q=0.4
Cookie: _ym_visorc=w

HTTP/1.1 200 OK
Server: nginx/1.4.1
Date: Sat, 15 Jun 2013 17:11:37 GMT
Content-Type: text/css
Last-Modified: Sat, 15 Jun 2013 17:11:34 GMT
Transfer-Encoding: chunked
Connection: keep-alive
Content-Encoding: gzip

As you can see, there is no ETag header in the response. We then refresh the page once more and see this:

GET /css/reset.css HTTP/1.1
Host: example.com
Connection: keep-alive
Cache-Control: max-age=0
Accept: text/css,*/*;q=0.1
If-None-Match: "51bca012-41b"
If-Modified-Since: Sat, 15 Jun 2013 17:11:34 GMT
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.110 Safari/537.36
Referer: http://example.com/
Accept-Encoding: gzip,deflate,sdch
Accept-Language: ru-RU,ru;q=0.8,en-US;q=0.6,en;q=0.4
Cookie: _ym_visorc=w

HTTP/1.1 200 OK
Server: nginx/1.4.1
Date: Sat, 15 Jun 2013 17:11:53 GMT
Content-Type: text/css
Last-Modified: Sat, 15 Jun 2013 17:11:34 GMT
Transfer-Encoding: chunked
Connection: keep-alive
Content-Encoding: gzip

That is, the browser sends the modification date it received in the response to the first request, together with the ETag left over from the previous version of the file (before the redeploy). NGINX sees that the first condition is satisfied and the second is not, and sends the whole file again. Refreshing the page several more times gives the same result.

I don't know whether this can be called a bug, but the example above is fairly typical, and returning 200 with the whole file is not good.

Why not send ETag when compression is enabled? That would eliminate this problem.

P.S. If the files are not redeployed on the server, NGINX correctly returns 304, and with the ETag header at that.

Change History (14)

comment:1 by Maxim Dounin, 11 years ago

When gzip is used, the response body changes, so the strong entity tag of the original response can no longer be used; otherwise byte-range requests would break. Accordingly, the ETag header is currently just removed whenever the response is modified (by the gzip filter as well as by other filters that change the response, e.g. ssi).

I wonder what Chrome is counting on when it uses the ETag of a response that is guaranteed to be stale (it has already been given a new response). RFC 2616 tells us:

If none of the entity tags match, then the server MAY perform the
requested method as if the If-None-Match header field did not exist,
but MUST also ignore any If-Modified-Since header field(s) in the
request. That is, if no entity tags match, then the server MUST NOT
return a 304 (Not Modified) response.

That is, a 304 can never be returned in the situation described. It may make sense to report this problem to the Chrome developers.

What does look like a bug is returning an ETag in a 304 response when gzip is enabled. We need to think about what can be done about it...
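
The RFC 2616 rule quoted above can be sketched as a small decision function. This is only an illustration under stated assumptions, not nginx code: `conditional_status`, `NO_DATE`, and the single-tag `If-None-Match` are hypothetical simplifications, and the date check uses "not newer than" where a real server may require an exact match.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

#define NO_DATE ((time_t) -1)   /* hypothetical marker: header absent */

/*
 * Hypothetical helper, not nginx code. Returns the status a server would
 * send for a conditional GET under the RFC 2616 rule quoted above.
 * if_none_match: the tag from the request, or NULL if the header is absent.
 * current_etag:  the tag of the current response, or NULL if it has none.
 */
static int
conditional_status(const char *if_none_match, const char *current_etag,
    time_t if_modified_since, time_t last_modified)
{
    int  date_ok;

    assert(last_modified != NO_DATE);

    /* An absent If-Modified-Since never blocks a 304 on its own. */
    date_ok = (if_modified_since == NO_DATE
               || last_modified <= if_modified_since);

    if (if_none_match != NULL) {
        if (current_etag == NULL
            || strcmp(if_none_match, current_etag) != 0)
        {
            /* No tag matches: If-Modified-Since MUST be ignored. */
            return 200;
        }

        /* Tag matches: 304 unless the date check fails. */
        return date_ok ? 304 : 200;
    }

    /* Only a date validator (or none at all) was sent. */
    if (if_modified_since != NO_DATE && date_ok) {
        return 304;
    }

    return 200;
}
```

With the values from the description (a stale `If-None-Match`, no ETag on the gzipped response, and matching dates), this sketch returns 200 on every reload, which is the loop the reporter observed.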

comment:2 by Aaron Peschel, 11 years ago

Hello, this is a major issue for us. It removes all of our Etags and causes significant increases to page load times for our Rails app, and is probably affecting other Rails users as well. Weak ETags would resolve these issues for us. Please implement weak ETags!

comment:3 by Maxim Dounin, 11 years ago

What stops you from using Last-Modified as a cache validator instead?

comment:4 by Aaron Peschel, 11 years ago

Last-Modified is not universal enough to be used for automated caching (we generate an ETag for every page we serve.) Our data has non-trivial caching logic that cannot be reduced to a simple date stamp.

comment:5 by Michael Grosser, 11 years ago

Here is a patch for it, can you apply it ?

--- nginx-1.3.8/src/http/modules/ngx_http_gzip_filter_module.c 2012-07-07 17:22:27.000000000 -0400
+++ nginx-1.3.8-weak-etags-shorter/src/http/modules/ngx_http_gzip_filter_module.c 2012-11-21 17:05:12.758389000 -0500
@@ -306,7 +306,15 @@
 
     ngx_http_clear_content_length(r);
     ngx_http_clear_accept_ranges(r);
-    ngx_http_clear_etag(r);
+
+    /* Clear etags unless they're marked as weak (prefixed with 'W/') */
+    h = r->headers_out.etag;
+    if (h && !(h->value.len >= 3 &&
+        h->value.data[0] == 'W' &&
+        h->value.data[1] == '/' &&
+        h->value.data[2] == '"')) {
+        ngx_http_clear_etag(r);
+    }
 
     return ngx_http_next_header_filter(r);
 }

comment:6 by Maxim Dounin, 11 years ago

This patch looks like a dirty hack. And please see Contributing Changes for a recommended way to submit patches.

comment:7 by Michael Grosser, 11 years ago

I'm not good at C, it's some really simple logic, could you clean it up and do the patch ?

comment:8 by Michael Grosser, 11 years ago

How about:

--- a/src/http/modules/ngx_http_gzip_filter_module.c Mon Oct 21 18:20:32 2013 +0800
+++ b/src/http/modules/ngx_http_gzip_filter_module.c Mon Oct 21 10:18:00 2013 -0700
@@ -306,7 +306,11 @@
 
     ngx_http_clear_content_length(r);
     ngx_http_clear_accept_ranges(r);
-    ngx_http_clear_etag(r);
+
+    h = r->headers_out.etag;
+    if(h && ngx_strncmp(h->value.data, "W/", 2) != 0) {
+        ngx_http_clear_etag(r);
+    }
 
     return ngx_http_next_header_filter(r);
 }
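
Both patches in this thread hinge on one test: does the header value start with the weak prefix `W/`? A minimal standalone sketch of that test follows. It assumes a length-delimited string that is not necessarily NUL-terminated, so the length is checked before any byte is read; `str_t` and `etag_is_weak` are hypothetical names used only for illustration and are not part of nginx.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a length-delimited string such as ngx_str_t. */
typedef struct {
    size_t                len;
    const unsigned char  *data;
} str_t;

/* Returns 1 if the value looks like a weak entity tag (W/"..."), else 0. */
static int
etag_is_weak(const str_t *v)
{
    assert(v != NULL);

    /* Check the length first so no byte past the end is ever read. */
    return v->len >= 3
           && v->data[0] == 'W'
           && v->data[1] == '/'
           && v->data[2] == '"';
}
```

A filter could then clear the header only when this check fails, which is the behavior both patches aim for.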

comment:9 by Michael Grosser, 11 years ago

It looks like apache is just appending -gzip to the original etag to make this process transparent/simple,
can we do something similar ?

comment:10 by Maxim Dounin, 11 years ago

No. As far as I understand, the Apache approach is actually wrong: the resulting entities can differ (e.g., with different compression levels) but will be served with identical strong etags, and this directly contradicts RFC 2616. The correct approach would probably be to downgrade strong entity tags to weak ones.

(See also http://mailman.nginx.org/pipermail/nginx-devel/2013-November/004523.html for a patch which prevents clearing of weak etags by gzip module and friends. It's unlikely to be committed though.)
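
The downgrade suggested here amounts to prefixing a strong tag with `W/` and leaving an already weak tag alone. The following is a rough sketch of that idea, assuming NUL-terminated strings and a caller-supplied buffer; `etag_downgrade` is a hypothetical helper and does not reflect how nginx implements it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not the nginx implementation.
 * Writes a weak form of `etag` into `out` (capacity `cap`, NUL included).
 * Returns 0 on success, -1 if the buffer is too small.
 */
static int
etag_downgrade(const char *etag, char *out, size_t cap)
{
    size_t  len;
    int     weak;

    assert(etag != NULL && out != NULL);

    len = strlen(etag);
    weak = (len >= 2 && etag[0] == 'W' && etag[1] == '/');

    /* Room for the optional "W/" prefix, the tag, and the NUL. */
    if (len + (weak ? 0 : 2) + 1 > cap) {
        return -1;
    }

    if (!weak) {
        memcpy(out, "W/", 2);
        out += 2;
    }

    memcpy(out, etag, len + 1);   /* copies the terminating NUL too */

    return 0;
}
```

Unlike appending a suffix such as `-gzip`, the weak form makes no claim of byte-for-byte identity, so responses compressed at different levels can share it.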

comment:11 by Michael Fischer, 11 years ago

I respectfully disagree; I think the Apache solution is adequate. The debate regarding weak vs. strong ETags was exhaustively argued by the Apache team in https://issues.apache.org/bugzilla/show_bug.cgi?id=39727, and their resulting decision to append the -gzip suffix has not caused any reported problems. The fact that there was so much discussion over it suggests the RFC is not completely clear on the subject. Ultimately that team had to make a decision and it has not proved to be a bad one (i.e. it hasn't caused problems with intermediary caching).

The fact of the matter is that any difference in the compressed bytestream is irrelevant to the client for the purpose of generating If-None-Match requests; it is the uncompressed representation that the client cares about.

Given that there is already one HTTP reference implementation that has behaved this way without ill effect, I think it's a strong argument for nginx to behave this way as well unless it can be demonstrated that it will actually cause real-world problems.


comment:12 by Michael Grosser, 11 years ago

Would be great if we could actually do something about it, instead of worrying about being RFC blah blah compliant. If we cannot use etags that just means we use apache because they actually fixed it instead of burying their head in the sand. We saw 40% response time drop by implementing weak etag ourselves, something that cannot be done with last-modified.

Here is the patch for weak etag support:
https://github.com/grosser/puppet-nginx/blob/grosser/weak-etag/files/brews/nginx_weak_etag.patch

comment:13 by openid.yandex.ru/guseynov-alexey, 10 years ago

I'd like to see this problem solved too. Sometimes it is hard to implement the "Last-Modified" header for dynamic content, because you have to remember when that content was modified. In such situations etags are very handy, and weak etags would work quite well.

comment:14 by Maxim Dounin, 10 years ago

Resolution: fixed
Status: new → closed

The e491b26fa5a1 change introduces downgrade of strong etags to weak ones, and af229f8cf987 implements weak comparison for If-None-Match.
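
For readers unfamiliar with the term, weak comparison (as defined in RFC 7232, section 2.3.2) treats two entity tags as equal when their opaque parts match, whether or not either carries the `W/` prefix. A minimal sketch follows, assuming NUL-terminated tags; `etag_opaque` and `etag_weak_match` are hypothetical names, not the functions added by the changes above.

```c
#include <assert.h>
#include <string.h>

/* Skips an optional W/ prefix and returns the quoted opaque tag. */
static const char *
etag_opaque(const char *etag)
{
    assert(etag != NULL);

    if (etag[0] == 'W' && etag[1] == '/') {
        return etag + 2;
    }

    return etag;
}

/* Weak comparison: 1 if the opaque tags are identical, else 0. */
static int
etag_weak_match(const char *a, const char *b)
{
    return strcmp(etag_opaque(a), etag_opaque(b)) == 0;
}
```

Under this comparison a client holding the strong tag `"51bca012-41b"` still matches a response tagged `W/"51bca012-41b"`, so a 304 becomes possible again when gzip is enabled.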
