Opened 3 years ago
Closed 3 years ago
#2155 closed defect (fixed)
Stale
Reported by: | Owned by: | ||
---|---|---|---|
Priority: | critical | Milestone: | |
Component: | nginx-core | Version: | 1.19.x |
Keywords: | nginx stale requests | Cc: | |
uname -a: | FreeBSD freebsd12.build.ihead.ru 12.2-RELEASE-p3 FreeBSD 12.2-RELEASE-p3 GENERIC amd64 | ||
nginx -V: |
nginx version: nginx/1.19.8
built by clang 10.0.1 (git@github.com:llvm/llvm-project.git llvmorg-10.0.1-0-gef32c611aa2)
built with OpenSSL 1.1.1k 25 Mar 2021
TLS SNI support enabled
configure arguments: --with-http_stub_status_module --with-http_flv_module --without-http_empty_gif_module --without-http_memcached_module --without-http_upstream_ip_hash_module --without-http_browser_module --with-http_ssl_module --without-http_uwsgi_module --without-http_scgi_module --with-openssl=../openssl-1.1.1k --with-http_v2_module --with-pcre-jit --with-http_auth_request_module --with-file-aio --with-http_realip_module |
Description (last modified by )
Hi!
The problem appeared after upgrading from 1.19.6 to 1.19.8.
The server hosts a site over HTTPS (HTTP/2).
The site has a large number of images per page.
Some images are not loaded on Ctrl+F5.
The problem is reproducible in Google Chrome and other Chromium-based browsers.
In DevTools (F12), on the Network tab, the problem requests are marked red with status "Stale".
The problem is not present in 1.19.6.
Attachments (1)
Change History (12)
comment:1 by , 3 years ago
Description: | modified (diff) |
---|
comment:2 by , 3 years ago
comment:3 by , 3 years ago
Yes, the problem is reproduced in 1.19.6 with "http2_max_requests 100;".
And, yes, the connection is "Stalled", not "Stale" (in the "Timing" tab).
by , 3 years ago
Attachment: | stalled.png added |
---|
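The reproduction described in comment:3 might be set up roughly as follows on nginx 1.19.6. This is only a sketch: the server name, certificate paths, and document root are placeholders and are not taken from the ticket.

```nginx
# Hypothetical reproduction setup for nginx 1.19.6 (placeholders throughout).
server {
    listen 443 ssl http2;
    server_name example.test;

    ssl_certificate     /path/to/cert.pem;
    ssl_certificate_key /path/to/key.pem;

    # Close each HTTP/2 connection (sending GOAWAY) after 100 requests.
    # Obsolete since 1.19.7, where keepalive_requests applies instead.
    http2_max_requests 100;

    # Directory holding a test page that references several hundred images.
    root /path/to/test-site;
}
```

Loading a page with significantly more than 100 images from such a server, with the browser cache bypassed, should exercise the GOAWAY path discussed in this ticket.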
comment:4 by , 3 years ago
Thanks for additional testing and for clarification. Indeed, in Timing tab only "Stalled" timing is shown for such failed requests. Yet these requests are shown as "(failed)" in the Status column of the table at the Network tab itself. Unfortunately, it doesn't help to understand why Chrome fails to properly re-request such resources as it should as per RFC 7540. This probably needs to be reported to Chrome.
Just for the record, previously the most common source of errors when hitting http2_max_requests was the lack of lingering close in the nginx HTTP/2 code, which resulted in RST packets being sent (ticket #1250). This was fixed in nginx 1.19.1 and is no longer an issue (to make sure, I've double-checked with tcpdump: there are no RST packets, only a normal connection close with FINs from both sides).
The same test with "keepalive_requests 100;" and 500 images works fine with Firefox. On the other hand, Firefox seems to be limited by network.http.request.max-attempts (defaults to 10), and extreme configurations (where "keepalive_requests" is set so that more than 10 HTTP/2 connections are needed to load all the resources) might cause Firefox to fail to load some resources as well. Further, any attempt to use an HTTP/2 connection to load a resource (even if nothing actually happens and the request is just waiting due to the max_concurrent_streams limit) seems to be counted towards the limit (tested with a page with 5k images, a large enough keepalive_requests and a local patch to update SETTINGS_MAX_CONCURRENT_STREAMS after each request, to make sure no extra requests need to be resent after GOAWAY when the keepalive_requests limit is reached, as well as with network.http.spdy.default-concurrent (defaults to 100) set to 1 in Firefox and "http2_max_concurrent_streams 1;" in nginx to do the same; some resources are not loaded unless keepalive_requests is 500 or more). This might actually be the same problem, though handled slightly better, with a limit of 10 "attempts" (not really, see above) by default instead of just two in Chrome.
As previously suggested, the simplest workaround would be to use a large enough keepalive_requests value if your site needs to load many resources. Another possible workaround would be to disable HTTP/2 until browsers are fixed to properly handle GOAWAY; HTTP/1.x works fine in all the above tests.
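As a rough illustration of the first workaround, a server block for nginx 1.19.7 or later could raise the per-connection request limit as shown here. The server name and certificate paths are placeholders, and the value 1000 is just an example matching the old http2_max_requests default; pick a value larger than the number of resources your pages load.

```nginx
# Sketch of the workaround; server_name and certificate paths are placeholders.
server {
    listen 443 ssl http2;
    server_name example.test;

    ssl_certificate     /path/to/cert.pem;
    ssl_certificate_key /path/to/key.pem;

    # Allow more requests per connection than a page has resources,
    # so that GOAWAY is unlikely to be sent in the middle of a page load.
    keepalive_requests 1000;
}
```

The second workaround would amount to dropping the http2 parameter from the listen directive, so that clients fall back to HTTP/1.x.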
comment:5 by , 3 years ago
Hi.
Change from http2_max_requests to keepalive_requests is backward incompatible without config editing. It would be bigger problem for users in next stable release with default keepalive_requests=100.
comment:6 by , 3 years ago
Change from http2_max_requests to keepalive_requests is backward incompatible without config editing. It would be bigger problem for users in next stable release with default keepalive_requests=100.
It wasn't expected to cause any issues apart from a minor change in how often connections are reopened. We'll consider what to do with this, given the browsers' inadequate GOAWAY handling. A possible solution might be to bump keepalive_requests to some larger value by default, though this approach has its own drawbacks.
By the way, could you please provide some insights about your site? How many images per page does it have?
comment:8 by , 3 years ago
How can i privately give you site address? E-mail?
Feel free to, mdounin@….
comment:9 by , 3 years ago
For the record, here are some numbers about amount of resources per page (https://httparchive.org/reports/state-of-the-web):
date | client | p10 | p25 | p50 | p75 | p90 |
---|---|---|---|---|---|---|
1 Dec 2020 | desktop | 25.0 | 44.0 | 73.0 | 115.0 | 174.0 |
1 Dec 2020 | mobile | 23.0 | 41.0 | 69.0 | 111.0 | 168.0 |
More details are at https://httparchive.org/reports/state-of-the-web?start=latest#reqTotal.
Assuming all resources are loaded from the same site and at the same time (the worst case), with the current default "keepalive_requests 100;" about 70% of sites should not be affected. With 200, more than 90% of sites won't be affected; about 98% with 300; and 99% is reached at 400.
comment:11 by , 3 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
Thanks again for reporting this. A patch series was committed, which introduces the "keepalive_time" directive to limit total lifetime of a connection, and changes the "keepalive_requests" default value to 1000 (which matches previously used default for http2_max_requests).
These changes are expected to be enough to mitigate issues with GOAWAY handling in browsers for practical cases, while limiting possible resource impact from large "keepalive_requests" values.
Unfortunately, it is not possible to completely resolve this on the nginx side. The proper solution would be to fix GOAWAY handling in browsers.
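For reference, the behaviour after this patch series could be written out explicitly as in the sketch below. It assumes an nginx version that already includes "keepalive_time" (1.19.10 or later, to my knowledge), and the 1h value is assumed to be that directive's documented default rather than anything stated in this ticket.

```nginx
# Explicit form of the new defaults (assumes nginx with keepalive_time support).
http {
    # Maximum number of requests served over one keep-alive connection.
    keepalive_requests 1000;

    # Maximum total lifetime of a keep-alive connection, which bounds
    # resource usage even when keepalive_requests is large.
    keepalive_time 1h;
}
```

Sites that load more than 1000 resources per page from a single host may still want a larger "keepalive_requests" value.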
Most likely, you are affected by this change in nginx 1.19.7:
Notably, the change from http2_max_requests to keepalive_requests. It means that only 100 requests are allowed per HTTP/2 connection by default now, instead of the previously used default of 1000.
I was able to reproduce the issue with a page trying to load 500 images. Though I see errors in the Network tab, not "Stale", and errors reported in the console either as net::ERR_FAILED or as net::ERR_HTTP2_SERVER_REFUSED_STREAM. It is also easy to reproduce with nginx 1.19.6 with "http2_max_requests 100;" or by using a page with 5000 images (and the default "http2_max_requests 1000;").
It looks like Chrome in some cases cannot properly handle GOAWAY frames instructing it to redo requests in other connections, so this results in some resources not being loaded if the number of resources is significantly larger than keepalive_requests.
Note that it seems to work in some cases, but not others. For example, I see the following request not being retried:
While some others are being properly retried:
I wasn't able to identify what causes this Chrome behaviour. Even limiting the number of concurrent HTTP/2 streams to 1 via "http2_max_concurrent_streams 1;" does not help; some images still fail to load.
An obvious workaround would be to use "keepalive_requests 1000;" for sites using many resources. Not sure what else we can do on the nginx side. Overall, it looks like an issue in Chrome.