Opened 6 months ago

Closed 6 months ago

#2155 closed defect (fixed)

Stale

Reported by: iHeadRu@… Owned by:
Priority: critical Milestone:
Component: nginx-core Version: 1.19.x
Keywords: nginx stale requests Cc:
uname -a: FreeBSD freebsd12.build.ihead.ru 12.2-RELEASE-p3 FreeBSD 12.2-RELEASE-p3 GENERIC amd64
nginx -V: nginx version: nginx/1.19.8
built by clang 10.0.1 (git@github.com:llvm/llvm-project.git llvmorg-10.0.1-0-gef32c611aa2)
built with OpenSSL 1.1.1k 25 Mar 2021
TLS SNI support enabled
configure arguments: --with-http_stub_status_module --with-http_flv_module --without-http_empty_gif_module --without-http_memcached_module --without-http_upstream_ip_hash_module --without-http_browser_module --with-http_ssl_module --without-http_uwsgi_module --without-http_scgi_module --with-openssl=../openssl-1.1.1k --with-http_v2_module --with-pcre-jit --with-http_auth_request_module --with-file-aio --with-http_realip_module

Description (last modified by iHeadRu@…)

Hi!

Problem appeared after upgrade from 1.19.6 to 1.19.8.
server (site) with https (http2).
Site with big number of images per page.
Some images are not loaded on control+F5.
Problem is reproduced in Google Chrome and other browsers based on Chromium.
F12 - Network tab -> Problem requests are marked red with status "Stale".

Problem not present in 1.19.6.

Attachments (1)

stalled.png (6.6 KB ) - added by iHeadRu@… 6 months ago.

Download all attachments as: .zip

Change History (12)

comment:1 by iHeadRu@…, 6 months ago

Description: modified (diff)

comment:2 by Maxim Dounin, 6 months ago

Most likely, you are affected by this change in nginx 1.19.7:

    *) Change: connections handling in HTTP/2 has been changed to better
       match HTTP/1.x; the "http2_recv_timeout", "http2_idle_timeout", and
       "http2_max_requests" directives have been removed, the
       "keepalive_timeout" and "keepalive_requests" directives should be
       used instead.

Notably, change from http2_max_requests to keepalive_requests. It means that only 100 requests are allowed per HTTP/2 connection by default now, instead of previously used default 1000.

I was able to reproduce the issue with a page trying to load 500 images. Though I see errors in the Network tab, not "Stale", and errors reported in console either as net::ERR_FAILED or as net::ERR_HTTP2_SERVER_REFUSED_STREAM. It is also easy to reproduce with nginx 1.19.6 with http2_max_requests 100; or by using a page with 5000 images (and the default http2_max_requests 1000;).

It looks like Chrome in some cases cannot properly handle GOAWAY frames instructing it to redo requests in other connections, so this results in some resources not being loaded if the number of resources is significantly larger than keepalive_requests.

Note that it seems to work in some cases, but not others. For example, I see the following request not being retried:

15297: URL_REQUEST
https://test.mdounin.ru:8443/test-many/nginx.png?108
Start Time: 2021-03-31 01:35:46.993

t= 276 [st=   0] +REQUEST_ALIVE  [dt=1409]
                  --> priority = "LOWEST"
                  --> traffic_annotation = 101845102
                  --> url = "https://test.mdounin.ru:8443/test-many/nginx.png?108"
t= 276 [st=   0]    NETWORK_DELEGATE_BEFORE_URL_REQUEST  [dt=0]
t= 276 [st=   0]   +URL_REQUEST_START_JOB  [dt=1409]
                    --> initiator = "https://test.mdounin.ru:8443"
                    --> load_flags = 16384 (SUPPORT_ASYNC_REVALIDATION)
                    --> method = "GET"
                    --> network_isolation_key = "https://mdounin.ru https://mdounin.ru"
                    --> privacy_mode = "disabled"
                    --> site_for_cookies = "SiteForCookies: {site=https://mdounin.ru; schemefully_same=true}"
                    --> url = "https://test.mdounin.ru:8443/test-many/nginx.png?108"
t= 276 [st=   0]      NETWORK_DELEGATE_BEFORE_START_TRANSACTION  [dt=0]
t= 277 [st=   1]      HTTP_CACHE_CALLER_REQUEST_HEADERS
                      --> sec-ch-ua: "Google Chrome";v="89", "Chromium";v="89", ";Not A Brand";v="99"
                          If-None-Match: "60638dad-837"
                          If-Modified-Since: Tue, 30 Mar 2021 20:44:29 GMT
                          sec-ch-ua-mobile: ?0
                          User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 11_2_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.90 Safari/537.36
                          Accept: image/avif,image/webp,image/apng,image/svg+xml,image/*,*/*;q=0.8
                          Sec-Fetch-Site: same-origin
                          Sec-Fetch-Mode: no-cors
                          Sec-Fetch-Dest: image
                          Referer: https://test.mdounin.ru:8443/test-many/
                          Accept-Encoding: gzip, deflate, br
                          Accept-Language: en-US,en;q=0.9,ru;q=0.8
                      --> line = ""
t= 277 [st=   1]      HTTP_CACHE_GET_BACKEND  [dt=0]
t= 277 [st=   1]      HTTP_CACHE_OPEN_OR_CREATE_ENTRY  [dt=141]
t= 418 [st= 142]      HTTP_CACHE_ADD_TO_ENTRY  [dt=122]
t= 540 [st= 264]      HTTP_CACHE_READ_INFO  [dt=0]
t= 540 [st= 264]     +HTTP_STREAM_REQUEST  [dt=219]
t= 540 [st= 264]        HTTP_STREAM_JOB_CONTROLLER_BOUND
                        --> source_dependency = 15699 (HTTP_STREAM_JOB_CONTROLLER)
t= 759 [st= 483]        HTTP_STREAM_REQUEST_BOUND_TO_JOB
                        --> source_dependency = 15700 (HTTP_STREAM_JOB)
t= 759 [st= 483]     -HTTP_STREAM_REQUEST
t= 759 [st= 483]     +HTTP_TRANSACTION_SEND_REQUEST  [dt=279]
t= 759 [st= 483]        HTTP_TRANSACTION_HTTP2_SEND_REQUEST_HEADERS
                        --> :method: GET
                            :authority: test.mdounin.ru:8443
                            :scheme: https
                            :path: /test-many/nginx.png?108
                            sec-ch-ua: "Google Chrome";v="89", "Chromium";v="89", ";Not A Brand";v="99"
                            if-none-match: "60638dad-837"
                            if-modified-since: Tue, 30 Mar 2021 20:44:29 GMT
                            sec-ch-ua-mobile: ?0
                            user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 11_2_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.90 Safari/537.36
                            accept: image/avif,image/webp,image/apng,image/svg+xml,image/*,*/*;q=0.8
                            sec-fetch-site: same-origin
                            sec-fetch-mode: no-cors
                            sec-fetch-dest: image
                            referer: https://test.mdounin.ru:8443/test-many/
                            accept-encoding: gzip, deflate, br
                            accept-language: en-US,en;q=0.9,ru;q=0.8
t= 974 [st= 698]        URL_REQUEST_SET_PRIORITY
                        --> priority = "MEDIUM"
t=1038 [st= 762]        HTTP2_STREAM_ERROR
                        --> description = "Abandoned."
                        --> net_error = "ERR_HTTP2_SERVER_REFUSED_STREAM"
                        --> stream_id = 217
t=1038 [st= 762]        HTTP_TRANSACTION_RESTART_AFTER_ERROR
                        --> net_error = -351 (ERR_HTTP2_SERVER_REFUSED_STREAM)
t=1038 [st= 762]     -HTTP_TRANSACTION_SEND_REQUEST
t=1038 [st= 762]     +HTTP_STREAM_REQUEST  [dt=315]
t=1038 [st= 762]        HTTP_STREAM_JOB_CONTROLLER_BOUND
                        --> source_dependency = 16483 (HTTP_STREAM_JOB_CONTROLLER)
t=1353 [st=1077]        HTTP_STREAM_REQUEST_BOUND_TO_JOB
                        --> source_dependency = 16484 (HTTP_STREAM_JOB)
t=1353 [st=1077]     -HTTP_STREAM_REQUEST
t=1685 [st=1409]   -URL_REQUEST_START_JOB
                    --> net_error = -2 (ERR_FAILED)
t=1685 [st=1409]    URL_REQUEST_DELEGATE_RESPONSE_STARTED  [dt=0]
t=1685 [st=1409] -REQUEST_ALIVE
                  --> net_error = -2 (ERR_FAILED)

While some others are being properly retried:

15281: URL_REQUEST
https://test.mdounin.ru:8443/test-many/nginx.png?100
Start Time: 2021-03-31 01:35:46.985

t= 268 [st=   0] +REQUEST_ALIVE  [dt=1248]
                  --> priority = "LOWEST"
                  --> traffic_annotation = 101845102
                  --> url = "https://test.mdounin.ru:8443/test-many/nginx.png?100"
t= 268 [st=   0]    NETWORK_DELEGATE_BEFORE_URL_REQUEST  [dt=0]
t= 268 [st=   0]   +URL_REQUEST_START_JOB  [dt=1248]
                    --> initiator = "https://test.mdounin.ru:8443"
                    --> load_flags = 16384 (SUPPORT_ASYNC_REVALIDATION)
                    --> method = "GET"
                    --> network_isolation_key = "https://mdounin.ru https://mdounin.ru"
                    --> privacy_mode = "disabled"
                    --> site_for_cookies = "SiteForCookies: {site=https://mdounin.ru; schemefully_same=true}"
                    --> url = "https://test.mdounin.ru:8443/test-many/nginx.png?100"
t= 268 [st=   0]      NETWORK_DELEGATE_BEFORE_START_TRANSACTION  [dt=0]
t= 268 [st=   0]      HTTP_CACHE_CALLER_REQUEST_HEADERS
                      --> sec-ch-ua: "Google Chrome";v="89", "Chromium";v="89", ";Not A Brand";v="99"
                          If-None-Match: "60638dad-837"
                          If-Modified-Since: Tue, 30 Mar 2021 20:44:29 GMT
                          sec-ch-ua-mobile: ?0
                          User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 11_2_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.90 Safari/537.36
                          Accept: image/avif,image/webp,image/apng,image/svg+xml,image/*,*/*;q=0.8
                          Sec-Fetch-Site: same-origin
                          Sec-Fetch-Mode: no-cors
                          Sec-Fetch-Dest: image
                          Referer: https://test.mdounin.ru:8443/test-many/
                          Accept-Encoding: gzip, deflate, br
                          Accept-Language: en-US,en;q=0.9,ru;q=0.8
                      --> line = ""
t= 268 [st=   0]      HTTP_CACHE_GET_BACKEND  [dt=0]
t= 268 [st=   0]      HTTP_CACHE_OPEN_OR_CREATE_ENTRY  [dt=125]
t= 393 [st= 125]      HTTP_CACHE_ADD_TO_ENTRY  [dt=126]
t= 519 [st= 251]      HTTP_CACHE_READ_INFO  [dt=0]
t= 519 [st= 251]     +HTTP_STREAM_REQUEST  [dt=223]
t= 519 [st= 251]        HTTP_STREAM_JOB_CONTROLLER_BOUND
                        --> source_dependency = 15651 (HTTP_STREAM_JOB_CONTROLLER)
t= 742 [st= 474]        HTTP_STREAM_REQUEST_BOUND_TO_JOB
                        --> source_dependency = 15652 (HTTP_STREAM_JOB)
t= 742 [st= 474]     -HTTP_STREAM_REQUEST
t= 742 [st= 474]     +HTTP_TRANSACTION_SEND_REQUEST  [dt=290]
t= 742 [st= 474]        HTTP_TRANSACTION_HTTP2_SEND_REQUEST_HEADERS
                        --> :method: GET
                            :authority: test.mdounin.ru:8443
                            :scheme: https
                            :path: /test-many/nginx.png?100
                            sec-ch-ua: "Google Chrome";v="89", "Chromium";v="89", ";Not A Brand";v="99"
                            if-none-match: "60638dad-837"
                            if-modified-since: Tue, 30 Mar 2021 20:44:29 GMT
                            sec-ch-ua-mobile: ?0
                            user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 11_2_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.90 Safari/537.36
                            accept: image/avif,image/webp,image/apng,image/svg+xml,image/*,*/*;q=0.8
                            sec-fetch-site: same-origin
                            sec-fetch-mode: no-cors
                            sec-fetch-dest: image
                            referer: https://test.mdounin.ru:8443/test-many/
                            accept-encoding: gzip, deflate, br
                            accept-language: en-US,en;q=0.9,ru;q=0.8
t= 983 [st= 715]        URL_REQUEST_SET_PRIORITY
                        --> priority = "MEDIUM"
t=1032 [st= 764]        HTTP2_STREAM_ERROR
                        --> description = "Abandoned."
                        --> net_error = "ERR_HTTP2_SERVER_REFUSED_STREAM"
                        --> stream_id = 201
t=1032 [st= 764]        HTTP_TRANSACTION_RESTART_AFTER_ERROR
                        --> net_error = -351 (ERR_HTTP2_SERVER_REFUSED_STREAM)
t=1032 [st= 764]     -HTTP_TRANSACTION_SEND_REQUEST
t=1032 [st= 764]     +HTTP_STREAM_REQUEST  [dt=271]
t=1032 [st= 764]        HTTP_STREAM_JOB_CONTROLLER_BOUND
                        --> source_dependency = 16465 (HTTP_STREAM_JOB_CONTROLLER)
t=1303 [st=1035]        HTTP_STREAM_REQUEST_BOUND_TO_JOB
                        --> source_dependency = 16466 (HTTP_STREAM_JOB)
t=1303 [st=1035]     -HTTP_STREAM_REQUEST
t=1303 [st=1035]     +HTTP_TRANSACTION_SEND_REQUEST  [dt=211]
t=1304 [st=1036]        HTTP_TRANSACTION_HTTP2_SEND_REQUEST_HEADERS
                        --> :method: GET
                            :authority: test.mdounin.ru:8443
                            :scheme: https
                            :path: /test-many/nginx.png?100
                            sec-ch-ua: "Google Chrome";v="89", "Chromium";v="89", ";Not A Brand";v="99"
                            if-none-match: "60638dad-837"
                            if-modified-since: Tue, 30 Mar 2021 20:44:29 GMT
                            sec-ch-ua-mobile: ?0
                            user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 11_2_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.90 Safari/537.36
                            accept: image/avif,image/webp,image/apng,image/svg+xml,image/*,*/*;q=0.8
                            sec-fetch-site: same-origin
                            sec-fetch-mode: no-cors
                            sec-fetch-dest: image
                            referer: https://test.mdounin.ru:8443/test-many/
                            accept-encoding: gzip, deflate, br
                            accept-language: en-US,en;q=0.9,ru;q=0.8
t=1514 [st=1246]     -HTTP_TRANSACTION_SEND_REQUEST
t=1514 [st=1246]     +HTTP_TRANSACTION_READ_HEADERS  [dt=0]
t=1514 [st=1246]        HTTP_TRANSACTION_READ_RESPONSE_HEADERS
                        --> HTTP/1.1 304
                            server: nginx/1.19.9
                            date: Tue, 30 Mar 2021 22:35:48 GMT
                            last-modified: Tue, 30 Mar 2021 20:44:29 GMT
                            etag: "60638dad-837"
                            expires: Thu, 01 Jan 1970 00:00:01 GMT
                            cache-control: no-cache
t=1514 [st=1246]     -HTTP_TRANSACTION_READ_HEADERS
t=1514 [st=1246]      HTTP_CACHE_WRITE_INFO  [dt=2]
t=1516 [st=1248]      NETWORK_DELEGATE_HEADERS_RECEIVED  [dt=0]
t=1516 [st=1248]   -URL_REQUEST_START_JOB
t=1516 [st=1248]    URL_REQUEST_DELEGATE_RESPONSE_STARTED  [dt=0]
t=1516 [st=1248]    HTTP_TRANSACTION_READ_BODY  [dt=0]
t=1516 [st=1248] -REQUEST_ALIVE

I wasn't able to identify what causes this Chrome behaviour. Even limiting number of concurrent HTTP/2 streams to 1 via http2_max_concurrent_streams 1; does not help, some images still fail to load.

An obvious workaround would be to use keepalive_requests 1000; for sites using many resources. Not sure what else we can do on nginx side. Overall, it looks like an issue in Chrome.

comment:3 by iHeadRu@…, 6 months ago

Yes, problem is reproduced in 1.19.6 with

http2_max_requests 100;

And, yes, connection is "Stalled", not "Stale" (in "Timing" tab).

by iHeadRu@…, 6 months ago

Attachment: stalled.png added

comment:4 by Maxim Dounin, 6 months ago

Thanks for additional testing and for clarification. Indeed, in Timing tab only "Stalled" timing is shown for such failed requests. Yet these requests are shown as "(failed)" in the Status column of the table at the Network tab itself. Unfortunately, it doesn't help to understand why Chrome fails to properly re-request such resources as it should as per RFC 7540. This probably needs to be reported to Chrome.

Just for the record, previously the most common source of errors when hitting http2_max_requests was lack of the lingering close in nginx HTTP/2 code, which resulted in RST packets being sent (ticket #1250). This was fixed in nginx 1.19.1, and no longer an issue (to make sure I've double-checked tcpdump, there are no RST packets, only normal connection close with FINs from both sides).

The same test with keepalive_requests 100; and 500 images works fine with Firefox. On the other hand, Firefox seems to be limited by network.http.request.max-attempts (defaults to 10), and extreme configurations (where "keepalive_requests" is set so more than 10 HTTP/2 connections are needed to load all the resources) might cause Firefox to fail loading some resources as well. Further, any attempt to use an HTTP/2 connection to load a resource (even if nothing actually happens but the request is waiting due to max_concurrent_streams limit) seems to be counted towards the limit (tested with a page with 5k images, large enough keepalive_requests and a local patch to update SETTINGS_MAX_CONCURRENT_STREAMS after each request to make sure no extra requests need to be resent after GOAWAY when keepalive_requests limit is reached, as well as network.http.spdy.default-concurrent (defaults to 100) set to 1 in Firefox and http2_max_concurrent_streams 1; in nginx to do the same; some resources are not loaded unless keepalive_requests is 500 or more). This might actually be the same problem, though handled slightly better, with a limit of 10 "attempts" (not really, see above) by default instead of just two in Chrome.

As previously suggested, most simple workaround would be to use large enough keepalive_requests value if your site needs to load many resources. Another possible workaround would be to disable HTTP/2 till browsers are fixed to properly handle GOAWAY, HTTP/1.x works fine in all the above tests.

comment:5 by iHeadRu@…, 6 months ago

Hi.

Change from http2_max_requests to keepalive_requests is backward incompatible without config editing. It would be bigger problem for users in next stable release with default keepalive_requests=100.

comment:6 by Maxim Dounin, 6 months ago

Change from http2_max_requests to keepalive_requests is backward incompatible without config editing. It would be bigger problem for users in next stable release with default keepalive_requests=100.

It wasn't expected to cause any issues apart from a minor change in how often connections are reopened. We'll consider what to do with this, given the browsers inadequate GOAWAY handling. A possible solution might be to bump keepalive_requests to some larger value by default, though this approach have its own drawbacks.

By the way, could you please provide some insights about your site? How many images per page does it have?

comment:7 by iHeadRu@…, 6 months ago

How can i privately give you site address? E-mail?

in reply to:  7 comment:8 by Maxim Dounin, 6 months ago

How can i privately give you site address? E-mail?

Feel free to, mdounin@….

comment:9 by Maxim Dounin, 6 months ago

For the record, here are some numbers about amount of resources per page (https://httparchive.org/reports/state-of-the-web):

date client p10 p25 p50 p75 p90
1 Dec 2020 desktop 25.0 44.0 73.0 115.0 174.0
1 Dec 2020 mobile 23.0 41.0 69.0 111.0 168.0

More details are at https://httparchive.org/reports/state-of-the-web?start=latest#reqTotal.

Assuming all resources are loaded from the same site and at the same time (the worst case), with current default keepalive_requests 100; about 70% of sites should not be affected. With 200 more than 90% sites won't be affected, about 98% with 300, and 99% is reached at 400.

comment:10 by Maxim Dounin <mdounin@…>, 6 months ago

In 7822:82e174e47663/nginx:

Changed keepalive_requests default to 1000 (ticket #2155).

It turns out no browsers implement HTTP/2 GOAWAY handling properly, and
large enough number of resources on a page results in failures to load
some resources. In particular, Chrome seems to experience errors if
loading of all resources requires more than 1 connection (while it
is usually able to retry requests at least once, even with 2 connections
there are occasional failures for some reason), Safari if loading requires
more than 3 connections, and Firefox if loading requires more than 10
connections (can be configured with network.http.request.max-attempts,
defaults to 10).

It does not seem to be possible to resolve this on nginx side, even strict
limiting of maximum concurrency does not help, and loading issues seems to
be triggered by merely queueing of a request for a particular connection.
The only available mitigation seems to use higher keepalive_requests value.

The new default is 1000 and matches previously used default for
http2_max_requests. It is expected to be enough for 99.98% of the pages
(https://httparchive.org/reports/state-of-the-web?start=latest#reqTotal)
even in Chrome.

comment:11 by Maxim Dounin, 6 months ago

Resolution: fixed
Status: newclosed

Thanks again for reporting this. A patch series was committed, which introduces the "keepalive_time" directive to limit total lifetime of a connection, and changes the "keepalive_requests" default value to 1000 (which matches previously used default for http2_max_requests).

These changes are expected to be enough to mitigate issues with GOAWAY handling in browsers for practical cases, while limiting possible resource impact from large "keepalive_requests" values.

Unfortunately, it is not possible to completely resolve the this on nginx side. Proper solution would be to fix GOAWAY handling in browsers.

Note: See TracTickets for help on using tickets.