Opened 6 months ago

Last modified 6 months ago

#2376 reopened defect

GRPC: upstream rejected request with error while reading response header from upstream

Reported by: erebe@… Owned by:
Priority: minor Milestone:
Component: nginx-module Version: 1.19.x
Keywords: Cc:
uname -a: Linux nginx-grpc-ingress-ingress-nginx-nginx-grpc-controller-7b6pvg7j 5.4.188-104.359.amzn2.x86_64 #1 SMP Thu Apr 14 20:53:13 UTC 2022 x86_64 Linux
nginx -V: nginx version: nginx/1.19.10
built by gcc 11.2.1 20220219 (Alpine 11.2.1_git20220219)
built with OpenSSL 1.1.1q 5 Jul 2022
TLS SNI support enabled
configure arguments: --prefix=/usr/local/nginx --conf-path=/etc/nginx/nginx.conf --modules-path=/etc/nginx/modules --http-log-path=/var/log/nginx/access.log --error-log-path=/var/log/nginx/error.log --lock-path=/var/lock/nginx.lock --pid-path=/run/nginx.pid --http-client-body-temp-path=/var/lib/nginx/body --http-fastcgi-temp-path=/var/lib/nginx/fastcgi --http-proxy-temp-path=/var/lib/nginx/proxy --http-scgi-temp-path=/var/lib/nginx/scgi --http-uwsgi-temp-path=/var/lib/nginx/uwsgi --with-debug --with-compat --with-pcre-jit --with-http_ssl_module --with-http_stub_status_module --with-http_realip_module --with-http_auth_request_module --with-http_addition_module --with-http_geoip_module --with-http_gzip_static_module --with-http_sub_module --with-http_v2_module --with-stream --with-stream_ssl_module --with-stream_realip_module --with-stream_ssl_preread_module --with-threads --with-http_secure_link_module --with-http_gunzip_module --with-file-aio --without-mail_pop3_module --without-mail_smtp_module --without-mail_imap_module --without-http_uwsgi_module --without-http_scgi_module --with-cc-opt='-g -O2 -fPIE -fstack-protector-strong -Wformat -Werror=format-security -Wno-deprecated-declarations -fno-strict-aliasing -D_FORTIFY_SOURCE=2 --param=ssp-buffer-size=4 -DTCP_FASTOPEN=23 -fPIC -I/root/.hunter/_Base/d45d77d/d0c447a/3b7ee27/Install/include -Wno-cast-function-type -m64 -mtune=generic' --with-ld-opt='-fPIE -fPIC -pie -Wl,-z,relro -Wl,-z,now -L/root/.hunter/_Base/d45d77d/d0c447a/3b7ee27/Install/lib' --user=www-data --group=www-data --add-module=/tmp/build/ngx_devel_kit-0.3.1 --add-module=/tmp/build/set-misc-nginx-module-0.32 --add-module=/tmp/build/headers-more-nginx-module-0.33 --add-module=/tmp/build/ngx_http_substitutions_filter_module-b8a71eacc7f986ba091282ab8b1bbbc6ae1807e0 --add-module=/tmp/build/lua-nginx-module-b721656a9127255003b696b42ccc871c7ec18d59 --add-module=/tmp/build/stream-lua-nginx-module-74f8c8bca5b95cecbf42d4e1a465bc08cd075a9b --add-module=/tmp/build/lua-upstream-nginx-module-8aa93ead98ba2060d4efd594ae33a35d153589bf --add-module=/tmp/build/nginx_ajp_module-a964a0bcc6a9f2bfb82a13752d7794a36319ffac --add-dynamic-module=/tmp/build/nginx-http-auth-digest-1.0.0 --add-dynamic-module=/tmp/build/nginx-influxdb-module-5b09391cb7b9a889687c0aa67964c06a2d933e8b --add-dynamic-module=/tmp/build/nginx-opentracing-0.19.0/opentracing --add-dynamic-module=/tmp/build/ModSecurity-nginx-1.0.2 --add-dynamic-module=/tmp/build/ngx_http_geoip2_module-a26c6beed77e81553686852dceb6c7fdacc5970d --add-dynamic-module=/tmp/build/ngx_brotli

Description (last modified by erebe@…)

Hello,

Reporting an issue with GRPC where nginx is not correctly forwarding to client the response and RST_STREAM
when server early abort the http2 stream.

My setup:

[client] === grpc stream (with ssl) ===> [nginx] === grpc stream (cleartext) ===> [backend]

Description:

When the client is establishing a grpc (client/unidirectional) stream, it may happen that the backend close/reset the http2 stream before exhausting the client input stream, for example due to a timeout.

In this case, the backend send back the response and reset the stream:

grpc-gateway DEBUG Connection{peer=Server}: h2::codec::framed_write: send frame=Headers { stream_id: StreamId(1), flags: (0x4: END_HEADERS) }
grpc-gateway DEBUG Connection{peer=Server}: h2::codec::framed_write: send frame=Data { stream_id: StreamId(1) }
grpc-gateway DEBUG Connection{peer=Server}: h2::codec::framed_write: send frame=Headers { stream_id: StreamId(1), flags: (0x5: END_HEADERS | END_STREAM) }
grpc-gateway DEBUG Connection{peer=Server}: h2::codec::framed_write: send frame=Reset { stream_id: StreamId(1), error_code: CANCEL }

My issue is that the client never receive the response nor the http2 RST_STREAM from nginx.
The client discover that the stream have been reset only after the next client side http2 PING frame, or after the client try to send some new data.

DEBUG Connection{peer=Client}: h2::codec::framed_write: send frame=Ping { ack: false, payload: [59, 124, 219, 122, 11, 135, 22, 180] }
DEBUG status_request{request_id=117db9f2-707c-4ccd-a824-f0f8d36c9fa3}:Connection{peer=Client}: h2::codec::framed_read: received frame=Headers { stream_id: StreamId(1), flags: (0x4: END_HEADERS) }
DEBUG status_request{request_id=117db9f2-707c-4ccd-a824-f0f8d36c9fa3}:Connection{peer=Client}: h2::codec::framed_read: received frame=Reset { stream_id: StreamId(1), error_code: INTERNAL_ERROR }
DEBUG hyper::proto::h2::client: client request body error: error writing a body to connection: send stream capacity unexpectedly closed

The only thing I see in the nginx log is those messages:
(error 8 is http2 CANCEL error code)

[error] 855#855: *73638 upstream rejected request with error 8 while reading response header from upstream, client: 10.0.33.7, server: grpc.qovery.com, request: "POST /agent.Agent/AgentResponsePublish HTTP/2.0", upstream: "grpc://10.0.20.240:8081", host: "grpc.qovery.com:443"
10.0.33.7 - - [10/Aug/2022:21:48:29 +0000] "POST /agent.Agent/AgentResponsePublish HTTP/2.0" 200 0 "-" "tonic/0.8.0" 566 60.024 [qovery-prod-grpc-gateway-grpc] [] 10.0.20.240:8081 68 60.025 200 e0ee79d5fd24a5bbf39556e159cb3840  

If I remove nginx from my setup everything is working as expected.

The issue seems a duplicate of this one https://trac.nginx.org/nginx/ticket/1792, but as the ticket has been closed, I opened a new one with detailed information.

Attachments (3)

nginx.pcapng (8.1 KB ) - added by erebe@… 6 months ago.
Pcap of the issue
image.png (224.6 KB ) - added by erebe@… 6 months ago.
pcap_colorized
nginx.log (20.8 KB ) - added by erebe@… 6 months ago.

Download all attachments as: .zip

Change History (15)

comment:1 by erebe@…, 6 months ago

Description: modified (diff)

comment:2 by erebe@…, 6 months ago

Description: modified (diff)

comment:3 by erebe@…, 6 months ago

Miss labeled the ticket as "documentation", if someone can change it for nginx-module

comment:4 by Sergey Kandaurov, 6 months ago

Component: documentationnginx-module

comment:5 by Maxim Dounin, 6 months ago

Resolution: wontfix
Status: newclosed

gRPC seems to allow RST_STREAM due to framing errors, see specification. All such errors are interpreted by nginx as fatal errors on the particular connection.

In particular, when RST_STREAM is used on the connection between nginx and the upstream server before response headers are returned by the upstream, nginx will return 502 error to the client, as per HTTP specification.

I don't think we should do anything with this, as nginx behaviour looks perfectly correct. Probably a better way to cancel such requests in gRPC can be found, which is more compatible with HTTP protocol.

comment:6 by erebe@…, 6 months ago

Resolution: wontfix
Status: closedreopened

Hello @Maxim,

I am re-opening the ticket because I think you may have read the issue too quickly or that I haven't explained it clearly enough.

For me, there are 2 issues/phenoms described here:

  1. The response is not forwarded to the client in presence of an RST_STREAM.

On this point, I think you responded that when receiving RST_STREAM nginx is in its right to abort the connection without propagating the response to the client

  1. The RST_STREAM is not forwarded by nginx to the client.

I think you missed this point, in my case nginx do not forward actively the RST_STREAM to the client after received it from upstream.
The client has to try to send some data on the connection, or wait for a ping frame in order to discover by itself that nginx received a RST_STREAM and the stream is no longer valid.

While I won't discuss for point 1. , I think for point 2. it is a bug of nginx to not (pro)actively forward the RST_STREAM to the client.
If no ping frame is used, like in my case, the client maintains the connection open/idle without any notification from nginx.

Is there a reason for nginx to not propagate the RST_STREAM to the client ?

comment:7 by Maxim Dounin, 6 months ago

  1. The response is not forwarded to the client in presence of an RST_STREAM.

Nothing in your report suggests there is a response. In contrast, the error message "upstream rejected request with error 8 while reading response header from upstream" suggests that RST_STREAM was seen by nginx before response headers were received.

  1. The RST_STREAM is not forwarded by nginx to the client.

The RST_STREAM corresponds to the particular connection between nginx and the backend. Once the error happens, nginx responds with 502 to the client. The 502 is then followed by RST_STREAM(NO_ERROR) if the request body is still being sent by the client, so this is what should happen in your case.

That is, while the particular RST_STREAM is not forwarded by nginx, it is believed that it shouldn't be forwarded, and another RST_STREAM is properly sent. If in doubt, check nginx debug log, it should have enough details on what happens on the wire.

by erebe@…, 6 months ago

Attachment: nginx.pcapng added

Pcap of the issue

by erebe@…, 6 months ago

Attachment: image.png added

pcap_colorized

comment:8 by erebe@…, 6 months ago

I managed to reproduce the issue locally, so I have attached a pcap of what I found + colorized image of the capture to help you.

Regarding who is who:
upstream/grpc server is on 127.0.0.3:50051
nginx is on 127.0.0.2:80 (but talk from 127.0.0.1)
client is on 127.0.0.1

On the image,
the purple color is the tcp stream between client and nginx
the red color is the tcp stream between nginx and upstream (grpc server)

We can see that upstream/backend is responding to nginx with

44	0.006983607	127.0.0.3	127.0.0.1	GRPC	164	HEADERS[1], DATA[1] (GRPC), HEADERS[1], RST_STREAM[1]

but nginx is responding to the client only by sending

46	0.007045107	127.0.0.2	127.0.0.1	HTTP2	125	HEADERS[3]: 200 OK

it is only 60 seconds later, after that the client send a PING frame that nginx respond by a RST_STREAM, and not before

51	60.010391137	127.0.0.1	127.0.0.2	HTTP2	83	PING[0]
53	60.010608655	127.0.0.2	127.0.0.1	HTTP2	96	RST_STREAM[3], PING[0]

.

If I am not mistaken, the HEADERS[3]: 200 OK, that nginx respond to the client should contain the flag END_STREAM to notify the client that the stream is reset ?

HyperText Transfer Protocol 2
    Stream: HEADERS, Stream ID: 3, Length 50, 200 OK
        Length: 50
        Type: HEADERS (1)
        Flags: 0x04, End Headers
        0... .... .... .... .... .... .... .... = Reserved: 0x0
        .000 0000 0000 0000 0000 0000 0000 0011 = Stream Identifier: 3
        [Pad Length: 0]
        Header Block Fragment: 88768aaa6355e580ae17d7083f6196df3dbf4a042a436cca080212820dc082e09953168d…
        [Header Length: 122]
        [Header Count: 4]
        Header: :status: 200 OK
        Header: server: nginx/1.19.10
        Header: date: Thu, 11 Aug 2022 21:10:23 GMT
        Header: content-type: application/grpc

Last edited 6 months ago by erebe@… (previous) (diff)

comment:9 by Maxim Dounin, 6 months ago

Thanks for the dump. It does not seem to match the original description, but probably explains the issue you are seeing. In the provided dump, the RST_STREAM from the upstream server follows gRPC response (headers + data + trailers), so the error message as logged by nginx should be ... while reading upstream in this case.

From what I can see, there are two things which happen here:

  1. The RST_STREAM(8) following the response is interpreted by nginx as a fatal error, while upstream server properly use it as an attempt to let the server know that the stream is not needed. This seems to be something to address in the upstream server, as RFC 7540 quote as provided in the 2096b21fcd10 commit log suggests that the code should be RST_STREAM(NO_ERROR), and not RST_STREAM(CANCEL):
       ... A server can
       send a complete response prior to the client sending an entire
       request if the response does not depend on any portion of the request
       that has not been sent and received.  When this is true, a server MAY
       request that the client abort transmission of a request without error
       by sending a RST_STREAM with an error code of NO_ERROR after sending
       a complete response (i.e., a frame with the END_STREAM flag).
       Clients MUST NOT discard responses as a result of receiving such a
       RST_STREAM, though clients can always discard responses at their
       discretion for other reasons.
    
    It is not why in your case the upstream server uses RST_STREAM(CANCEL), this does not seem to be correct as per HTTP/2 specification and does not seem to be something used by common gRPC libraries.
  1. The error results in RST_STREAM(INTERNAL_ERROR) generated by nginx, but this RST_STREAM is not sent to the wire before writing to the connection is triggered for a different reason. This looks like something to address/improve in nginx.

It looks like there is another active stream (/agent.Agent/AgentRequestSubscribe), and due to this ngx_http_v2_close_stream() does not try to initiate frame queue writing. Not sure why it behaves this way.

Hope this helps. Overall, this looks like an issue caused by improper use of RST_STREAM(CANCEL) instead of RST_STREAM(NO_ERROR) by the upstream server.

by erebe@…, 6 months ago

Attachment: nginx.log added

comment:10 by erebe@…, 6 months ago

Thank you for having taken the time to look at the capture.

  1. Thanks for the insight, I will try to see with the implementors if this behavior can be changed to be aligned with the RFC.
  1. If this can be of any help to you, I have attached the nginx-debug log of the issue.

You are right regarding the other active stream, our app/agent is a simple pub/sub. It subscribes to a stream of events and after create a new stream to respond to those events.

Regarding RST_STREAM is not sent to the wire before writing to the connection ... something to address/improve in nginx. does this mean it qualify this ticket as a real issue to you ? What can be expected regarding this point ?

Thanks for your help

comment:11 by erebe@…, 6 months ago

To keep the ticket updated, if some other people find it.
I contacted the maintainer of the library and provided a patch to fix the issue
https://github.com/hyperium/h2/issues/633, hopefully the patch will be available in version v0.3.13 of h2.

@Maxim Dounin, I confirm you that changing the RST_STREAM(CANCEL) by RST_STREAM(NO_ERROR) make nginx work flawlessly !

comment:12 by Maxim Dounin, 6 months ago

If this can be of any help to you, I have attached the nginx-debug log of the issue.

Thanks for the log. I was wrong assuming the log message should be ... while reading upstream: since the RST_STREAM is received in the same read() call as the headers, the log message contains ... while reading response header from upstream instead. Otherwise the log seems to match what I expected: nginx generates RST_STREAM, but does not send it immediately due to another stream being active.

Regarding RST_STREAM is not sent to the wire before writing to the connection ... something to address/improve in nginx. does this mean it qualify this ticket as a real issue to you ? What can be expected regarding this point ?

I'll take a look if it can be improved to send the RST_STREAM immediately. Keeping this ticket open for now to address this.

Thanks for reporting this, and thanks for the update. From the issue linked it looks like changing the code to NO_ERROR should also fix another related issue not involving nginx at all. Good to see this generally improves things.

Note: See TracTickets for help on using tickets.