Opened 2 years ago
Last modified 2 years ago
#2376 reopened defect
GRPC: upstream rejected request with error while reading response header from upstream
Reported by: | Owned by: | ||
---|---|---|---|
Priority: | minor | Milestone: | |
Component: | nginx-module | Version: | 1.19.x |
Keywords: | Cc: | ||
uname -a: | Linux nginx-grpc-ingress-ingress-nginx-nginx-grpc-controller-7b6pvg7j 5.4.188-104.359.amzn2.x86_64 #1 SMP Thu Apr 14 20:53:13 UTC 2022 x86_64 Linux | ||
nginx -V: |
nginx version: nginx/1.19.10
built by gcc 11.2.1 20220219 (Alpine 11.2.1_git20220219) built with OpenSSL 1.1.1q 5 Jul 2022 TLS SNI support enabled configure arguments: --prefix=/usr/local/nginx --conf-path=/etc/nginx/nginx.conf --modules-path=/etc/nginx/modules --http-log-path=/var/log/nginx/access.log --error-log-path=/var/log/nginx/error.log --lock-path=/var/lock/nginx.lock --pid-path=/run/nginx.pid --http-client-body-temp-path=/var/lib/nginx/body --http-fastcgi-temp-path=/var/lib/nginx/fastcgi --http-proxy-temp-path=/var/lib/nginx/proxy --http-scgi-temp-path=/var/lib/nginx/scgi --http-uwsgi-temp-path=/var/lib/nginx/uwsgi --with-debug --with-compat --with-pcre-jit --with-http_ssl_module --with-http_stub_status_module --with-http_realip_module --with-http_auth_request_module --with-http_addition_module --with-http_geoip_module --with-http_gzip_static_module --with-http_sub_module --with-http_v2_module --with-stream --with-stream_ssl_module --with-stream_realip_module --with-stream_ssl_preread_module --with-threads --with-http_secure_link_module --with-http_gunzip_module --with-file-aio --without-mail_pop3_module --without-mail_smtp_module --without-mail_imap_module --without-http_uwsgi_module --without-http_scgi_module --with-cc-opt='-g -O2 -fPIE -fstack-protector-strong -Wformat -Werror=format-security -Wno-deprecated-declarations -fno-strict-aliasing -D_FORTIFY_SOURCE=2 --param=ssp-buffer-size=4 -DTCP_FASTOPEN=23 -fPIC -I/root/.hunter/_Base/d45d77d/d0c447a/3b7ee27/Install/include -Wno-cast-function-type -m64 -mtune=generic' --with-ld-opt='-fPIE -fPIC -pie -Wl,-z,relro -Wl,-z,now -L/root/.hunter/_Base/d45d77d/d0c447a/3b7ee27/Install/lib' --user=www-data --group=www-data --add-module=/tmp/build/ngx_devel_kit-0.3.1 --add-module=/tmp/build/set-misc-nginx-module-0.32 --add-module=/tmp/build/headers-more-nginx-module-0.33 --add-module=/tmp/build/ngx_http_substitutions_filter_module-b8a71eacc7f986ba091282ab8b1bbbc6ae1807e0 --add-module=/tmp/build/lua-nginx-module-b721656a9127255003b696b42ccc871c7ec18d59 --add-module=/tmp/build/stream-lua-nginx-module-74f8c8bca5b95cecbf42d4e1a465bc08cd075a9b --add-module=/tmp/build/lua-upstream-nginx-module-8aa93ead98ba2060d4efd594ae33a35d153589bf --add-module=/tmp/build/nginx_ajp_module-a964a0bcc6a9f2bfb82a13752d7794a36319ffac --add-dynamic-module=/tmp/build/nginx-http-auth-digest-1.0.0 --add-dynamic-module=/tmp/build/nginx-influxdb-module-5b09391cb7b9a889687c0aa67964c06a2d933e8b --add-dynamic-module=/tmp/build/nginx-opentracing-0.19.0/opentracing --add-dynamic-module=/tmp/build/ModSecurity-nginx-1.0.2 --add-dynamic-module=/tmp/build/ngx_http_geoip2_module-a26c6beed77e81553686852dceb6c7fdacc5970d --add-dynamic-module=/tmp/build/ngx_brotli |
Description (last modified by )
Hello,
Reporting an issue with GRPC where nginx is not correctly forwarding to client the response and RST_STREAM
when server early abort the http2 stream.
My setup:
[client] === grpc stream (with ssl) ===> [nginx] === grpc stream (cleartext) ===> [backend]
Description:
When the client is establishing a grpc (client/unidirectional) stream, it may happen that the backend close/reset the http2 stream before exhausting the client input stream, for example due to a timeout.
In this case, the backend send back the response and reset the stream:
grpc-gateway DEBUG Connection{peer=Server}: h2::codec::framed_write: send frame=Headers { stream_id: StreamId(1), flags: (0x4: END_HEADERS) } grpc-gateway DEBUG Connection{peer=Server}: h2::codec::framed_write: send frame=Data { stream_id: StreamId(1) } grpc-gateway DEBUG Connection{peer=Server}: h2::codec::framed_write: send frame=Headers { stream_id: StreamId(1), flags: (0x5: END_HEADERS | END_STREAM) } grpc-gateway DEBUG Connection{peer=Server}: h2::codec::framed_write: send frame=Reset { stream_id: StreamId(1), error_code: CANCEL }
My issue is that the client never receive the response nor the http2 RST_STREAM from nginx.
The client discover that the stream have been reset only after the next client side http2 PING frame, or after the client try to send some new data.
DEBUG Connection{peer=Client}: h2::codec::framed_write: send frame=Ping { ack: false, payload: [59, 124, 219, 122, 11, 135, 22, 180] } DEBUG status_request{request_id=117db9f2-707c-4ccd-a824-f0f8d36c9fa3}:Connection{peer=Client}: h2::codec::framed_read: received frame=Headers { stream_id: StreamId(1), flags: (0x4: END_HEADERS) } DEBUG status_request{request_id=117db9f2-707c-4ccd-a824-f0f8d36c9fa3}:Connection{peer=Client}: h2::codec::framed_read: received frame=Reset { stream_id: StreamId(1), error_code: INTERNAL_ERROR } DEBUG hyper::proto::h2::client: client request body error: error writing a body to connection: send stream capacity unexpectedly closed
The only thing I see in the nginx log is those messages:
(error 8 is http2 CANCEL error code)
[error] 855#855: *73638 upstream rejected request with error 8 while reading response header from upstream, client: 10.0.33.7, server: grpc.qovery.com, request: "POST /agent.Agent/AgentResponsePublish HTTP/2.0", upstream: "grpc://10.0.20.240:8081", host: "grpc.qovery.com:443" 10.0.33.7 - - [10/Aug/2022:21:48:29 +0000] "POST /agent.Agent/AgentResponsePublish HTTP/2.0" 200 0 "-" "tonic/0.8.0" 566 60.024 [qovery-prod-grpc-gateway-grpc] [] 10.0.20.240:8081 68 60.025 200 e0ee79d5fd24a5bbf39556e159cb3840
If I remove nginx from my setup everything is working as expected.
The issue seems a duplicate of this one https://trac.nginx.org/nginx/ticket/1792, but as the ticket has been closed, I opened a new one with detailed information.
Attachments (3)
Change History (15)
comment:1 by , 2 years ago
Description: | modified (diff) |
---|
comment:2 by , 2 years ago
Description: | modified (diff) |
---|
comment:3 by , 2 years ago
comment:4 by , 2 years ago
Component: | documentation → nginx-module |
---|
comment:5 by , 2 years ago
Resolution: | → wontfix |
---|---|
Status: | new → closed |
gRPC seems to allow RST_STREAM due to framing errors, see specification. All such errors are interpreted by nginx as fatal errors on the particular connection.
In particular, when RST_STREAM is used on the connection between nginx and the upstream server before response headers are returned by the upstream, nginx will return 502 error to the client, as per HTTP specification.
I don't think we should do anything with this, as nginx behaviour looks perfectly correct. Probably a better way to cancel such requests in gRPC can be found, which is more compatible with HTTP protocol.
comment:6 by , 2 years ago
Resolution: | wontfix |
---|---|
Status: | closed → reopened |
Hello @Maxim,
I am re-opening the ticket because I think you may have read the issue too quickly or that I haven't explained it clearly enough.
For me, there are 2 issues/phenoms described here:
- The response is not forwarded to the client in presence of an RST_STREAM.
On this point, I think you responded that when receiving RST_STREAM nginx is in its right to abort the connection without propagating the response to the client
- The RST_STREAM is not forwarded by nginx to the client.
I think you missed this point, in my case nginx do not forward actively the RST_STREAM to the client after received it from upstream.
The client has to try to send some data on the connection, or wait for a ping frame in order to discover by itself that nginx received a RST_STREAM and the stream is no longer valid.
While I won't discuss for point 1. , I think for point 2. it is a bug of nginx to not (pro)actively forward the RST_STREAM to the client.
If no ping frame is used, like in my case, the client maintains the connection open/idle without any notification from nginx.
Is there a reason for nginx to not propagate the RST_STREAM to the client ?
comment:7 by , 2 years ago
- The response is not forwarded to the client in presence of an RST_STREAM.
Nothing in your report suggests there is a response. In contrast, the error message "upstream rejected request with error 8 while reading response header from upstream" suggests that RST_STREAM was seen by nginx before response headers were received.
- The RST_STREAM is not forwarded by nginx to the client.
The RST_STREAM corresponds to the particular connection between nginx and the backend. Once the error happens, nginx responds with 502 to the client. The 502 is then followed by RST_STREAM(NO_ERROR) if the request body is still being sent by the client, so this is what should happen in your case.
That is, while the particular RST_STREAM is not forwarded by nginx, it is believed that it shouldn't be forwarded, and another RST_STREAM is properly sent. If in doubt, check nginx debug log, it should have enough details on what happens on the wire.
comment:8 by , 2 years ago
I managed to reproduce the issue locally, so I have attached a pcap of what I found + colorized image of the capture to help you.
Regarding who is who:
upstream/grpc server is on 127.0.0.3:50051
nginx is on 127.0.0.2:80 (but talk from 127.0.0.1)
client is on 127.0.0.1
On the image,
the purple color is the tcp stream between client and nginx
the red color is the tcp stream between nginx and upstream (grpc server)
We can see that upstream/backend is responding to nginx with
44 0.006983607 127.0.0.3 127.0.0.1 GRPC 164 HEADERS[1], DATA[1] (GRPC), HEADERS[1], RST_STREAM[1]
but nginx is responding to the client only by sending
46 0.007045107 127.0.0.2 127.0.0.1 HTTP2 125 HEADERS[3]: 200 OK
it is only 60 seconds later, after that the client send a PING frame that nginx respond by a RST_STREAM, and not before
51 60.010391137 127.0.0.1 127.0.0.2 HTTP2 83 PING[0] 53 60.010608655 127.0.0.2 127.0.0.1 HTTP2 96 RST_STREAM[3], PING[0]
comment:9 by , 2 years ago
Thanks for the dump. It does not seem to match the original description, but probably explains the issue you are seeing. In the provided dump, the RST_STREAM from the upstream server follows gRPC response (headers + data + trailers), so the error message as logged by nginx should be ... while reading upstream
in this case.
From what I can see, there are two things which happen here:
- The RST_STREAM(8) following the response is interpreted by nginx as a fatal error, while upstream server properly use it as an attempt to let the server know that the stream is not needed. This seems to be something to address in the upstream server, as RFC 7540 quote as provided in the 2096b21fcd10 commit log suggests that the code should be RST_STREAM(NO_ERROR), and not RST_STREAM(CANCEL):
... A server can send a complete response prior to the client sending an entire request if the response does not depend on any portion of the request that has not been sent and received. When this is true, a server MAY request that the client abort transmission of a request without error by sending a RST_STREAM with an error code of NO_ERROR after sending a complete response (i.e., a frame with the END_STREAM flag). Clients MUST NOT discard responses as a result of receiving such a RST_STREAM, though clients can always discard responses at their discretion for other reasons.
It is not why in your case the upstream server uses RST_STREAM(CANCEL), this does not seem to be correct as per HTTP/2 specification and does not seem to be something used by common gRPC libraries.
- The error results in RST_STREAM(INTERNAL_ERROR) generated by nginx, but this RST_STREAM is not sent to the wire before writing to the connection is triggered for a different reason. This looks like something to address/improve in nginx.
It looks like there is another active stream (
/agent.Agent/AgentRequestSubscribe
), and due to thisngx_http_v2_close_stream()
does not try to initiate frame queue writing. Not sure why it behaves this way.
Hope this helps. Overall, this looks like an issue caused by improper use of RST_STREAM(CANCEL) instead of RST_STREAM(NO_ERROR) by the upstream server.
by , 2 years ago
comment:10 by , 2 years ago
Thank you for having taken the time to look at the capture.
- Thanks for the insight, I will try to see with the implementors if this behavior can be changed to be aligned with the RFC.
- If this can be of any help to you, I have attached the nginx-debug log of the issue.
You are right regarding the other active stream, our app/agent is a simple pub/sub. It subscribes to a stream of events and after create a new stream to respond to those events.
Regarding RST_STREAM is not sent to the wire before writing to the connection ... something to address/improve in nginx.
does this mean it qualify this ticket as a real issue to you ? What can be expected regarding this point ?
Thanks for your help
comment:11 by , 2 years ago
To keep the ticket updated, if some other people find it.
I contacted the maintainer of the library and provided a patch to fix the issue
https://github.com/hyperium/h2/issues/633, hopefully the patch will be available in version v0.3.13 of h2.
@Maxim Dounin, I confirm you that changing the RST_STREAM(CANCEL) by RST_STREAM(NO_ERROR) make nginx work flawlessly !
comment:12 by , 2 years ago
If this can be of any help to you, I have attached the nginx-debug log of the issue.
Thanks for the log. I was wrong assuming the log message should be ... while reading upstream
: since the RST_STREAM is received in the same read() call as the headers, the log message contains ... while reading response header from upstream
instead. Otherwise the log seems to match what I expected: nginx generates RST_STREAM, but does not send it immediately due to another stream being active.
Regarding
RST_STREAM is not sent to the wire before writing to the connection ... something to address/improve in nginx.
does this mean it qualify this ticket as a real issue to you ? What can be expected regarding this point ?
I'll take a look if it can be improved to send the RST_STREAM immediately. Keeping this ticket open for now to address this.
Thanks for reporting this, and thanks for the update. From the issue linked it looks like changing the code to NO_ERROR should also fix another related issue not involving nginx at all. Good to see this generally improves things.
Miss labeled the ticket as "documentation", if someone can change it for nginx-module