Opened 6 weeks ago

Closed 5 weeks ago

#2229 closed defect (fixed)

Grpc Upstream timeout

Reported by: xTeare@… Owned by:
Priority: minor Milestone:
Component: documentation Version: 1.19.x
Keywords: grpc, c#, java, upstream, timeout Cc:
uname -a: Windows 10 Pro
nginx -V: nginx version: nginx/1.21.1
built by cl 16.00.40219.01 for 80x86
built with OpenSSL 1.1.1k 25 Mar 2021
TLS SNI support enabled
configure arguments: --with-cc=cl --builddir=objs.msvc8 --with-debug --prefix= --conf-path=conf/nginx.conf --pid-path=logs/nginx.pid --http-log-path=logs/access.log --error-log-path=logs/error.log --sbin-path=nginx.exe --http-client-body-temp-path=temp/client_body_temp --http-proxy-temp-path=temp/proxy_temp --http-fastcgi-temp-path=temp/fastcgi_temp --http-scgi-temp-path=temp/scgi_temp --http-uwsgi-temp-path=temp/uwsgi_temp --with-cc-opt=-DFD_SETSIZE=1024 --with-pcre=objs.msvc8/lib/pcre-8.44 --with-zlib=objs.msvc8/lib/zlib-1.2.11 --with-http_v2_module --with-http_realip_module --with-http_addition_module --with-http_sub_module --with-http_dav_module --with-http_stub_status_module --with-http_flv_module --with-http_mp4_module --with-http_gunzip_module --with-http_gzip_static_module --with-http_auth_request_module --with-http_random_index_module --with-http_secure_link_module --with-http_slice_module --with-mail --with-stream --with-openssl=objs.msvc8/lib/openssl-1.1.1k --with-openssl-opt='no-asm no-tests -D_WIN32_WINNT=0x0501' --with-http_ssl_module --with-mail_ssl_module --with-stream_ssl_module

Description (last modified by xTeare@…)

So, i've got these things : a client, a server, nginx. They all run on the same machine.

The client does some network scanning and is supposed to send it's results to the server.
The server is supposed to take the scanning-results and put them into a database.
The server is based on ASP.Net.

The client and server use the following packages

Both also use the SSL certificate and generally they both work as expected.

The nginx proxy is supposed to be used as a Reverse-Proxy -> Clients only need to know one Endpoint for all their Requests. Nginx is only configured to pass grpc requests, nothing else.

First of all, almost all grpc requests do work. But they stop working, if the request gets too big. I dont know the exact size, but it must be under 180kb.

(180kb is the size of the scan results.In a test, sending half of these results succeded, where sending all did not. Custom test data (with the same class of course) did show the same results. If they got too big, it stopped working)

Without nginx, there are no errors while sending data from client to server. With nginx however, the client does recieve a HTTP 504 error.

Content of error.log

2021/07/30 13:08:05 [error] 23372#21368: *44 upstream timed out (10060: Ein Verbindungsversuch ist fehlgeschlagen, da die Gegenstelle nach einer bestimmten Zeitspanne nicht richtig reagiert hat, oder die hergestellte Verbindung war fehlerhaft, da der verbundene Host nicht reagiert hat) while sending request to upstream, client: 127.0.0.1, server: , request: "POST /greet.Greeter/SayHello HTTP/2.0", upstream: "grpcs://127.0.0.1:5001", host: "localhost:5002"

The repository does only contain a simplified version, but with the same end results.
Please have a look at my sample project, wich should include everything you'd need.

Debug logs

Do you guy's have heard of this behaviour / know how to fix this ?

Other Resources

Thanks in advance
xTeare

Change History (7)

comment:1 by xTeare@…, 6 weeks ago

Description: modified (diff)

comment:2 by Sergey Kandaurov, 6 weeks ago

Status: newaccepted

Looks like there is a bug that causes removing read event after successful SSL handshake on the upstream connection. From the non-working request:

2021/08/09 09:00:18 [debug] 18872#20176: *1 get rr peer, try: 2
2021/08/09 09:00:18 [debug] 18872#20176: *1 get rr peer, current: 02BF9F9C -1
2021/08/09 09:00:18 [debug] 18872#20176: *1 stream socket 352
2021/08/09 09:00:18 [debug] 18872#20176: *1 connect to [::1]:5001, fd:352 #2
2021/08/09 09:00:18 [debug] 18872#20176: *1 select add event fd:352 ev:768
2021/08/09 09:00:18 [debug] 18872#20176: *1 select add event fd:352 ev:16
2021/08/09 09:00:18 [debug] 18872#20176: *1 http upstream connect: -2
...
2021/08/09 09:00:18 [debug] 18872#20176: *1 SSL_do_handshake: -1
2021/08/09 09:00:18 [debug] 18872#20176: *1 SSL_get_error: 2
2021/08/09 09:00:18 [debug] 18872#20176: *1 select del event fd:352 ev:16
2021/08/09 09:00:18 [debug] 18872#20176: *1 post event 02C23270
2021/08/09 09:00:18 [debug] 18872#20176: *1 delete posted event 02C23270
2021/08/09 09:00:18 [debug] 18872#20176: *1 SSL handshake handler: 0
2021/08/09 09:00:18 [debug] 18872#20176: *1 SSL_do_handshake: -1
2021/08/09 09:00:18 [debug] 18872#20176: *1 SSL_get_error: 2
2021/08/09 09:00:18 [debug] 18872#20176: *1 post event 02C23270
2021/08/09 09:00:18 [debug] 18872#20176: *1 delete posted event 02C23270
2021/08/09 09:00:18 [debug] 18872#20176: *1 SSL handshake handler: 0
2021/08/09 09:00:18 [debug] 18872#20176: *1 save session: 02C50338
2021/08/09 09:00:18 [debug] 18872#20176: *1 SSL_do_handshake: 1
2021/08/09 09:00:18 [debug] 18872#20176: *1 select del event fd:352 ev:768

comment:3 by xTeare@…, 5 weeks ago

Do you have a rough estimate when this will get fixed ? Our application is quite dependent on nginx :)

Thanks in advance !

comment:4 by Maxim Dounin, 5 weeks ago

I'm able to reproduce it with slightly modified grpc.t, quick-and-dirty patch to reproduce:

diff --git a/grpc.t b/grpc.t
--- a/grpc.t
+++ b/grpc.t
@@ -33,6 +33,7 @@ my $t = Test::Nginx->new()->has(qw/http 
 daemon off;
 
 events {
+    use select;
 }
 
 http {
@@ -62,6 +63,10 @@ http {
             }
         }
 
+        location /FlowControl {
+            grpc_pass grpcs://127.0.0.1:8082;
+        }
+
         location /KeepAlive {
             grpc_pass u;
         }
@@ -86,10 +91,41 @@ http {
             set $args $arg_c;
         }
     }
+
+    server {
+        listen       127.0.0.1:8082 http2 ssl;
+        server_name  localhost;
+
+        ssl_certificate localhost.crt;
+        ssl_certificate_key localhost.key;
+
+        location / {
+            grpc_pass grpc://127.0.0.1:8081;
+        }
+    }
 }
 
 EOF
 
+$t->write_file('openssl.conf', <<EOF);
+[ req ]
+default_bits = 2048
+encrypt_key = no
+distinguished_name = req_distinguished_name
+[ req_distinguished_name ]
+EOF
+
+my $d = $t->testdir();
+
+foreach my $name ('localhost') {
+	system('openssl req -x509 -new '
+		. "-config $d/openssl.conf -subj /CN=$name/ "
+		. "-out $d/$name.crt -keyout $d/$name.key "
+		. ">>$d/openssl.out 2>&1") == 0
+		or die "Can't create certificate for $name: $!\n";
+}
+
+
 $t->run();
 
 ###############################################################################

Below is a patch to address this. Testing is appreciated.

# HG changeset patch
# User Maxim Dounin <mdounin@mdounin.ru>
# Date 1629154008 -10800
#      Tue Aug 17 01:46:48 2021 +0300
# Node ID f6582b24d622998db1a8587fc0893a1e883c0a6b
# Parent  419c066cb7103165fe008339d210037f68a72d4f
Upstream: fixed timeouts with gRPC, SSL and select (ticket #2229).

With SSL it is possible that an established connection is ready for
reading after the handshake.  Further, events might be already disabled
in case of level-triggered event methods.  If this happens and
ngx_http_upstream_send_request() blocks waiting for some data from
the upstream, such as for flow control in case of gRPC, the connection
will time out due to no read events on the upstream connection.

Fix is explicitly check for c->read->ready if ngx_http_upstream_send_request()
blocks and post a read event if it is set.

Note that while it is possible to modify ngx_ssl_handshake() to keep
read events active, this won't completely resolve the issue, since
there can be data already received during the SSL handshake.

diff --git a/src/http/ngx_http_upstream.c b/src/http/ngx_http_upstream.c
--- a/src/http/ngx_http_upstream.c
+++ b/src/http/ngx_http_upstream.c
@@ -2113,6 +2113,10 @@ ngx_http_upstream_send_request(ngx_http_
             c->tcp_nopush = NGX_TCP_NOPUSH_UNSET;
         }
 
+        if (c->read->ready) {
+            ngx_post_event(c->read, &ngx_posted_events);
+        }
+
         return;
     }
 

comment:5 by xTeare@…, 5 weeks ago

I was indeed able to fix the problem with your "quick-fix". I've tested it with the same Repository
(I've sent 10k generated Strings, beforehand only 500 worked)

nginx.conf          : https://pastebin.com/H6qpMTNY
debug log           : https://pastebin.com/vL9JwKBx
build configuration : https://pastebin.com/KiGF9t9s
Last edited 5 weeks ago by xTeare@… (previous) (diff)

comment:6 by Maxim Dounin <mdounin@…>, 5 weeks ago

In 7906:058a67435e83/nginx:

Upstream: fixed timeouts with gRPC, SSL and select (ticket #2229).

With SSL it is possible that an established connection is ready for
reading after the handshake. Further, events might be already disabled
in case of level-triggered event methods. If this happens and
ngx_http_upstream_send_request() blocks waiting for some data from
the upstream, such as flow control in case of gRPC, the connection
will time out due to no read events on the upstream connection.

Fix is to explicitly check the c->read->ready flag if sending request
blocks and post a read event if it is set.

Note that while it is possible to modify ngx_ssl_handshake() to keep
read events active, this won't completely resolve the issue, since
there can be data already received during the SSL handshake
(see 573bd30e46b4).

comment:7 by Maxim Dounin, 5 weeks ago

Resolution: fixed
Status: acceptedclosed

Fix committed, thanks for reporting this.

Note: See TracTickets for help on using tickets.