Opened 2 years ago

Closed 2 years ago

Last modified 2 years ago

#2418 closed defect (fixed)

ngx_http_upstream_process_non_buffered_request do_write Dead-loop

Reported by: bullerdu@… Owned by:
Priority: minor Milestone:
Component: documentation Version: 1.23.x
Keywords: proxy_request_buffering off Cc: bullerdu@…
uname -a: Linux cdn-dev011164234021.na61 4.19.91-008.ali4000.alios7.x86_64 #1 SMP Fri Sep 4 17:33:26 CST 2020 x86_64 x86_64 x86_64 GNU/Linux
nginx -V: nginx version: nginx/1.23.2
built by gcc 6.5.1 20220324 (GCC)
configure arguments: --prefix=/home/yefei.dyf/nginx

Description (last modified by bullerdu@…)

In non-buffering transport mode(proxy_request_buffering off), I encountered a write dead loop problem with the following code path:

src/http/ngx_http_upstream.c:3666 ngx_http_upstream_process_non_buffered_request
  ->
     src/http/ngx_http_upstream.c:3691
     rc = ngx_http_output_filter(r, u->out_bufs);  /* u->out_bufs=0x0 */
         ->
         src/http/ngx_http_write_filter_module.c:48
         ngx_http_write_filter
            ->
            src/http/ngx_http_write_filter_module.c:295
            chain = c->send_chain(c, r->out, limit);
               src/os/uinx/ngx_linux_sendfile_chain.c:50
               ngx_linux_sendfile_chain
               /* code here return
                  if (!wev->ready) {
                      return in;
                  } 
               */

             src/http/ngx_http_write_filter_module.c:342
             /* code here return
                if (chain) {
                    c->buffered |= NGX_HTTP_WRITE_BUFFERED;
                    return NGX_AGAIN;
                }
              */

src/http/ngx_http_upstream.c:3735
n = upstream->recv(upstream, b->last, size);
/* code here, return n = -1 */

->

src/http/ngx_http_upstream.c:3753
/* code here, continue */

-> 

src/http/ngx_http_upstream.c:3691
rc = ngx_http_output_filter(r, u->out_bufs);

My analysis is as follows:

First:
when the downstream is written, ngx_linux_sendfile_chain calls ngx_writev and returns NGX_AGAIN, setting wev->ready =0
src/os/uinx/ngx_linux_sendfile_chain.c:204
if (n == NGX_AGAIN) {
wev->ready = 0;
return in;
}

Second:
src/http/ngx_http_upstream.c:3691
ngx_http_output_filter returns NGX_AGAIN, u->out_bufs = NULL, u->busy_bufs! = NULL

size > 0 && upstream->read->ready = 1, upstream->recv returns -1, So I keep writing downstream(ngx_http_output_filter), but downstream->write->ready = 0 so ngx_http_output_filter keeps returning -2

The resulting problems:
ngx_http_output_filter loop call, some filter_modules have cpu consumption problem, CPU 100%

The Solution:

Check the return value of ngx_http_output_filter. If it is NGX_AGAIN, break the write loop

src/http/ngx_http_upstream.c:3691

rc = ngx_http_output_filter(r, u->out_bufs);
if (rc == NGX_ERROR) {

ngx_http_upstream_finalize_request(r, u, NGX_ERROR);
return;

}
if (rc == NGX_AGAIN) {

...

break;

}

Change History (9)

comment:1 by Maxim Dounin, 2 years ago

The loop is expected to stop once the u->buffer is full. Could you please clarify the configuration you are seeing the problem with?

in reply to:  1 ; comment:2 by bullerdu@…, 2 years ago

Description: modified (diff)

Replying to Maxim Dounin:

The loop is expected to stop once the u->buffer is full. Could you please clarify the configuration you are seeing the problem with?

First of all, u->buffer is not full in my scenario. And n = upstream->recv(upstream, b->last, size); Always return -1.

Because the nginx production environment configuration is very complicated, it cannot be provided. And this phenomenon is not easy to repeat, only occasionally.

But I was sure that size = b->end - b->last; the size was not 0 at that time. I gdb checked the scene.

I still believe this problem is related to n = ngx_writev(c, &header); returning -2.

in reply to:  2 ; comment:3 by Maxim Dounin, 2 years ago

Replying to bullerdu@…:

Replying to Maxim Dounin:

The loop is expected to stop once the u->buffer is full. Could you please clarify the configuration you are seeing the problem with?

First of all, u->buffer is not full in my scenario. And n = upstream->recv(upstream, b->last, size); Always return -1.

So the upstream connection is closed with an error, and upstream->read->error is set, but nginx did not notice that due to u->busy_bufs is not empty, and tries to send the rest of the buffers to the client. Interesting.

But if upstream->recv() returns -1, the upstream->read->ready should be set to 0 (see ngx_unix_recv()), breaking the loop.

Is the nginx -V output you've provided in the ticket matches the actual nginx you are seeing the problem with, or it's just a placeholder? Looking at ngx_ssl_recv() I think the described behaviour might be possible with https upstream connections, but the nginx -V output you've provided suggests SSL is not used.

Also, are you using any 3rd party modules or patches?

in reply to:  3 comment:4 by bullerdu@…, 2 years ago

Replying to Maxim Dounin:

Replying to bullerdu@…:

Replying to Maxim Dounin:

The loop is expected to stop once the u->buffer is full. Could you please clarify the configuration you are seeing the problem with?

First of all, u->buffer is not full in my scenario. And n = upstream->recv(upstream, b->last, size); Always return -1.

So the upstream connection is closed with an error, and upstream->read->error is set, but nginx did not notice that due to u->busy_bufs is not empty, and tries to send the rest of the buffers to the client. Interesting.

But if upstream->recv() returns -1, the upstream->read->ready should be set to 0 (see ngx_unix_recv()), breaking the loop.

Is the nginx -V output you've provided in the ticket matches the actual nginx you are seeing the problem with, or it's just a placeholder? Looking at ngx_ssl_recv() I think the described behaviour might be possible with https upstream connections, but the nginx -V output you've provided suggests SSL is not used.

Also, are you using any 3rd party modules or patches?

Yes, My problem does occur with https upstream connections. upstream->recv() was ngx_ssl_recv().

I used the nginx version based on the official version 1.23 with some customized patches.
And the ngx_ssl_recv() is the same as version 1.23.

The code in question is as follows:
c->ssl->last was -1.

        switch (c->ssl->last) {

        case NGX_DONE:
            c->read->ready = 0;
            c->read->eof = 1;
            return 0;

        case NGX_ERROR:
            c->read->error = 1;

            /* fall through */

        case NGX_AGAIN:
            return c->ssl->last;
        }

And finally, this is a given problem, right? How to fix it? like this?

        switch (c->ssl->last) {

        case NGX_DONE:
            c->read->ready = 0;
            c->read->eof = 1;
            return 0;

        case NGX_ERROR:
            c->read->ready = 0;
            c->read->error = 1;

            /* fall through */

        case NGX_AGAIN:
            return c->ssl->last;
        }

comment:5 by Maxim Dounin, 2 years ago

Thanks for the details, I was able to reproduce this with an SSL error from the upstream server.

The following patch fixes the problem in my tests. Please test if it works for you:

# HG changeset patch
# User Maxim Dounin <mdounin@mdounin.ru>
# Date 1669621392 -10800
#      Mon Nov 28 10:43:12 2022 +0300
# Node ID 78a94133b50f4f7d632d3eb71ac389163a3309f3
# Parent  0b360747c74e3fa7e439e0684a8cf1da2d14d8f6
SSL: fixed ngx_ssl_recv() to reset c->read->ready after errors.

With this change, behaviour of ngx_ssl_recv() now matches ngx_unix_recv(),
which used to always reset c->read->ready to 0 when returning errors.

This fixes an infinite loop in unbuffered SSL proxying if writing to the
client is blocked and an SSL error happens (ticket #2418).

With this change, the fix for a similar issue in the stream module
(6868:ee3645078759), which used a different approach of explicitly
testing c->read->error instead, is no longer needed and was reverted.

diff --git a/src/event/ngx_event_openssl.c b/src/event/ngx_event_openssl.c
--- a/src/event/ngx_event_openssl.c
+++ b/src/event/ngx_event_openssl.c
@@ -2204,6 +2204,7 @@ ngx_ssl_recv(ngx_connection_t *c, u_char
 #endif
 
     if (c->ssl->last == NGX_ERROR) {
+        c->read->ready = 0;
         c->read->error = 1;
         return NGX_ERROR;
     }
@@ -2270,6 +2271,7 @@ ngx_ssl_recv(ngx_connection_t *c, u_char
 #if (NGX_HAVE_FIONREAD)
 
                     if (ngx_socket_nread(c->fd, &c->read->available) == -1) {
+                        c->read->ready = 0;
                         c->read->error = 1;
                         ngx_connection_error(c, ngx_socket_errno,
                                              ngx_socket_nread_n " failed");
@@ -2306,6 +2308,7 @@ ngx_ssl_recv(ngx_connection_t *c, u_char
             return 0;
 
         case NGX_ERROR:
+            c->read->ready = 0;
             c->read->error = 1;
 
             /* fall through */
@@ -2326,6 +2329,7 @@ ngx_ssl_recv_early(ngx_connection_t *c, 
     size_t     readbytes;
 
     if (c->ssl->last == NGX_ERROR) {
+        c->read->ready = 0;
         c->read->error = 1;
         return NGX_ERROR;
     }
@@ -2425,6 +2429,7 @@ ngx_ssl_recv_early(ngx_connection_t *c, 
             return 0;
 
         case NGX_ERROR:
+            c->read->ready = 0;
             c->read->error = 1;
 
             /* fall through */
diff --git a/src/stream/ngx_stream_proxy_module.c b/src/stream/ngx_stream_proxy_module.c
--- a/src/stream/ngx_stream_proxy_module.c
+++ b/src/stream/ngx_stream_proxy_module.c
@@ -1675,9 +1675,8 @@ ngx_stream_proxy_process(ngx_stream_sess
 
         size = b->end - b->last;
 
-        if (size && src->read->ready && !src->read->delayed
-            && !src->read->error)
-        {
+        if (size && src->read->ready && !src->read->delayed) {
+
             if (limit_rate) {
                 limit = (off_t) limit_rate * (ngx_time() - u->start_sec + 1)
                         - *received;

in reply to:  5 comment:6 by bullerdu@…, 2 years ago

Replying to Maxim Dounin:

Thanks for the details, I was able to reproduce this with an SSL error from the upstream server.

The following patch fixes the problem in my tests. Please test if it works for you:

# HG changeset patch
# User Maxim Dounin <mdounin@mdounin.ru>
# Date 1669621392 -10800
#      Mon Nov 28 10:43:12 2022 +0300
# Node ID 78a94133b50f4f7d632d3eb71ac389163a3309f3
# Parent  0b360747c74e3fa7e439e0684a8cf1da2d14d8f6
SSL: fixed ngx_ssl_recv() to reset c->read->ready after errors.

With this change, behaviour of ngx_ssl_recv() now matches ngx_unix_recv(),
which used to always reset c->read->ready to 0 when returning errors.

This fixes an infinite loop in unbuffered SSL proxying if writing to the
client is blocked and an SSL error happens (ticket #2418).

With this change, the fix for a similar issue in the stream module
(6868:ee3645078759), which used a different approach of explicitly
testing c->read->error instead, is no longer needed and was reverted.

diff --git a/src/event/ngx_event_openssl.c b/src/event/ngx_event_openssl.c
--- a/src/event/ngx_event_openssl.c
+++ b/src/event/ngx_event_openssl.c
@@ -2204,6 +2204,7 @@ ngx_ssl_recv(ngx_connection_t *c, u_char
 #endif
 
     if (c->ssl->last == NGX_ERROR) {
+        c->read->ready = 0;
         c->read->error = 1;
         return NGX_ERROR;
     }
@@ -2270,6 +2271,7 @@ ngx_ssl_recv(ngx_connection_t *c, u_char
 #if (NGX_HAVE_FIONREAD)
 
                     if (ngx_socket_nread(c->fd, &c->read->available) == -1) {
+                        c->read->ready = 0;
                         c->read->error = 1;
                         ngx_connection_error(c, ngx_socket_errno,
                                              ngx_socket_nread_n " failed");
@@ -2306,6 +2308,7 @@ ngx_ssl_recv(ngx_connection_t *c, u_char
             return 0;
 
         case NGX_ERROR:
+            c->read->ready = 0;
             c->read->error = 1;
 
             /* fall through */
@@ -2326,6 +2329,7 @@ ngx_ssl_recv_early(ngx_connection_t *c, 
     size_t     readbytes;
 
     if (c->ssl->last == NGX_ERROR) {
+        c->read->ready = 0;
         c->read->error = 1;
         return NGX_ERROR;
     }
@@ -2425,6 +2429,7 @@ ngx_ssl_recv_early(ngx_connection_t *c, 
             return 0;
 
         case NGX_ERROR:
+            c->read->ready = 0;
             c->read->error = 1;
 
             /* fall through */
diff --git a/src/stream/ngx_stream_proxy_module.c b/src/stream/ngx_stream_proxy_module.c
--- a/src/stream/ngx_stream_proxy_module.c
+++ b/src/stream/ngx_stream_proxy_module.c
@@ -1675,9 +1675,8 @@ ngx_stream_proxy_process(ngx_stream_sess
 
         size = b->end - b->last;
 
-        if (size && src->read->ready && !src->read->delayed
-            && !src->read->error)
-        {
+        if (size && src->read->ready && !src->read->delayed) {
+
             if (limit_rate) {
                 limit = (off_t) limit_rate * (ngx_time() - u->start_sec + 1)
                         - *received;

Thanks for your patch, it looks good to me.

comment:7 by Maxim Dounin, 2 years ago

Submitted for review to the nginx-devel@ mailing list:

https://mailman.nginx.org/pipermail/nginx-devel/2022-November/63QHXAIG6Y2VF7NRS4VHOZ2IWXQDJNF3.html

Thanks for testing.

Last edited 2 years ago by Maxim Dounin (previous) (diff)

comment:8 by Maxim Dounin <mdounin@…>, 2 years ago

In 8110:06c7d84cafdb/nginx:

SSL: fixed ngx_ssl_recv() to reset c->read->ready after errors.

With this change, behaviour of ngx_ssl_recv() now matches ngx_unix_recv(),
which used to always reset c->read->ready to 0 when returning errors.

This fixes an infinite loop in unbuffered SSL proxying if writing to the
client is blocked and an SSL error happens (ticket #2418).

With this change, the fix for a similar issue in the stream module
(6868:ee3645078759), which used a different approach of explicitly
testing c->read->error instead, is no longer needed and was reverted.

comment:9 by Maxim Dounin, 2 years ago

Resolution: fixed
Status: newclosed

Fix committed and will be available in the next release, thanks for reporting this.

Note: See TracTickets for help on using tickets.