Opened 3 years ago

Closed 2 years ago

Last modified 2 years ago

#2248 closed defect (fixed)

Stalled connection when using `aio_write on`

Reported by: aminvakil@… Owned by: Maxim Dounin
Priority: minor Milestone:
Component: nginx-core Version: 1.19.x
Keywords: aio aio_write proxy Cc: aminvakil@…
uname -a: Linux hostname 4.19.0-17-amd64 #1 SMP Debian 4.19.194-3 (2021-07-18) x86_64 GNU/Linux
nginx -V: nginx version: nginx/1.20.1
built by gcc 8.3.0 (Debian 8.3.0-6)
built with OpenSSL 1.1.1d 10 Sep 2019
TLS SNI support enabled
configure arguments: --prefix=/etc/nginx --sbin-path=/usr/sbin/nginx --modules-path=/usr/lib/nginx/modules --conf-path=/etc/nginx/nginx.conf --error-log-path=/var/log/nginx/error.log --http-log-path=/var/log/nginx/access.log --pid-path=/var/run/nginx.pid --lock-path=/var/run/nginx.lock --http-client-body-temp-path=/var/cache/nginx/client_temp --http-proxy-temp-path=/var/cache/nginx/proxy_temp --http-fastcgi-temp-path=/var/cache/nginx/fastcgi_temp --http-uwsgi-temp-path=/var/cache/nginx/uwsgi_temp --http-scgi-temp-path=/var/cache/nginx/scgi_temp --user=nginx --group=nginx --with-compat --with-file-aio --with-threads --with-http_addition_module --with-http_auth_request_module --with-http_dav_module --with-http_flv_module --with-http_gunzip_module --with-http_gzip_static_module --with-http_mp4_module --with-http_random_index_module --with-http_realip_module --with-http_secure_link_module --with-http_slice_module --with-http_ssl_module --with-http_stub_status_module --with-http_sub_module --with-http_v2_module --with-mail --with-mail_ssl_module --with-stream --with-stream_realip_module --with-stream_ssl_module --with-stream_ssl_preread_module --with-cc-opt='-g -O2 -fdebug-prefix-map=/data/builder/debuild/nginx-1.20.1/debian/debuild-base/nginx-1.20.1=. -fstack-protector-strong -Wformat -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fPIC' --with-ld-opt='-Wl,-z,relro -Wl,-z,now -Wl,--as-needed -pie'

Description (last modified by aminvakil@…)

Connection gets stalled most times and does not continue downloading files most of times, here is the configuration files:
/etc/nginx/nginx.conf

worker_rlimit_nofile 40000;

user  nginx;
worker_processes auto;

error_log  /var/log/nginx/error.log error;
pid        /var/run/nginx.pid;

events {
    worker_connections  1024;
    use epoll;
}

http {
    server_tokens off;

    include       /etc/nginx/mime.types;
    default_type  application/octet-stream;

    access_log  /var/log/nginx/access.log main buffer=2048k flush=10s;

    open_file_cache          max=10000 inactive=5m;
    open_file_cache_valid    2m;
    open_file_cache_min_uses 2;
    open_file_cache_errors   off;

    sendfile on;
    sendfile_max_chunk 512k;
    tcp_nopush  on;
    tcp_nodelay on;
    output_buffers 2 8k;
    keepalive_timeout  15;
    http2_idle_timeout 1m;

    # SSL
    ssl_session_timeout 8h;
    ssl_session_cache shared:SSL:100m;   # one megabyte can store about 4000 sessions
    ssl_session_tickets off;
    ssl_protocols TLSv1 TLSv1.1 TLSv1.2 TLSv1.3;
    ssl_prefer_server_ciphers on;
    

    include /etc/nginx/conf.d/*.conf;
}

/etc/nginx/conf.d/example.org.conf

upstream cdns {
    server cdn1.example.org;
    keepalive 20;
}


server {
    listen 80 reuseport;
    listen 443 ssl http2 reuseport;
    server_name cache.example.org;

    ssl_certificate /etc/nginx/ssl/example.org.pem;
    ssl_certificate_key /etc/nginx/ssl/example.org.key;
        
    root /usr/share/nginx/html/;
    aio threads;
    aio_write on;

    location /apk-files/ {

        proxy_pass http://cdns;

    }
}

Then using http2 to connect to server gets stalled like this about 90% of times.

curl -LO "https://example.org/samplefile" -vvvvv
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying 1.1.1.1:443...
* Connected to example.org (1.1.1.1) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*  CAfile: /etc/ssl/certs/ca-certificates.crt
*  CApath: none
} [5 bytes data]
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
} [512 bytes data]
* TLSv1.3 (IN), TLS handshake, Server hello (2):
{ [122 bytes data]
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
{ [19 bytes data]
* TLSv1.3 (IN), TLS handshake, Certificate (11):
{ [4766 bytes data]
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
{ [264 bytes data]
* TLSv1.3 (IN), TLS handshake, Finished (20):
{ [52 bytes data]
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
} [1 bytes data]
* TLSv1.3 (OUT), TLS handshake, Finished (20):
} [52 bytes data]
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN, server accepted to use h2
* Server certificate:
*  subject: CN=*.example.org
*  start date: Dec 14 11:59:52 2020 GMT
*  expire date: Dec 14 09:32:52 2021 GMT
*  subjectAltName: host "cache.example.org" matched cert's "*.example.org"
*  issuer: C=PL; O=Unizeto Technologies S.A.; OU=Certum Certification Authority; CN=Certum Domain Validation CA SHA2
*  SSL certificate verify ok.
* Using HTTP2, server supports multiplexing
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
} [5 bytes data]
* Using Stream ID: 1 (easy handle 0x801cbac360)
} [5 bytes data]
> GET /samplefile HTTP/2
> Host: example.org
> user-agent: curl/7.79.0
> accept: */*
> 
{ [5 bytes data]
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
{ [57 bytes data]
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
{ [57 bytes data]
* old SSL session ID is stale, removing
{ [5 bytes data]
* Connection state changed (MAX_CONCURRENT_STREAMS == 128)!
} [5 bytes data]
< HTTP/2 200 
< server: nginx
< date: Mon, 20 Sep 2021 08:53:41 GMT
< content-type: application/octet-stream
< content-length: 40703269
< last-modified: Mon, 08 Feb 2021 08:13:22 GMT
< etag: "6020f2a2-26d1525"
< accept-ranges: bytes
< 
{ [7942 bytes data]
  0 38.8M    0 73478    0     0   5165      0  2:11:20  0:00:14  2:11:06     0

I couldn't reproduce the issue with curl --http1.1 or after commenting aio_write on.

Change History (8)

comment:1 by aminvakil@…, 3 years ago

Description: modified (diff)

comment:2 by Maxim Dounin, 3 years ago

Owner: set to Maxim Dounin
Status: newassigned

I was able to reproduce this, I'll take a look what goes on here.

comment:3 by Maxim Dounin, 3 years ago

As far as I can tell, the problem is as follows:

  • Some buffers from event pipe are not sent to the client due to EWOULDBLOCK. These buffers are queued for sending, and end up in various busy chains in various filters, notably ctx->busy in output chain (ngx_output_chain.c) and p->busy in even pipe (ngx_event_pipe.c).
  • After a write even, HTTP/2 connection sends these queued buffers and marks them as sent (b->pos = b->last). It posts a write event to the fake connections of particular streams to make these fake connection aware of the change.
  • Event pipe calls the output filter chain.
  • But output chain returns NGX_AGAIN immediately, as it is blocked in aio reading. It does not update chains, so event pipe buffers, which previously were in output chain's ctx->busy, are left there.
  • In contrast, event pipe updates chains, sees that the buffers were sent and moves them to p->free.

As a result, event pipe thinks a buffer is free and can be reused, while the same buffer is sitting in the output chain's ctx->busy. If the buffer is actually reused before ngx_chain_update_chains() is called by ngx_output_chain(), it is never released by ngx_chain_update_chains(), since it is not empty. Further, this prevents other busy buffers from being released as well, so all output buffers are eventually exhausted, and the connection hangs.

I don't see immediate reasons why this doesn't happen without aio_write, the root cause does not seem to depend on aio writing. Most likely not using aio_write makes it less likely to happen due to different timing and/or code path changes in event pipe code.

Quick fix might be to update ngx_chain_update_chains() to check tag before size, so buffers from other modules are immediately released. This approach, however, might not work if a module uses "if (ctx->busy) ..." to check for pending output data (and not only his own buffers). Looking through all ngx_chain_update_chains() uses in nginx code
suggests it should be safe though:

  • ngx_output_chain.c: better with the patch (short path will be used)
  • ngx_event_pipe.c: no difference (no other buffers)
  • ngx_http_gunzip_filter_module.c: no difference (no other buffers)
  • ngx_http_grpc_module.c: no difference (no other buffers, always shadow)
  • ngx_http_chunked_filter_module.c: no difference (does not use ctx->busy)
  • ngx_http_fastcgi_module.c: no difference (no other buffers, always shadow)
  • ngx_http_proxy_module.c: no difference (does not use ctx->busy)
  • ngx_http_gzip_filter_module.c: no difference (no other buffers)
  • ngx_http_v2.c: no difference (no other buffers)
  • ngx_http_request_body.c: no difference (no other buffers)
  • ngx_http_upstream.c: no difference (no other buffers)
  • ngx_stream_proxy_module.c: no difference (no other buffers)

Further, such checks are incorrect anyway, at least in generic code, since a module can skip input buffers and generate his own buffers instead: for example, this is what gzip filter does. That is, such checks are not really allowed and not expected to work, so the change in question should be safe.

A better solution might be to change HTTP/2 to only update buffers once the filter chain is called. This is much more complex approach though.

Below is a patch which implements ngx_chain_update_chains() changes:

# HG changeset patch
# User Maxim Dounin <mdounin@mdounin.ru>
# Date 1632973715 -10800
#      Thu Sep 30 06:48:35 2021 +0300
# Node ID 79733e98069590e732691fe5ff68d07feff95ec8
# Parent  bfad703459b4e2416548ac66f548e96c2197d9cc
Changed ngx_chain_update_chains() to test tag first (ticket #2248).

Without this change, aio used with HTTP/2 can result in connection hang,
as observed with "aio threads; aio_write on;" and proxying (ticket #2248).

The problem is that HTTP/2 updates buffers outside of the output filters
(notably, marks them as sent), and then posts a write event to call
output filters.  If a filter does not call the next one for some reason
(for example, because of an AIO operation in progress), this might
result in a state when the owner of a buffer already called
ngx_chain_update_chains() and can reuse the buffer, the same buffer
is still sitting in the busy chain of some other filter.

In the particular case a buffer was sitting in output chain's ctx->busy,
and was reused by even pipe.  Output chain's ctx->busy was permanently
blocked by it, and this resulted in connection hang.

Fix is to change ngx_chain_update_chains() to skip buffers from other
modules unconditionally, without trying to wait for these buffers to
become empty.

diff --git a/src/core/ngx_buf.c b/src/core/ngx_buf.c
--- a/src/core/ngx_buf.c
+++ b/src/core/ngx_buf.c
@@ -203,16 +203,16 @@ ngx_chain_update_chains(ngx_pool_t *p, n
     while (*busy) {
         cl = *busy;
 
-        if (ngx_buf_size(cl->buf) != 0) {
-            break;
-        }
-
         if (cl->buf->tag != tag) {
             *busy = cl->next;
             ngx_free_chain(p, cl);
             continue;
         }
 
+        if (ngx_buf_size(cl->buf) != 0) {
+            break;
+        }
+
         cl->buf->pos = cl->buf->start;
         cl->buf->last = cl->buf->start;
 

The patch fixes things in my tests. Please test if it works for you.

comment:4 by aminvakil@…, 3 years ago

Thanks for taking a look at this!

I can confirm that this patch on nginx 1.20.1 source from deb-src http://nginx.org/packages/debian buster nginx fixed my problem too.

comment:5 by Maxim Dounin, 3 years ago

Thanks for testing, submitted the patch for review.

comment:6 by Maxim Dounin <mdounin@…>, 2 years ago

In 7951:c7a8bdf5af55/nginx:

Changed ngx_chain_update_chains() to test tag first (ticket #2248).

Without this change, aio used with HTTP/2 can result in connection hang,
as observed with "aio threads; aio_write on;" and proxying (ticket #2248).

The problem is that HTTP/2 updates buffers outside of the output filters
(notably, marks them as sent), and then posts a write event to call
output filters. If a filter does not call the next one for some reason
(for example, because of an AIO operation in progress), this might
result in a state when the owner of a buffer already called
ngx_chain_update_chains() and can reuse the buffer, while the same buffer
is still sitting in the busy chain of some other filter.

In the particular case a buffer was sitting in output chain's ctx->busy,
and was reused by event pipe. Output chain's ctx->busy was permanently
blocked by it, and this resulted in connection hang.

Fix is to change ngx_chain_update_chains() to skip buffers from other
modules unconditionally, without trying to wait for these buffers to
become empty.

comment:7 by Maxim Dounin, 2 years ago

Resolution: fixed
Status: assignedclosed

Fix committed, thanks for reporting this.

comment:8 by Maxim Dounin <mdounin@…>, 2 years ago

In 7969:84c60a3cd12a/nginx:

Changed ngx_chain_update_chains() to test tag first (ticket #2248).

Without this change, aio used with HTTP/2 can result in connection hang,
as observed with "aio threads; aio_write on;" and proxying (ticket #2248).

The problem is that HTTP/2 updates buffers outside of the output filters
(notably, marks them as sent), and then posts a write event to call
output filters. If a filter does not call the next one for some reason
(for example, because of an AIO operation in progress), this might
result in a state when the owner of a buffer already called
ngx_chain_update_chains() and can reuse the buffer, while the same buffer
is still sitting in the busy chain of some other filter.

In the particular case a buffer was sitting in output chain's ctx->busy,
and was reused by event pipe. Output chain's ctx->busy was permanently
blocked by it, and this resulted in connection hang.

Fix is to change ngx_chain_update_chains() to skip buffers from other
modules unconditionally, without trying to wait for these buffers to
become empty.

Note: See TracTickets for help on using tickets.