Opened 3 years ago

Closed 3 years ago

#2211 closed defect (duplicate)

Sendfile stalls on files larger that 2GB

Reported by: Miao Wang Owned by:
Priority: critical Milestone:
Component: nginx-core Version: 1.18.x
Keywords: Cc:
uname -a: Linux localhost 5.10.0-7-arm64 #1 SMP Debian 5.10.40-1 (2021-05-28) aarch64 GNU/Linux
nginx -V: nginx version: nginx/1.18.0
built with OpenSSL 1.1.1k 25 Mar 2021
TLS SNI support enabled
configure arguments: --with-cc-opt='-g -O2 -ffile-prefix-map=/build/nginx-UdHF0z/nginx-1.18.0=. -fstack-protector-strong -Wformat -Werror=format-security -fPIC -Wdate-time -D_FORTIFY_SOURCE=2' --with-ld-opt='-Wl,-z,relro -Wl,-z,now -fPIC' --prefix=/usr/share/nginx --conf-path=/etc/nginx/nginx.conf --http-log-path=/var/log/nginx/access.log --error-log-path=/var/log/nginx/error.log --lock-path=/var/lock/nginx.lock --pid-path=/run/nginx.pid --modules-path=/usr/lib/nginx/modules --http-client-body-temp-path=/var/lib/nginx/body --http-fastcgi-temp-path=/var/lib/nginx/fastcgi --http-proxy-temp-path=/var/lib/nginx/proxy --http-scgi-temp-path=/var/lib/nginx/scgi --http-uwsgi-temp-path=/var/lib/nginx/uwsgi --with-compat --with-debug --with-pcre-jit --with-http_ssl_module --with-http_stub_status_module --with-http_realip_module --with-http_auth_request_module --with-http_v2_module --with-http_dav_module --with-http_slice_module --with-threads --with-http_addition_module --with-http_gunzip_module --with-http_gzip_static_module --with-http_sub_module

Description

Hi all,

I recently discovered a bug when a http request for a static file stalled at ~2GB. By analyzing the code, I can confirm that the bug affects a very broad versions of nginx till the latest. The minimum configuration to reproduce this bug is as follows:

user www-data;
worker_processes 1;
pid nginx.pid;

error_log error.log debug;
#error_log error.log;


events {
        worker_connections 768;
        # multi_accept on;
}

http {

sendfile on;
tcp_nopush on;

error_log error.log debug; 
access_log access.log; 

### commenting this for bug reproduce
#sendfile_max_chunk 2147483647;
 
server { 
  listen 127.0.0.1:8080; 
  location / { 
    root web/; 
  } 
} 
} 

and generate a file as large as 4GB in the web/ directory:

dd if=/dev/urandom bs=4M count=1024 of=web/4gb.bin status=progress

By issuing the following request:

curl -v "http://127.0.0.1:8080/4gb.bin" > /dev/null

It can be seen that the download process stalls at 2047M:

*   Trying 127.0.0.1:8080...
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0* Connected to 127.0.0.1 (127.0.0.1) port 8080 (#0)
> GET /4gb.bin HTTP/1.1
> Host: 127.0.0.1:8080
> User-Agent: curl/7.74.0
> Accept: */*
> 
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Server: nginx/1.18.0
< Date: Thu, 24 Jun 2021 10:41:12 GMT
< Content-Type: text/plain
< Content-Length: 4294967296
< Last-Modified: Thu, 24 Jun 2021 10:40:50 GMT
< Connection: keep-alive
< ETag: "60d46132-100000000"
< Accept-Ranges: bytes
< 
{ [32516 bytes data]
 49 4096M   49 2047M    0     0  86.4M      0  0:00:47  0:00:23  0:00:24     0

After 60s, the connection is closed:

transfer closed with 2147487744 bytes remaining to read
* Closing connection 0
curl: (18) transfer closed with 2147487744 bytes remaining to read

As seen in the debug log, the sendfile() syscall runs smoothly without receiving -EAGAIN till 2GB-1 bytes (NGX_SENDFILE_MAXSIZE) (that's why this is not easy to reproduce).

2021/06/24 18:41:12 [debug] 1833674#1833674: *1 accept: 127.0.0.1:45770 fd:3
2021/06/24 18:41:12 [debug] 1833674#1833674: *1 event timer add: 3: 60000:1307018747
2021/06/24 18:41:12 [debug] 1833674#1833674: *1 reusable connection: 1
2021/06/24 18:41:12 [debug] 1833674#1833674: *1 epoll add event: fd:3 op:1 ev:80002001
2021/06/24 18:41:12 [debug] 1833674#1833674: timer delta: 75793
2021/06/24 18:41:12 [debug] 1833674#1833674: worker cycle
2021/06/24 18:41:12 [debug] 1833674#1833674: epoll timer: 60000
2021/06/24 18:41:12 [debug] 1833674#1833674: epoll: fd:3 ev:0001 d:0000FFFF9F2761E0
2021/06/24 18:41:12 [debug] 1833674#1833674: *1 http wait request handler
2021/06/24 18:41:12 [debug] 1833674#1833674: *1 malloc: 0000AAAAF8FE5200:1024
2021/06/24 18:41:12 [debug] 1833674#1833674: *1 recv: eof:0, avail:-1
2021/06/24 18:41:12 [debug] 1833674#1833674: *1 recv: fd:3 85 of 1024
2021/06/24 18:41:12 [debug] 1833674#1833674: *1 reusable connection: 0
2021/06/24 18:41:12 [debug] 1833674#1833674: *1 posix_memalign: 0000AAAAF900F420:4096 @16
2021/06/24 18:41:12 [debug] 1833674#1833674: *1 http process request line
2021/06/24 18:41:12 [debug] 1833674#1833674: *1 http request line: "GET /4gb.bin HTTP/1.1"
2021/06/24 18:41:12 [debug] 1833674#1833674: *1 http uri: "/4gb.bin"
2021/06/24 18:41:12 [debug] 1833674#1833674: *1 http args: ""
2021/06/24 18:41:12 [debug] 1833674#1833674: *1 http exten: "bin"
2021/06/24 18:41:12 [debug] 1833674#1833674: *1 posix_memalign: 0000AAAAF9005350:4096 @16
2021/06/24 18:41:12 [debug] 1833674#1833674: *1 http process request header line
2021/06/24 18:41:12 [debug] 1833674#1833674: *1 http header: "Host: 127.0.0.1:8080"
2021/06/24 18:41:12 [debug] 1833674#1833674: *1 http header: "User-Agent: curl/7.74.0"
2021/06/24 18:41:12 [debug] 1833674#1833674: *1 http header: "Accept: */*"
2021/06/24 18:41:12 [debug] 1833674#1833674: *1 http header done
2021/06/24 18:41:12 [debug] 1833674#1833674: *1 event timer del: 3: 1307018747
2021/06/24 18:41:12 [debug] 1833674#1833674: *1 generic phase: 0
2021/06/24 18:41:12 [debug] 1833674#1833674: *1 rewrite phase: 1
2021/06/24 18:41:12 [debug] 1833674#1833674: *1 test location: "/"
2021/06/24 18:41:12 [debug] 1833674#1833674: *1 using configuration "/"
2021/06/24 18:41:12 [debug] 1833674#1833674: *1 http cl:-1 max:1048576
2021/06/24 18:41:12 [debug] 1833674#1833674: *1 rewrite phase: 3
2021/06/24 18:41:12 [debug] 1833674#1833674: *1 post rewrite phase: 4
2021/06/24 18:41:12 [debug] 1833674#1833674: *1 generic phase: 5
2021/06/24 18:41:12 [debug] 1833674#1833674: *1 generic phase: 6
2021/06/24 18:41:12 [debug] 1833674#1833674: *1 generic phase: 7
2021/06/24 18:41:12 [debug] 1833674#1833674: *1 access phase: 8
2021/06/24 18:41:12 [debug] 1833674#1833674: *1 access phase: 9
2021/06/24 18:41:12 [debug] 1833674#1833674: *1 access phase: 10
2021/06/24 18:41:12 [debug] 1833674#1833674: *1 post access phase: 11
2021/06/24 18:41:12 [debug] 1833674#1833674: *1 generic phase: 12
2021/06/24 18:41:12 [debug] 1833674#1833674: *1 generic phase: 13
2021/06/24 18:41:12 [debug] 1833674#1833674: *1 content phase: 14
2021/06/24 18:41:12 [debug] 1833674#1833674: *1 content phase: 15
2021/06/24 18:41:12 [debug] 1833674#1833674: *1 content phase: 16
2021/06/24 18:41:12 [debug] 1833674#1833674: *1 content phase: 17
2021/06/24 18:41:12 [debug] 1833674#1833674: *1 content phase: 18
2021/06/24 18:41:12 [debug] 1833674#1833674: *1 http filename: "/home/shanker/nginx-debug/web/4gb.bin"
2021/06/24 18:41:12 [debug] 1833674#1833674: *1 add cleanup: 0000AAAAF9010190
2021/06/24 18:41:12 [debug] 1833674#1833674: *1 http static fd: 10
2021/06/24 18:41:12 [debug] 1833674#1833674: *1 http set discard body
2021/06/24 18:41:12 [debug] 1833674#1833674: *1 HTTP/1.1 200 OK
Server: nginx/1.18.0
Date: Thu, 24 Jun 2021 10:41:12 GMT
Content-Type: text/plain
Content-Length: 4294967296
Last-Modified: Thu, 24 Jun 2021 10:40:50 GMT
Connection: keep-alive
ETag: "60d46132-100000000"
Accept-Ranges: bytes

2021/06/24 18:41:12 [debug] 1833674#1833674: *1 write new buf t:1 f:0 0000AAAAF9005730, pos 0000AAAAF9005730, size: 252 file: 0, size: 0
2021/06/24 18:41:12 [debug] 1833674#1833674: *1 http write filter: l:0 f:0 s:252
2021/06/24 18:41:12 [debug] 1833674#1833674: *1 http output filter "/4gb.bin?"
2021/06/24 18:41:12 [debug] 1833674#1833674: *1 http copy filter: "/4gb.bin?"
2021/06/24 18:41:12 [debug] 1833674#1833674: *1 http postpone filter "/4gb.bin?" 0000FFFFC6A9D4F0
2021/06/24 18:41:12 [debug] 1833674#1833674: *1 write old buf t:1 f:0 0000AAAAF9005730, pos 0000AAAAF9005730, size: 252 file: 0, size: 0
2021/06/24 18:41:12 [debug] 1833674#1833674: *1 write new buf t:0 f:1 0000000000000000, pos 0000000000000000, size: 0 file: 0, size: 4294967296
2021/06/24 18:41:12 [debug] 1833674#1833674: *1 http write filter: l:1 f:0 s:4294967548
2021/06/24 18:41:12 [debug] 1833674#1833674: *1 http write filter limit 0
2021/06/24 18:41:12 [debug] 1833674#1833674: *1 tcp_nopush
2021/06/24 18:41:12 [debug] 1833674#1833674: *1 writev: 252 of 252
2021/06/24 18:41:12 [debug] 1833674#1833674: *1 sendfile: @0 2147479552
2021/06/24 18:41:12 [debug] 1833674#1833674: *1 sendfile: 2147479552 of 2147479552 @0
2021/06/24 18:41:12 [debug] 1833674#1833674: *1 http write filter 0000AAAAF9005840
2021/06/24 18:41:12 [debug] 1833674#1833674: *1 http copy filter: -2 "/4gb.bin?"
2021/06/24 18:41:12 [debug] 1833674#1833674: *1 http finalize request: -2, "/4gb.bin?" a:1, c:1
2021/06/24 18:41:12 [debug] 1833674#1833674: *1 event timer add: 3: 60000:1307018747
2021/06/24 18:41:12 [debug] 1833674#1833674: timer delta: 0
2021/06/24 18:41:12 [debug] 1833674#1833674: worker cycle
2021/06/24 18:41:12 [debug] 1833674#1833674: epoll timer: 60000
2021/06/24 18:42:15 [debug] 1833674#1833674: timer delta: 62461
2021/06/24 18:42:15 [debug] 1833674#1833674: *1 event timer del: 3: 1307018747
2021/06/24 18:42:15 [debug] 1833674#1833674: *1 http run request: "/4gb.bin?"
2021/06/24 18:42:15 [debug] 1833674#1833674: *1 http writer handler: "/4gb.bin?"
2021/06/24 18:42:15 [info] 1833674#1833674: *1 client timed out (110: Connection timed out) while sending response to client, client: 127.0.0.1, server: , request: "GET /4gb.bin HTTP/1.1", host: "127.0.0.1:8080"
2021/06/24 18:42:15 [debug] 1833674#1833674: *1 http finalize request: 408, "/4gb.bin?" a:1, c:1
2021/06/24 18:42:15 [debug] 1833674#1833674: *1 http terminate request count:1
2021/06/24 18:42:15 [debug] 1833674#1833674: *1 http terminate cleanup count:1 blk:0
2021/06/24 18:42:15 [debug] 1833674#1833674: *1 http posted request: "/4gb.bin?"
2021/06/24 18:42:15 [debug] 1833674#1833674: *1 http terminate handler count:1
2021/06/24 18:42:15 [debug] 1833674#1833674: *1 http request count:1 blk:0
2021/06/24 18:42:15 [debug] 1833674#1833674: *1 http close request
2021/06/24 18:42:15 [debug] 1833674#1833674: *1 http log handler
2021/06/24 18:42:15 [debug] 1833674#1833674: *1 run cleanup: 0000AAAAF9010190
2021/06/24 18:42:15 [debug] 1833674#1833674: *1 file cleanup: fd:10
2021/06/24 18:42:15 [debug] 1833674#1833674: *1 free: 0000AAAAF900F420, unused: 8
2021/06/24 18:42:15 [debug] 1833674#1833674: *1 free: 0000AAAAF9005350, unused: 2689
2021/06/24 18:42:15 [debug] 1833674#1833674: *1 close http connection: 3
2021/06/24 18:42:15 [debug] 1833674#1833674: *1 reusable connection: 0
2021/06/24 18:42:15 [debug] 1833674#1833674: *1 free: 0000AAAAF8FE5200
2021/06/24 18:42:15 [debug] 1833674#1833674: *1 free: 0000AAAAF8FFF0C0, unused: 128
2021/06/24 18:42:15 [debug] 1833674#1833674: worker cycle
2021/06/24 18:42:15 [debug] 1833674#1833674: epoll timer: -1

It is caused by the code in ngx_http_write_filter http://lxr.nginx.org/source/xref/nginx/src/http/ngx_http_write_filter_module.c?r=7504%3Ac19ca381b2e6#330, where it failed to add a timer to the write event when limit is zero and c->write->ready is 1, resulting a stall connection.

To work around this issue, adding sendfile_max_chunk 2147483647; works. After added, the connection won't stall and the corresponding debug log is:

2021/06/24 18:55:29 [debug] 1870676#1870676: accept on 127.0.0.1:8080, ready: 0
2021/06/24 18:55:29 [debug] 1870676#1870676: posix_memalign: 0000AAAAF8FFF0C0:512 @16
2021/06/24 18:55:29 [debug] 1870676#1870676: *2 accept: 127.0.0.1:49334 fd:3
2021/06/24 18:55:29 [debug] 1870676#1870676: *2 event timer add: 3: 60000:1307875216
2021/06/24 18:55:29 [debug] 1870676#1870676: *2 reusable connection: 1
2021/06/24 18:55:29 [debug] 1870676#1870676: *2 epoll add event: fd:3 op:1 ev:80002001
2021/06/24 18:55:29 [debug] 1870676#1870676: timer delta: 11236
2021/06/24 18:55:29 [debug] 1870676#1870676: worker cycle
2021/06/24 18:55:29 [debug] 1870676#1870676: epoll timer: 60000
2021/06/24 18:55:29 [debug] 1870676#1870676: epoll: fd:3 ev:0001 d:0000FFFF9F2761E0
2021/06/24 18:55:29 [debug] 1870676#1870676: *2 http wait request handler
2021/06/24 18:55:29 [debug] 1870676#1870676: *2 malloc: 0000AAAAF8FE5200:1024
2021/06/24 18:55:29 [debug] 1870676#1870676: *2 recv: eof:0, avail:-1
2021/06/24 18:55:29 [debug] 1870676#1870676: *2 recv: fd:3 85 of 1024
2021/06/24 18:55:29 [debug] 1870676#1870676: *2 reusable connection: 0
2021/06/24 18:55:29 [debug] 1870676#1870676: *2 posix_memalign: 0000AAAAF901B370:4096 @16
2021/06/24 18:55:29 [debug] 1870676#1870676: *2 http process request line
2021/06/24 18:55:29 [debug] 1870676#1870676: *2 http request line: "GET /4gb.bin HTTP/1.1"
2021/06/24 18:55:29 [debug] 1870676#1870676: *2 http uri: "/4gb.bin"
2021/06/24 18:55:29 [debug] 1870676#1870676: *2 http args: ""
2021/06/24 18:55:29 [debug] 1870676#1870676: *2 http exten: "bin"
2021/06/24 18:55:29 [debug] 1870676#1870676: *2 posix_memalign: 0000AAAAF9001340:4096 @16
2021/06/24 18:55:29 [debug] 1870676#1870676: *2 http process request header line
2021/06/24 18:55:29 [debug] 1870676#1870676: *2 http header: "Host: 127.0.0.1:8080"
2021/06/24 18:55:29 [debug] 1870676#1870676: *2 http header: "User-Agent: curl/7.74.0"
2021/06/24 18:55:29 [debug] 1870676#1870676: *2 http header: "Accept: */*"
2021/06/24 18:55:29 [debug] 1870676#1870676: *2 http header done
2021/06/24 18:55:29 [debug] 1870676#1870676: *2 event timer del: 3: 1307875216
2021/06/24 18:55:29 [debug] 1870676#1870676: *2 generic phase: 0
2021/06/24 18:55:29 [debug] 1870676#1870676: *2 rewrite phase: 1
2021/06/24 18:55:29 [debug] 1870676#1870676: *2 test location: "/"
2021/06/24 18:55:29 [debug] 1870676#1870676: *2 using configuration "/"
2021/06/24 18:55:29 [debug] 1870676#1870676: *2 http cl:-1 max:1048576
2021/06/24 18:55:29 [debug] 1870676#1870676: *2 rewrite phase: 3
2021/06/24 18:55:29 [debug] 1870676#1870676: *2 post rewrite phase: 4
2021/06/24 18:55:29 [debug] 1870676#1870676: *2 generic phase: 5
2021/06/24 18:55:29 [debug] 1870676#1870676: *2 generic phase: 6
2021/06/24 18:55:29 [debug] 1870676#1870676: *2 generic phase: 7
2021/06/24 18:55:29 [debug] 1870676#1870676: *2 access phase: 8
2021/06/24 18:55:29 [debug] 1870676#1870676: *2 access phase: 9
2021/06/24 18:55:29 [debug] 1870676#1870676: *2 access phase: 10
2021/06/24 18:55:29 [debug] 1870676#1870676: *2 post access phase: 11
2021/06/24 18:55:29 [debug] 1870676#1870676: *2 generic phase: 12
2021/06/24 18:55:29 [debug] 1870676#1870676: *2 generic phase: 13
2021/06/24 18:55:29 [debug] 1870676#1870676: *2 content phase: 14
2021/06/24 18:55:29 [debug] 1870676#1870676: *2 content phase: 15
2021/06/24 18:55:29 [debug] 1870676#1870676: *2 content phase: 16
2021/06/24 18:55:29 [debug] 1870676#1870676: *2 content phase: 17
2021/06/24 18:55:29 [debug] 1870676#1870676: *2 content phase: 18
2021/06/24 18:55:29 [debug] 1870676#1870676: *2 http filename: "/home/shanker/nginx-debug/web/4gb.bin"
2021/06/24 18:55:29 [debug] 1870676#1870676: *2 add cleanup: 0000AAAAF901C0E0
2021/06/24 18:55:29 [debug] 1870676#1870676: *2 http static fd: 4
2021/06/24 18:55:29 [debug] 1870676#1870676: *2 http set discard body
2021/06/24 18:55:29 [debug] 1870676#1870676: *2 HTTP/1.1 200 OK
Server: nginx/1.18.0
Date: Thu, 24 Jun 2021 10:55:29 GMT
Content-Type: text/plain
Content-Length: 4294967296
Last-Modified: Thu, 24 Jun 2021 10:40:50 GMT
Connection: keep-alive
ETag: "60d46132-100000000"
Accept-Ranges: bytes

2021/06/24 18:55:29 [debug] 1870676#1870676: *2 write new buf t:1 f:0 0000AAAAF9001720, pos 0000AAAAF9001720, size: 252 file: 0, size: 0
2021/06/24 18:55:29 [debug] 1870676#1870676: *2 http write filter: l:0 f:0 s:252
2021/06/24 18:55:29 [debug] 1870676#1870676: *2 http output filter "/4gb.bin?"
2021/06/24 18:55:29 [debug] 1870676#1870676: *2 http copy filter: "/4gb.bin?"
2021/06/24 18:55:29 [debug] 1870676#1870676: *2 http postpone filter "/4gb.bin?" 0000FFFFC6A9D4F0
2021/06/24 18:55:29 [debug] 1870676#1870676: *2 write old buf t:1 f:0 0000AAAAF9001720, pos 0000AAAAF9001720, size: 252 file: 0, size: 0
2021/06/24 18:55:29 [debug] 1870676#1870676: *2 write new buf t:0 f:1 0000000000000000, pos 0000000000000000, size: 0 file: 0, size: 4294967296
2021/06/24 18:55:29 [debug] 1870676#1870676: *2 http write filter: l:1 f:0 s:4294967548
2021/06/24 18:55:29 [debug] 1870676#1870676: *2 http write filter limit 2147483647
2021/06/24 18:55:29 [debug] 1870676#1870676: *2 tcp_nopush
2021/06/24 18:55:29 [debug] 1870676#1870676: *2 writev: 252 of 252
2021/06/24 18:55:29 [debug] 1870676#1870676: *2 sendfile: @0 2147479552
2021/06/24 18:55:29 [debug] 1870676#1870676: *2 sendfile: 2147479552 of 2147479552 @0
2021/06/24 18:55:29 [debug] 1870676#1870676: *2 http write filter 0000AAAAF9001830
2021/06/24 18:55:29 [debug] 1870676#1870676: *2 event timer add: 3: 1:1307815217
2021/06/24 18:55:29 [debug] 1870676#1870676: *2 http copy filter: -2 "/4gb.bin?"
2021/06/24 18:55:29 [debug] 1870676#1870676: *2 http finalize request: -2, "/4gb.bin?" a:1, c:1
2021/06/24 18:55:29 [debug] 1870676#1870676: timer delta: 0
2021/06/24 18:55:29 [debug] 1870676#1870676: worker cycle
2021/06/24 18:55:29 [debug] 1870676#1870676: epoll timer: 1
2021/06/24 18:55:31 [debug] 1870676#1870676: timer delta: 2704
2021/06/24 18:55:31 [debug] 1870676#1870676: *2 event timer del: 3: 1307815217
2021/06/24 18:55:31 [debug] 1870676#1870676: *2 http run request: "/4gb.bin?"
2021/06/24 18:55:31 [debug] 1870676#1870676: *2 http writer handler: "/4gb.bin?"
2021/06/24 18:55:31 [debug] 1870676#1870676: *2 http output filter "/4gb.bin?"
2021/06/24 18:55:31 [debug] 1870676#1870676: *2 http copy filter: "/4gb.bin?"
2021/06/24 18:55:31 [debug] 1870676#1870676: *2 http postpone filter "/4gb.bin?" 0000000000000000
2021/06/24 18:55:31 [debug] 1870676#1870676: *2 write old buf t:0 f:1 0000000000000000, pos 0000000000000000, size: 0 file: 2147479552, size: 2147487744
2021/06/24 18:55:31 [debug] 1870676#1870676: *2 http write filter: l:1 f:0 s:2147487744
2021/06/24 18:55:31 [debug] 1870676#1870676: *2 http write filter limit 2147483647
2021/06/24 18:55:31 [debug] 1870676#1870676: *2 sendfile: @2147479552 2147479552
2021/06/24 18:55:31 [debug] 1870676#1870676: *2 sendfile: 2147479552 of 2147479552 @2147479552
2021/06/24 18:55:31 [debug] 1870676#1870676: *2 http write filter 0000AAAAF9001830
2021/06/24 18:55:31 [debug] 1870676#1870676: *2 event timer add: 3: 1:1307817921
2021/06/24 18:55:31 [debug] 1870676#1870676: *2 http copy filter: -2 "/4gb.bin?"
2021/06/24 18:55:31 [debug] 1870676#1870676: *2 http writer output filter: -2, "/4gb.bin?"
2021/06/24 18:55:31 [debug] 1870676#1870676: worker cycle
2021/06/24 18:55:31 [debug] 1870676#1870676: epoll timer: 1
2021/06/24 18:55:34 [debug] 1870676#1870676: timer delta: 2540
2021/06/24 18:55:34 [debug] 1870676#1870676: *2 event timer del: 3: 1307817921
2021/06/24 18:55:34 [debug] 1870676#1870676: *2 http run request: "/4gb.bin?"
2021/06/24 18:55:34 [debug] 1870676#1870676: *2 http writer handler: "/4gb.bin?"
2021/06/24 18:55:34 [debug] 1870676#1870676: *2 http output filter "/4gb.bin?"
2021/06/24 18:55:34 [debug] 1870676#1870676: *2 http copy filter: "/4gb.bin?"
2021/06/24 18:55:34 [debug] 1870676#1870676: *2 http postpone filter "/4gb.bin?" 0000000000000000
2021/06/24 18:55:34 [debug] 1870676#1870676: *2 write old buf t:0 f:1 0000000000000000, pos 0000000000000000, size: 0 file: 4294959104, size: 8192
2021/06/24 18:55:34 [debug] 1870676#1870676: *2 http write filter: l:1 f:0 s:8192
2021/06/24 18:55:34 [debug] 1870676#1870676: *2 http write filter limit 2147483647
2021/06/24 18:55:34 [debug] 1870676#1870676: *2 sendfile: @4294959104 8192
2021/06/24 18:55:34 [debug] 1870676#1870676: *2 sendfile: 8192 of 8192 @4294959104
2021/06/24 18:55:34 [debug] 1870676#1870676: *2 http write filter 0000000000000000
2021/06/24 18:55:34 [debug] 1870676#1870676: *2 http copy filter: 0 "/4gb.bin?"
2021/06/24 18:55:34 [debug] 1870676#1870676: *2 http writer output filter: 0, "/4gb.bin?"
2021/06/24 18:55:34 [debug] 1870676#1870676: *2 http writer done: "/4gb.bin?"
2021/06/24 18:55:34 [debug] 1870676#1870676: *2 http finalize request: 0, "/4gb.bin?" a:1, c:1
2021/06/24 18:55:34 [debug] 1870676#1870676: *2 set http keepalive handler
2021/06/24 18:55:34 [debug] 1870676#1870676: *2 http close request
2021/06/24 18:55:34 [debug] 1870676#1870676: *2 http log handler
2021/06/24 18:55:34 [debug] 1870676#1870676: *2 run cleanup: 0000AAAAF901C0E0
2021/06/24 18:55:34 [debug] 1870676#1870676: *2 file cleanup: fd:4
2021/06/24 18:55:34 [debug] 1870676#1870676: *2 free: 0000AAAAF901B370, unused: 8
2021/06/24 18:55:34 [debug] 1870676#1870676: *2 free: 0000AAAAF9001340, unused: 2689
2021/06/24 18:55:34 [debug] 1870676#1870676: *2 free: 0000AAAAF8FE5200
2021/06/24 18:55:34 [debug] 1870676#1870676: *2 hc free: 0000000000000000
2021/06/24 18:55:34 [debug] 1870676#1870676: *2 hc busy: 0000000000000000 0
2021/06/24 18:55:34 [debug] 1870676#1870676: *2 reusable connection: 1
2021/06/24 18:55:34 [debug] 1870676#1870676: *2 event timer add: 3: 75000:1307895460
2021/06/24 18:55:34 [debug] 1870676#1870676: worker cycle
2021/06/24 18:55:34 [debug] 1870676#1870676: epoll timer: 75000
2021/06/24 18:55:34 [debug] 1870676#1870676: epoll: fd:3 ev:2001 d:0000FFFF9F2761E0
2021/06/24 18:55:34 [debug] 1870676#1870676: *2 http keepalive handler
2021/06/24 18:55:34 [debug] 1870676#1870676: *2 malloc: 0000AAAAF8FE5200:1024
2021/06/24 18:55:34 [debug] 1870676#1870676: *2 recv: eof:1, avail:-1
2021/06/24 18:55:34 [debug] 1870676#1870676: *2 recv: fd:3 0 of 1024
2021/06/24 18:55:34 [info] 1870676#1870676: *2 client 127.0.0.1 closed keepalive connection
2021/06/24 18:55:34 [debug] 1870676#1870676: *2 close http connection: 3
2021/06/24 18:55:34 [debug] 1870676#1870676: *2 event timer del: 3: 1307895460
2021/06/24 18:55:34 [debug] 1870676#1870676: *2 reusable connection: 0
2021/06/24 18:55:34 [debug] 1870676#1870676: *2 free: 0000AAAAF8FE5200
2021/06/24 18:55:34 [debug] 1870676#1870676: *2 free: 0000AAAAF8FFF0C0, unused: 128
2021/06/24 18:55:34 [debug] 1870676#1870676: timer delta: 0
2021/06/24 18:55:34 [debug] 1870676#1870676: worker cycle
2021/06/24 18:55:34 [debug] 1870676#1870676: epoll timer: -1

I'm setting the priority to critical because it affects many versions of nginx, from v1.12 to latest master, as I've tested.

Cheers,

Miao Wang

Change History (2)

comment:1 by Sergey Kandaurov, 3 years ago

Status: newaccepted

Thanks for the provided analysis, it looks correct. Some thoughts on this below.

A possible fix is to propagate the updated (send_chain) limits back to the callers to properly schedule a 1ms write event timer (namely, in the http write filter). Unfortunately, it means an API change.

Something like this (incomplete):

diff -r 0c5e84096d99 src/http/ngx_http_write_filter_module.c
--- a/src/http/ngx_http_write_filter_module.c   Mon Jun 21 09:42:43 2021 +0300
+++ b/src/http/ngx_http_write_filter_module.c   Thu Jun 24 22:21:42 2021 +0000
@@ -291,7 +291,7 @@
     ngx_log_debug1(NGX_LOG_DEBUG_HTTP, c->log, 0,
                    "http write filter limit %O", limit);
 
-    chain = c->send_chain(c, r->out, limit);
+    chain = c->send_chain(c, r->out, &limit);
 
     ngx_log_debug1(NGX_LOG_DEBUG_HTTP, c->log, 0,
                    "http write filter %p", chain);
diff -r 0c5e84096d99 src/os/unix/ngx_linux_sendfile_chain.c
--- a/src/os/unix/ngx_linux_sendfile_chain.c    Mon Jun 21 09:42:43 2021 +0300
+++ b/src/os/unix/ngx_linux_sendfile_chain.c    Thu Jun 24 22:21:42 2021 +0000
@@ -44,10 +44,10 @@
 
 
 ngx_chain_t *
-ngx_linux_sendfile_chain(ngx_connection_t *c, ngx_chain_t *in, off_t limit)
+ngx_linux_sendfile_chain(ngx_connection_t *c, ngx_chain_t *in, off_t *limitp)
 {
     int            tcp_nodelay;
-    off_t          send, prev_send;
+    off_t          send, prev_send, limit;
     size_t         file_size, sent;
     ssize_t        n;
     ngx_err_t      err;
@@ -66,8 +66,10 @@
 
     /* the maximum limit size is 2G-1 - the page size */
 
+    limit = *limitp;
+
     if (limit == 0 || limit > (off_t) (NGX_SENDFILE_MAXSIZE - ngx_pagesize)) {
-        limit = NGX_SENDFILE_MAXSIZE - ngx_pagesize;
+        limit = *limitp = NGX_SENDFILE_MAXSIZE - ngx_pagesize;
     }
 
 

Other options like respecting send_chain limits directly in the callers (layering violation) or moving the timer to send_chain (again, layering violation) doesn't seem to be eligible.

comment:2 by Maxim Dounin, 3 years ago

Resolution: duplicate
Status: acceptedclosed

Duplicate of #1870. Using sendfile_max_chunk fixes this, and anyway needed if sending 2G file does not block with EAGAIN in a particular configuration. As already suggested in #1870, probably the right solution would be to provide some reasonable default for sendfile_max_chunk.

Note: See TracTickets for help on using tickets.