Opened 5 years ago

Closed 3 years ago

#1870 closed defect (fixed)

Large file download is not completed with sendfile option.

Reported by: jinyongchoi@… Owned by:
Priority: minor Milestone:
Component: nginx-core Version: 1.17.x
Keywords: sendfile Cc:
uname -a: Linux x 5.0.0-31-generic #33-Ubuntu SMP Mon Sep 30 18:51:59 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
nginx -V: nginx version: nginx/1.17.4
built by gcc 8.3.0 (Debian 8.3.0-6)
built with OpenSSL 1.1.1c 28 May 2019
TLS SNI support enabled
configure arguments: --prefix=/etc/nginx --sbin-path=/usr/sbin/nginx --modules-path=/usr/lib/nginx/modules --conf-path=/etc/nginx/nginx.conf --error-log-path=/var/log/nginx/error.log --http-log-path=/var/log/nginx/access.log --pid-path=/var/run/nginx.pid --lock-path=/var/run/nginx.lock --http-client-body-temp-path=/var/cache/nginx/client_temp --http-proxy-temp-path=/var/cache/nginx/proxy_temp --http-fastcgi-temp-path=/var/cache/nginx/fastcgi_temp --http-uwsgi-temp-path=/var/cache/nginx/uwsgi_temp --http-scgi-temp-path=/var/cache/nginx/scgi_temp --user=nginx --group=nginx --with-compat --with-file-aio --with-threads --with-http_addition_module --with-http_auth_request_module --with-http_dav_module --with-http_flv_module --with-http_gunzip_module --with-http_gzip_static_module --with-http_mp4_module --with-http_random_index_module --with-http_realip_module --with-http_secure_link_module --with-http_slice_module --with-http_ssl_module --with-http_stub_status_module --with-http_sub_module --with-http_v2_module --with-mail --with-mail_ssl_module --with-stream --with-stream_realip_module --with-stream_ssl_module --with-stream_ssl_preread_module --with-cc-opt='-g -O2 -fdebug-prefix-map=/data/builder/debuild/nginx-1.17.4/debian/debuild-base/nginx-1.17.4=. -fstack-protector-strong -Wformat -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fPIC' --with-ld-opt='-Wl,-z,relro -Wl,-z,now -Wl,--as-needed -pie'

Description

Hi there.

I found some issue in downloading test.
the problem is only use 'sendfile on'.
If 'sendfile_max_chunk' option or sendfile off is set in conf, download is ok.

And i seem the problem is epoll(EPOLLET) and sendfile is not working well.
If i remove EPOLLET option when event is NGX_WRITE_EVENT at ngx_epoll_add_event function in ngx_epoll_module.c, it working well.
or in testing, if it cached the content file in kernel, it working well.
(no used this command. 'echo 3 > /proc/sys/vm/drop_caches')

i found similar issue.#1174.

So, how i resolve this issue?

Testing Host : Ubuntu 19.04

Steps to Reproduce:

> mkdir -p /data0/nginx-data/
> truncate -s 10G /data0/nginx-data/10G.dat
> docker run --name nginx-10G-test -v /data0/nginx-data/:/usr/share/nginx/html:ro -d nginx:1.17.4
> sudo su
> export docker_ip=`docker inspect -f "{{ .NetworkSettings.IPAddress }}" nginx-10G-test`
> echo 3 > /proc/sys/vm/drop_caches; curl -v -o /dev/null http://${docker_ip}/10G.dat
* Expire in 0 ms for 6 (transfer 0x557ba829a5c0)
*   Trying 172.17.0.2...
* TCP_NODELAY set
* Expire in 200 ms for 4 (transfer 0x557ba829a5c0)
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0* Connected to 172.17.0.2 (172.17.0.2) port 80 (#0)
> GET /10G.dat HTTP/1.1
> Host: 172.17.0.2
> User-Agent: curl/7.64.0
> Accept: */*
> 
< HTTP/1.1 200 OK
< Server: nginx/1.17.4
< Date: Fri, 11 Oct 2019 07:29:02 GMT
< Content-Type: application/octet-stream
< Content-Length: 10737418240
< Last-Modified: Fri, 11 Oct 2019 02:23:06 GMT
< Connection: keep-alive
< ETag: "5d9fe78a-280000000"
< Accept-Ranges: bytes
< 
{ [65536 bytes data]
 19 10.0G   19 2047M    0     0  33.7M      0  0:05:03  0:01:00  0:04:03     0* transfer closed with 8589938688 bytes remaining to read
 19 10.0G   19 2047M    0     0  33.7M      0  0:05:03  0:01:00  0:04:03     0
* Closing connection 0
curl: (18) transfer closed with 8589938688 bytes remaining to read

Change History (7)

comment:1 by Maxim Dounin, 5 years ago

From the numbers — connection closed at 2G point — it looks like the problem is that sendfile() doesn't block on sending at all, due to fast network connection and slow enough disk. As such, it sends up to the maximum number of bytes it can send on Linux, 2G. Currently this limitation, if ever reached, is not accounted by the top-level code, and sending is not automatically retried. An obvious workaround is to use sendfile_max_chunk, which is specially designed to avoid cases when nginx spins in sendfile() for a long time when network connection is faster than disk subsystem.

Further, it looks like using sendfile_max_chunk is anyway required here, as "Time Spent" is 1 minute. If the above analysis is correct, this means that all other connections in the worker process in question were blocked for 1 minute, which is certainly not an acceptable behaviour.

Not sure if we need/want to do anything with this. The sendfile_max_chunk directive already exists, resolves this, and anyway have to be used in such configurations. It might be a good idea to introduce some reasonable default for sendfile_max_chunk though.

Last edited 5 years ago by Maxim Dounin (previous) (diff)

comment:2 by jinyongchoi@…, 5 years ago

Thanks for the explanation.
I'm glad to know about a reason and best way.

comment:3 by Maxim Dounin, 5 years ago

See also #1924.

comment:4 by maxim, 5 years ago

Milestone: nginx-1.17

Ticket retargeted after milestone closed

comment:5 by Maxim Dounin, 3 years ago

See also #1870.

Version 0, edited 3 years ago by Maxim Dounin (next)

comment:6 by Maxim Dounin <mdounin@…>, 3 years ago

In 7947:51a260276425/nginx:

Simplified sendfile_max_chunk handling.

Previously, it was checked that sendfile_max_chunk was enabled and
almost whole sendfile_max_chunk was sent (see e67ef50c3176), to avoid
delaying connections where sendfile_max_chunk wasn't reached (for example,
when sending responses smaller than sendfile_max_chunk). Now we instead
check if there are unsent data, and the connection is still ready for writing.
Additionally we also check c->write->delayed to ignore connections already
delayed by limit_rate.

This approach is believed to be more robust, and correctly handles
not only sendfile_max_chunk, but also internal limits of c->send_chain(),
such as sendfile() maximum supported length (ticket #1870).

comment:7 by Maxim Dounin, 3 years ago

Resolution: fixed
Status: newclosed

Fix committed, sendfile() internal limits should be correctly handled now. Further, sendfile_max_chunk now defaults to 2m (e3dbd9449b14), so worker monopolization shouldn't be a problem either.

Note: See TracTickets for help on using tickets.