Opened 5 weeks ago

Last modified 5 weeks ago

#1870 new defect

Large file download is not completed with sendfile option.

Reported by: jinyongchoi@… Owned by:
Priority: minor Milestone: nginx-1.17
Component: nginx-core Version: 1.17.x
Keywords: sendfile Cc:
uname -a: Linux x 5.0.0-31-generic #33-Ubuntu SMP Mon Sep 30 18:51:59 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
nginx -V: nginx version: nginx/1.17.4 built by gcc 8.3.0 (Debian 8.3.0-6) built with OpenSSL 1.1.1c 28 May 2019 TLS SNI support enabled configure arguments: --prefix=/etc/nginx --sbin-path=/usr/sbin/nginx --modules-path=/usr/lib/nginx/modules --conf-path=/etc/nginx/nginx.conf --error-log-path=/var/log/nginx/error.log --http-log-path=/var/log/nginx/access.log --pid-path=/var/run/nginx.pid --lock-path=/var/run/nginx.lock --http-client-body-temp-path=/var/cache/nginx/client_temp --http-proxy-temp-path=/var/cache/nginx/proxy_temp --http-fastcgi-temp-path=/var/cache/nginx/fastcgi_temp --http-uwsgi-temp-path=/var/cache/nginx/uwsgi_temp --http-scgi-temp-path=/var/cache/nginx/scgi_temp --user=nginx --group=nginx --with-compat --with-file-aio --with-threads --with-http_addition_module --with-http_auth_request_module --with-http_dav_module --with-http_flv_module --with-http_gunzip_module --with-http_gzip_static_module --with-http_mp4_module --with-http_random_index_module --with-http_realip_module --with-http_secure_link_module --with-http_slice_module --with-http_ssl_module --with-http_stub_status_module --with-http_sub_module --with-http_v2_module --with-mail --with-mail_ssl_module --with-stream --with-stream_realip_module --with-stream_ssl_module --with-stream_ssl_preread_module --with-cc-opt='-g -O2 -fdebug-prefix-map=/data/builder/debuild/nginx-1.17.4/debian/debuild-base/nginx-1.17.4=. -fstack-protector-strong -Wformat -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fPIC' --with-ld-opt='-Wl,-z,relro -Wl,-z,now -Wl,--as-needed -pie'

Description

Hi there.

I found some issue in downloading test.
the problem is only use 'sendfile on'.
If 'sendfile_max_chunk' option or sendfile off is set in conf, download is ok.

And i seem the problem is epoll(EPOLLET) and sendfile is not working well.
If i remove EPOLLET option when event is NGX_WRITE_EVENT at ngx_epoll_add_event function in ngx_epoll_module.c, it working well.
or in testing, if it cached the content file in kernel, it working well.
(no used this command. 'echo 3 > /proc/sys/vm/drop_caches')

i found similar issue.#1174.

So, how i resolve this issue?

Testing Host : Ubuntu 19.04

Steps to Reproduce:

> mkdir -p /data0/nginx-data/
> truncate -s 10G /data0/nginx-data/10G.dat
> docker run --name nginx-10G-test -v /data0/nginx-data/:/usr/share/nginx/html:ro -d nginx:1.17.4
> sudo su
> export docker_ip=`docker inspect -f "{{ .NetworkSettings.IPAddress }}" nginx-10G-test`
> echo 3 > /proc/sys/vm/drop_caches; curl -v -o /dev/null http://${docker_ip}/10G.dat
* Expire in 0 ms for 6 (transfer 0x557ba829a5c0)
*   Trying 172.17.0.2...
* TCP_NODELAY set
* Expire in 200 ms for 4 (transfer 0x557ba829a5c0)
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0* Connected to 172.17.0.2 (172.17.0.2) port 80 (#0)
> GET /10G.dat HTTP/1.1
> Host: 172.17.0.2
> User-Agent: curl/7.64.0
> Accept: */*
> 
< HTTP/1.1 200 OK
< Server: nginx/1.17.4
< Date: Fri, 11 Oct 2019 07:29:02 GMT
< Content-Type: application/octet-stream
< Content-Length: 10737418240
< Last-Modified: Fri, 11 Oct 2019 02:23:06 GMT
< Connection: keep-alive
< ETag: "5d9fe78a-280000000"
< Accept-Ranges: bytes
< 
{ [65536 bytes data]
 19 10.0G   19 2047M    0     0  33.7M      0  0:05:03  0:01:00  0:04:03     0* transfer closed with 8589938688 bytes remaining to read
 19 10.0G   19 2047M    0     0  33.7M      0  0:05:03  0:01:00  0:04:03     0
* Closing connection 0
curl: (18) transfer closed with 8589938688 bytes remaining to read

Change History (2)

comment:1 Changed 5 weeks ago by mdounin

From the numbers — connection closed at 2G point — it looks like the problem is that sendfile() doesn't block on sending at all, due to fast network connection and slow enough disk. As such, it sends up to the maximum number of bytes it can send on Linux, 2G. Currently this limitation, if ever reached, is not accounted by the top-level code, and sending is not automatically retried. An obvious workaround is to use sendfile_max_chunk, which is specially designed to avoid cases when nginx spins in sendfile() for a long time when network connection is faster than disk subsystem.

Further, it looks like using sendfile_max_chunk is anyway required here, as "Time Spent" is 1 minute. If the above analysis is correct, this means that all other connections in the worker process in question were blocked for 1 minute, which is certainly not an acceptable behaviour.

Not sure if we need/want to do anything with this. The sendfile_max_chunk directive already exists, resolves this, and anyway have to be used in such configurations. It might be a good idea to introduce some reasonable default for sendfile_max_chunk though.

Last edited 5 weeks ago by mdounin (previous) (diff)

comment:2 Changed 5 weeks ago by jinyongchoi@…

Thanks for the explanation.
I'm glad to know about a reason and best way.

Note: See TracTickets for help on using tickets.