#2313 closed defect (invalid)
Nginx deleted temporary file handles grow over time when response buffering is enabled
Reported by: | | Owned by: | |
---|---|---|---|
Priority: | major | Milestone: | |
Component: | nginx-core | Version: | 1.16.x |
Keywords: | deleted file descriptors | Cc: | |

uname -a:

    Linux server 4.15.0-91-generic #92-Ubuntu SMP Fri Feb 28 11:09:48 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

nginx -V:

    nginx version: nginx/1.16.1 (Ubuntu)
    built with OpenSSL 1.1.1 11 Sep 2018 (running with OpenSSL 1.1.1d 10 Sep 2019)
    TLS SNI support enabled
    configure arguments: --with-cc-opt='-g -O2 -fdebug-prefix-map=/build/nginx-GkiujU/nginx-1.14.0=. -fstack-protector-strong -Wformat -Werror=format-security -fPIC -Wdate-time -D_FORTIFY_SOURCE=2' --with-ld-opt='-Wl,-Bsymbolic-functions -Wl,-z,relro -Wl,-z,now -fPIC' --prefix=/usr/share/nginx --conf-path=/etc/nginx/nginx.conf --http-log-path=/var/log/nginx/access.log --error-log-path=/var/log/nginx/error.log --lock-path=/var/lock/nginx.lock --pid-path=/run/nginx.pid --modules-path=/usr/lib/nginx/modules --http-client-body-temp-path=/var/lib/nginx/body --http-fastcgi-temp-path=/var/lib/nginx/fastcgi --http-proxy-temp-path=/var/lib/nginx/proxy --http-scgi-temp-path=/var/lib/nginx/scgi --http-uwsgi-temp-path=/var/lib/nginx/uwsgi --with-debug --with-pcre-jit --with-http_ssl_module --with-http_stub_status_module --with-http_realip_module --with-http_auth_request_module --with-http_v2_module --with-http_dav_module --with-http_slice_module --with-threads --with-http_addition_module --with-http_geoip_module=dynamic --with-http_gunzip_module --with-http_gzip_static_module --with-http_image_filter_module=dynamic --with-http_sub_module --with-http_xslt_module=dynamic --with-stream=dynamic --with-stream_ssl_module --with-mail=dynamic --with-mail_ssl_module
Description
We have an Nginx reverse proxy set up in front of our backend.
The backend servers are generally virtual machines serving static content of considerable size (on the order of tens of MB).
In some cases, we see that the list of deleted file descriptors held open by the nginx worker processes keeps growing over time. These are the temporary files into which nginx buffers response data while doing response buffering.
The following is the output when we run lsof | grep deleted against the nginx worker process:
    root@vm:/# lsof -p 23853 | grep deleted
    nginx 23853 www-data 27u REG 8,1 744255640 2304045 /var/lib/nginx/proxy/2/61/0000705612 (deleted)
    nginx 23853 www-data 30u REG 8,1 646818964 1792205 /var/lib/nginx/proxy/9/69/0000706699 (deleted)
    nginx 23853 www-data 35u REG 8,1 1073803396 1035997 /var/lib/nginx/proxy/0/34/0000708340 (deleted)
    nginx 23853 www-data 42u REG 8,1 1073771620 1036358 /var/lib/nginx/proxy/9/41/0000708419 (deleted)
    nginx 23853 www-data 52u REG 8,1 593854412 1035931 /var/lib/nginx/proxy/0/19/0000708190 (deleted)
    nginx 23853 www-data 53u REG 8,1 528483580 1037410 /var/lib/nginx/proxy/1/15/0000707151 (deleted)
    nginx 23853 www-data 59u REG 8,1 862402 1536153 /var/lib/nginx/proxy/7/63/0000708637 (deleted)
    nginx 23853 www-data 70u REG 8,1 491583868 2816053 /var/lib/nginx/proxy/1/95/0000705951 (deleted)
    nginx 23853 www-data 81u REG 8,1 593763076 1037419 /var/lib/nginx/proxy/7/44/0000707447 (deleted)
    nginx 23853 www-data 91u REG 8,1 705227972 1037418 /var/lib/nginx/proxy/0/02/0000707020 (deleted)
    nginx 23853 www-data 100u REG 8,1 813513652 2304044 /var/lib/nginx/proxy/1/61/0000705611 (deleted)
    nginx 23853 www-data 110u REG 8,1 725018752 1792209 /var/lib/nginx/proxy/4/56/0000708564 (deleted)
    nginx 23853 www-data 116u REG 8,1 704295268 1536146 /var/lib/nginx/proxy/7/55/0000708557 (deleted)
    nginx 23853 www-data 118u REG 8,1 568271144 1536151 /var/lib/nginx/proxy/6/20/0000708206 (deleted)
    nginx 23853 www-data 124u REG 8,1 1073779600 1536148 /var/lib/nginx/proxy/7/66/0000705667 (deleted)
    nginx 23853 www-data 132u REG 8,1 568274012 1536149 /var/lib/nginx/proxy/7/89/0000707897 (deleted)
    nginx 23853 www-data 133u REG 8,1 28899612 2304041 /var/lib/nginx/proxy/1/58/0000708581 (deleted)
    nginx 23853 www-data 174u REG 8,1 614443940 1792213 /var/lib/nginx/proxy/0/87/0000706870 (deleted)
    nginx 23853 www-data 175u REG 8,1 602806928 2048081 /var/lib/nginx/proxy/5/55/0000708555 (deleted)
    nginx 23853 www-data 198u REG 8,1 1073782744 1280031 /var/lib/nginx/proxy/4/22/0000708224 (deleted)
    nginx 23853 www-data 204u REG 8,1 16317016 2304043 /var/lib/nginx/proxy/2/58/0000708582 (deleted)
    nginx 23853 www-data 225u REG 8,1 677020928 2048087 /var/lib/nginx/proxy/6/55/0000708556 (deleted)
    nginx 23853 www-data 235u REG 8,1 676140876 1792207 /var/lib/nginx/proxy/4/83/0000705834 (deleted)
    nginx 23853 www-data 240u REG 8,1 679684336 1792210 /var/lib/nginx/proxy/8/53/0000708538 (deleted)
    nginx 23853 www-data 255u REG 8,1 16324864 1536152 /var/lib/nginx/proxy/3/58/0000708583 (deleted)
    nginx 23853 www-data 266u REG 8,1 1073777408 1536150 /var/lib/nginx/proxy/3/96/0000705963 (deleted)
    nginx 23853 www-data 289u REG 8,1 921235584 1792212 /var/lib/nginx/proxy/0/98/0000705980 (deleted)
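As a rough cross-check (not part of the original report), a one-liner along the following lines can be used to total the space still pinned by deleted proxy temp files for a given worker. The PID is the example one from the listing above, and it assumes the default lsof column layout where SIZE/OFF is the seventh field:

    # Sum the sizes of deleted files under /var/lib/nginx/proxy that are
    # still held open by worker PID 23853 (replace with your own PID).
    lsof -p 23853 2>/dev/null | awk '
        $NF == "(deleted)" && $(NF-1) ~ /\/var\/lib\/nginx\/proxy\// { sum += $7 }
        END { printf "%.1f GiB pinned by deleted temp files\n", sum / 1024 / 1024 / 1024 }'

Watching this number over time, alongside the number of active connections, helps tell whether descriptors really outlive their requests or are simply backing slow, still-running downloads.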
This leads to increased disk utilization, which in turn leads to increased I/O throttling for other processes.
What is the root cause of this? Is there any pattern we should look out for? We see that these file descriptors are held long after the request has been served. Why are they still held on to for such a long time? Are the file descriptors held for the lifetime of a connection, or for that of a request?
Our nginx config:

    pid /var/run/nginx.pid;
    worker_processes 0;
    worker_rlimit_nofile 64000;
    worker_shutdown_timeout 3s;
    worker_rlimit_core 1024M;
    user www-data;
    error_log /tmp/log/nginx/error.log;

    events {
        worker_connections 64000;
        multi_accept on;
        use epoll;
    }

    http {
        access_log /tmp/log/nginx/access.log;
        proxy_http_version 1.1;
        resolver 127.0.0.11;
        sendfile on;
        tcp_nopush on;
        tcp_nodelay on;
        types_hash_max_size 2048;
        underscores_in_headers on;
        server_tokens off;
        client_max_body_size 4294967295;
        http2_max_params_per_settings 6;
        http2_max_settings_per_minute 100;
        http2_max_field_size 8k;
        http2_max_header_size 16k;
        large_client_header_buffers 4 32k;
        proxy_buffer_size 64k;
        proxy_buffers 8 64k;
        proxy_busy_buffers_size 128k;

        server {
            listen 1.2.3.4:8200;
            listen 8200;
            server_name ~.*;

            location / {
                proxy_pass https://backend;
                proxy_pass_header SEC-WEBSOCKET-EXTENSIONS;
                proxy_read_timeout 60s;
                set $custom_proxy_add_x_forwarded_for '';
                proxy_set_header X-FORWARDED-PROTO $scheme;
                proxy_set_header X-FORWARDED-PORT $server_port;
                proxy_set_header Host $http_host;
                proxy_set_header X-ORIGINAL-HOST $http_host;
                proxy_next_upstream error off;
                proxy_ssl_verify off;
                proxy_ssl_verify_depth 5;
                proxy_ssl_session_reuse on;
                proxy_ssl_server_name on;
                proxy_ssl_trusted_certificate /etc/ssl/certs/ca-certificates.crt;
            }
        }

        upstream backend {
            server 127.0.10.1:8082 resolve max_fails=0;
            server 127.0.10.2:8082 resolve max_fails=0;
            keepalive 2;
            keepalive_requests 100;
        }
    }
Change History (5)
comment:1 by , 3 years ago
Description: modified (diff)
comment:2 by , 3 years ago
comment:3 by , 3 years ago
Resolution: → invalid
Status: new → closed
Feedback timeout. No information is provided to support the claim that temporary files are held open longer than the lifetime of a request. Otherwise, the behaviour observed is as expected.
comment:4 by , 2 years ago
We have a similar issue: for some reason it happens on only one server, and only after the server has been up for a while (e.g. 14 days). We use nginx to proxy requests to a Ceph backend, and we use the same config on all servers. Sessions are sometimes quite long (because users are copying files), but nevertheless we have tested copying files larger than 500 GB several times without problems.
Restarting the server solves the problem (for a while).
comment:5 by , 12 months ago
We had the same problem on our Ubuntu 20.04 LTS Nginx Plus (R30) + Varnish 6.2 servers. It became an ongoing maintenance task to identify and remove these deleted Nginx log file handles in order to recover disk space without restarting the Nginx service.
In our scenario, the issue was fixed by updating /etc/logrotate.d/nginx as follows:
    /var/log/nginx/*.log {
        daily
        compress
        delaycompress
        rotate 30
        missingok
        sharedscripts
        notifempty
        create 640 nginx adm
        postrotate
            [ -f /var/run/nginx.pid ] && kill -USR1 `cat /var/run/nginx.pid`
        endscript
    }
We believe the core issue in our case was caused by using the nocreate option instead of create 640 nginx adm, which creates the log files under the correct user and group ownership.
Before copy/pasting the above configuration, make sure the create directive uses the correct user and group ownership for your Nginx log files, and that all other logrotate options align with your specific environment requirements. A quick way to verify this is sketched below.
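For example (a minimal sketch; the config path, log locations and PID file are the ones from the snippet above, so adjust them to your setup):

    # Dry-run the logrotate config; -d enables debug mode and makes no changes.
    logrotate -d /etc/logrotate.d/nginx

    # Optionally force a rotation to exercise the postrotate USR1 signal.
    logrotate -f /etc/logrotate.d/nginx

    # Afterwards, nginx should no longer hold any deleted log files open.
    lsof +L1 2>/dev/null | grep -E 'nginx.*/var/log/nginx' \
        || echo "no deleted nginx log files held open"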
Reference: https://my.f5.com/manage/s/article/K000134463
HTH
Your report does not seem to show any evidence that files are held open longer than the lifetime of a request. Please provide some information to support this claim.
Everything else in your report looks quite normal. Response disk buffering is expected to happen when it is enabled in the configuration and the client is slower than the backend; see proxy_max_temp_file_size. Temporary files are expected to be closed once the response has been fully sent to the client (or, in the case of caching, once the response has been fully received from the upstream server and saved to cache, whichever happens later). If you see files held longer than that, first of all you may want to focus on confirming that this is actually the case: serving large responses to slow clients can take a lot of time, and it is easy to confuse things.
If disk buffering is problematic in your case, consider switching it off or limiting it appropriately; see proxy_max_temp_file_size. Note, though, that without disk buffering, connections to the backend server will stay busy for a much longer time, and this might be problematic for backends using a process-per-connection model.
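For illustration only, a minimal sketch of what such limiting might look like in a proxy location (the directive values are arbitrary examples, and the backend upstream name is taken from the reporter's config):

    location / {
        proxy_pass https://backend;

        # Cap how much response data may be spooled to the proxy temp path
        # per request; setting this to 0 disables disk buffering entirely,
        # leaving only the in-memory proxy_buffers.
        proxy_max_temp_file_size 100m;

        # Or disable response buffering altogether and relay data to the
        # client as it arrives from the upstream:
        # proxy_buffering off;
    }

With proxy_buffering off (or a low proxy_max_temp_file_size), the trade-off described above applies: the upstream connection stays busy for as long as the slowest client takes to read the response.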