Opened 3 years ago

Closed 3 years ago

Last modified 13 months ago

#2313 closed defect (invalid)

The number of deleted temporary file handles held by nginx grows over time when response buffering is enabled

Reported by: arighna.chakrabarty100@… Owned by:
Priority: major Milestone:
Component: nginx-core Version: 1.16.x
Keywords: deleted file descriptors Cc:
uname -a: Linux server 4.15.0-91-generic #92-Ubuntu SMP Fri Feb 28 11:09:48 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
nginx -V: nginx version: nginx/1.16.1 (Ubuntu)
built with OpenSSL 1.1.1 11 Sep 2018 (running with OpenSSL 1.1.1d 10 Sep 2019)
TLS SNI support enabled
configure arguments: --with-cc-opt='-g -O2 -fdebug-prefix-map=/build/nginx-GkiujU/nginx-1.14.0=. -fstack-protector-strong -Wformat -Werror=format-security -fPIC -Wdate-time -D_FORTIFY_SOURCE=2' --with-ld-opt='-Wl,-Bsymbolic-functions -Wl,-z,relro -Wl,-z,now -fPIC' --prefix=/usr/share/nginx --conf-path=/etc/nginx/nginx.conf --http-log-path=/var/log/nginx/access.log --error-log-path=/var/log/nginx/error.log --lock-path=/var/lock/nginx.lock --pid-path=/run/nginx.pid --modules-path=/usr/lib/nginx/modules --http-client-body-temp-path=/var/lib/nginx/body --http-fastcgi-temp-path=/var/lib/nginx/fastcgi --http-proxy-temp-path=/var/lib/nginx/proxy --http-scgi-temp-path=/var/lib/nginx/scgi --http-uwsgi-temp-path=/var/lib/nginx/uwsgi --with-debug --with-pcre-jit --with-http_ssl_module --with-http_stub_status_module --with-http_realip_module --with-http_auth_request_module --with-http_v2_module --with-http_dav_module --with-http_slice_module --with-threads --with-http_addition_module --with-http_geoip_module=dynamic --with-http_gunzip_module --with-http_gzip_static_module --with-http_image_filter_module=dynamic --with-http_sub_module --with-http_xslt_module=dynamic --with-stream=dynamic --with-stream_ssl_module --with-mail=dynamic --with-mail_ssl_module

Description (last modified by arighna.chakrabarty100@…)

We have an Nginx reverse proxy set up in front of our backend.
The backend servers are generally virtual machines serving static content of considerable size (on the order of tens of MB).

In some cases, we see that the list of deleted file descriptors held open by the nginx worker processes keeps growing over time. These are the temporary files into which nginx buffers response data while doing response buffering.

The following is the output when we run lsof | grep deleted on an nginx worker process -

root@vm:/# lsof -p 23853 | grep deleted
nginx   23853 www-data   27u      REG                8,1  744255640   2304045 /var/lib/nginx/proxy/2/61/0000705612 (deleted)
nginx   23853 www-data   30u      REG                8,1  646818964   1792205 /var/lib/nginx/proxy/9/69/0000706699 (deleted)
nginx   23853 www-data   35u      REG                8,1 1073803396   1035997 /var/lib/nginx/proxy/0/34/0000708340 (deleted)
nginx   23853 www-data   42u      REG                8,1 1073771620   1036358 /var/lib/nginx/proxy/9/41/0000708419 (deleted)
nginx   23853 www-data   52u      REG                8,1  593854412   1035931 /var/lib/nginx/proxy/0/19/0000708190 (deleted)
nginx   23853 www-data   53u      REG                8,1  528483580   1037410 /var/lib/nginx/proxy/1/15/0000707151 (deleted)
nginx   23853 www-data   59u      REG                8,1     862402   1536153 /var/lib/nginx/proxy/7/63/0000708637 (deleted)
nginx   23853 www-data   70u      REG                8,1  491583868   2816053 /var/lib/nginx/proxy/1/95/0000705951 (deleted)
nginx   23853 www-data   81u      REG                8,1  593763076   1037419 /var/lib/nginx/proxy/7/44/0000707447 (deleted)
nginx   23853 www-data   91u      REG                8,1  705227972   1037418 /var/lib/nginx/proxy/0/02/0000707020 (deleted)
nginx   23853 www-data  100u      REG                8,1  813513652   2304044 /var/lib/nginx/proxy/1/61/0000705611 (deleted)
nginx   23853 www-data  110u      REG                8,1  725018752   1792209 /var/lib/nginx/proxy/4/56/0000708564 (deleted)
nginx   23853 www-data  116u      REG                8,1  704295268   1536146 /var/lib/nginx/proxy/7/55/0000708557 (deleted)
nginx   23853 www-data  118u      REG                8,1  568271144   1536151 /var/lib/nginx/proxy/6/20/0000708206 (deleted)
nginx   23853 www-data  124u      REG                8,1 1073779600   1536148 /var/lib/nginx/proxy/7/66/0000705667 (deleted)
nginx   23853 www-data  132u      REG                8,1  568274012   1536149 /var/lib/nginx/proxy/7/89/0000707897 (deleted)
nginx   23853 www-data  133u      REG                8,1   28899612   2304041 /var/lib/nginx/proxy/1/58/0000708581 (deleted)
nginx   23853 www-data  174u      REG                8,1  614443940   1792213 /var/lib/nginx/proxy/0/87/0000706870 (deleted)
nginx   23853 www-data  175u      REG                8,1  602806928   2048081 /var/lib/nginx/proxy/5/55/0000708555 (deleted)
nginx   23853 www-data  198u      REG                8,1 1073782744   1280031 /var/lib/nginx/proxy/4/22/0000708224 (deleted)
nginx   23853 www-data  204u      REG                8,1   16317016   2304043 /var/lib/nginx/proxy/2/58/0000708582 (deleted)
nginx   23853 www-data  225u      REG                8,1  677020928   2048087 /var/lib/nginx/proxy/6/55/0000708556 (deleted)
nginx   23853 www-data  235u      REG                8,1  676140876   1792207 /var/lib/nginx/proxy/4/83/0000705834 (deleted)
nginx   23853 www-data  240u      REG                8,1  679684336   1792210 /var/lib/nginx/proxy/8/53/0000708538 (deleted)
nginx   23853 www-data  255u      REG                8,1   16324864   1536152 /var/lib/nginx/proxy/3/58/0000708583 (deleted)
nginx   23853 www-data  266u      REG                8,1 1073777408   1536150 /var/lib/nginx/proxy/3/96/0000705963 (deleted)
nginx   23853 www-data  289u      REG                8,1  921235584   1792212 /var/lib/nginx/proxy/0/98/0000705980 (deleted)

This leads to increased disk utilization, and hence to increased I/O throttling for other processes.

What is the root cause of this? Is there any pattern we should look out for? We see that these file descriptors are held open long after the request has been served. Why are they held for such a long time? Are the file descriptors held for the lifetime of a connection, or that of a request?
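
For reference, a small shell loop along the following lines (a sketch; it assumes lsof and pgrep are available and that the temp path matches the output above) can be used to sample the per-worker count of deleted temp files over time -

# Sample the per-worker count of deleted proxy temp files once a minute,
# so the counts can be correlated with request/connection lifetimes.
while true; do
    date
    for pid in $(pgrep -f 'nginx: worker process'); do
        n=$(lsof -p "$pid" 2>/dev/null | grep -c '/var/lib/nginx/proxy/.*(deleted)')
        echo "worker $pid: $n deleted proxy temp files"
    done
    sleep 60
done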

Our nginx config -

pid /var/run/nginx.pid;
worker_processes 0;
worker_rlimit_nofile 64000;
worker_shutdown_timeout 3s;
worker_rlimit_core 1024M;
user www-data;
error_log /tmp/log/nginx/error.log;

events {
  worker_connections 64000;
  multi_accept on;
  use epoll;
}

http {
  access_log /tmp/log/nginx/access.log;
  proxy_http_version 1.1;
  resolver 127.0.0.11;
  sendfile on;
  tcp_nopush on;
  tcp_nodelay on;
  types_hash_max_size 2048;
  underscores_in_headers on;
  server_tokens off;
  client_max_body_size 4294967295;
  http2_max_params_per_settings 6;
  http2_max_settings_per_minute 100;
  http2_max_field_size 8k;
  http2_max_header_size 16k;
  large_client_header_buffers 4 32k;
  proxy_buffer_size 64k;
  proxy_buffers 8 64k;
  proxy_busy_buffers_size 128k;

  server {
    listen 1.2.3.4:8200;
    listen 8200;
    server_name ~.*;

    location / {
      proxy_pass https://backend;
      proxy_pass_header SEC-WEBSOCKET-EXTENSIONS;
      proxy_read_timeout 60s;
      set $custom_proxy_add_x_forwarded_for '';
      proxy_set_header X-FORWARDED-PROTO $scheme;
      proxy_set_header X-FORWARDED-PORT $server_port;
      proxy_set_header Host $http_host;
      proxy_set_header X-ORIGINAL-HOST $http_host;
      proxy_next_upstream error off;
      proxy_ssl_verify off;
      proxy_ssl_verify_depth 5;
      proxy_ssl_session_reuse on;
      proxy_ssl_server_name on;
      proxy_ssl_trusted_certificate /etc/ssl/certs/ca-certificates.crt;
    }
  }

  upstream backend {
    server 127.0.10.1:8082 resolve max_fails=0;
    server 127.0.10.2:8082 resolve max_fails=0;
    keepalive 2;
    keepalive_requests 100;
  }
}

Change History (5)

comment:1 by arighna.chakrabarty100@…, 3 years ago

Description: modified (diff)

comment:2 by Maxim Dounin, 3 years ago

We see that these file descriptors are held long after the request has been served. Why are the file descriptors still held on to for such a long time?

Your report does not seem to show any evidence that files are held open longer than the lifetime of a request. Please provide some information to support this claim.

Everything else in your report looks quite normal. Response disk buffering is expected to happen when it is enabled in the configuration and the client is slower than the backend, see proxy_max_temp_file_size. Temporary files are expected to be closed when the response has been fully sent to the client (or, in case of caching, when the response has been fully received from the upstream server and saved to cache, whichever is longer). If you see files held longer than that, first of all you may want to focus on confirming that this is the case: serving large responses to slow clients might take a lot of time, and it is easy to confuse things.
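
One way to confirm it (a sketch; it assumes lsof and curl are available and that a stub_status location such as /nginx_status is exposed -- the build includes --with-http_stub_status_module, but no such location appears in the config above) is to log the count of deleted temp files next to the number of active connections:

# Log the number of deleted proxy temp files together with the number of
# active connections; if disk buffering for slow clients is the explanation,
# the two should rise and fall together rather than the first growing forever.
while true; do
    ts=$(date +%FT%T)
    deleted=$(lsof -a -c nginx -u www-data 2>/dev/null | grep -c '/var/lib/nginx/proxy/.*(deleted)')
    active=$(curl -s http://127.0.0.1:8200/nginx_status | awk '/Active connections/ {print $3}')
    echo "$ts deleted_temp_files=$deleted active_connections=$active"
    sleep 30
done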

If disk buffering is problematic in your case, consider switching it off or limiting it appropriately, see proxy_max_temp_file_size. Note though that without disk buffering, connections to the backend server will be busy for a much longer time, and this might be problematic for backends using a process-per-connection model.
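
For example, disk buffering can be capped or disabled per location (a sketch only; the 100m value below is arbitrary and needs to be sized for the actual responses and client speeds):

location / {
    proxy_pass https://backend;

    # Cap how much of a response nginx will spool to a temporary file;
    # once the limit is reached, the rest is passed to the client synchronously.
    proxy_max_temp_file_size 100m;

    # Or disable temporary files entirely while keeping the memory buffers:
    # proxy_max_temp_file_size 0;

    # Or disable response buffering altogether; note that this keeps the
    # upstream connection busy for as long as the client takes to read.
    # proxy_buffering off;
}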

comment:3 by Maxim Dounin, 3 years ago

Resolution: invalid
Status: new → closed

Feedback timeout. No information has been provided to support the claim that temporary files are held open longer than the lifetime of a request. Otherwise, the observed behaviour is as expected.

comment:4 by rjaklic@…, 2 years ago

We have a similar issue; for some reason it happens on only one server, and only after the server has been up for a while (e.g. 14 days). We use nginx for proxying requests to a Ceph backend and we use the same config on all servers. Sessions are sometimes pretty long (because users are copying files), but we have nevertheless tested copying files larger than 500 GB several times without problems.

Restarting the server solves the problem (for a while).

comment:5 by arzoum@…, 13 months ago

We had the same problem on our Ubuntu 20.04 LTS Nginx Plus (R30) + Varnish 6.2 servers. It became an ongoing maintenance task to identify and remove these deleted Nginx log file handles in order to recover disk space without restarting the Nginx service.
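
Roughly, that maintenance task amounts to the following (a sketch only; it assumes lsof is available, the logs live under /var/log/nginx as in the logrotate config below, it must be run as root, and it only zeroes data that is already unreachable because the files have been unlinked) -

# Find nginx log files that have been deleted but are still held open,
# then truncate them through /proc so the disk space is released without
# restarting the nginx service.
lsof -a -c nginx +L1 2>/dev/null | awk '/\/var\/log\/nginx\/.*\.log/ {print $2, $4}' |
while read -r pid fd; do
    fd=$(printf '%s' "$fd" | tr -cd '0-9')   # strip lsof's mode suffix, e.g. "27w" -> "27"
    : > "/proc/$pid/fd/$fd"                  # truncate the underlying (deleted) file
done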

In our scenario, the issue was fixed by updating /etc/logrotate.d/nginx as follows:

/var/log/nginx/*.log {
        daily
        compress
        delaycompress
        rotate 30
        missingok
        sharedscripts
        notifempty
        create 640 nginx adm
        postrotate
                [ -f /var/run/nginx.pid ] && kill -USR1 `cat /var/run/nginx.pid`
        endscript
}

We believe the core issue in our case was caused by using the nocreate option instead of create 640 nginx adm, which creates the log files with the correct user and group ownership.

Before copy/pasting the above configuration, make sure the "create" user/group ownership matches the ownership of your Nginx log files, and that all other logrotate options align with your specific environment requirements.

Reference: https://my.f5.com/manage/s/article/K000134463

HTH
