Opened 7 months ago

Closed 7 months ago

Last modified 7 months ago

#1945 closed defect (invalid)

Caching proxy error with large files

Reported by: cinderblock@…
Owned by:
Priority: minor
Milestone:
Component: nginx-module
Version: 1.14.x
Keywords:
Cc: cinderblock@…
uname -a: Linux server 4.15.0-91-generic #92-Ubuntu SMP Fri Feb 28 11:09:48 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
nginx -V: nginx version: nginx/1.14.0 (Ubuntu)
built with OpenSSL 1.1.1 11 Sep 2018 (running with OpenSSL 1.1.1d 10 Sep 2019)
TLS SNI support enabled
configure arguments: --with-cc-opt='-g -O2 -fdebug-prefix-map=/build/nginx-GkiujU/nginx-1.14.0=. -fstack-protector-strong -Wformat -Werror=format-security -fPIC -Wdate-time -D_FORTIFY_SOURCE=2' --with-ld-opt='-Wl,-Bsymbolic-functions -Wl,-z,relro -Wl,-z,now -fPIC' --prefix=/usr/share/nginx --conf-path=/etc/nginx/nginx.conf --http-log-path=/var/log/nginx/access.log --error-log-path=/var/log/nginx/error.log --lock-path=/var/lock/nginx.lock --pid-path=/run/nginx.pid --modules-path=/usr/lib/nginx/modules --http-client-body-temp-path=/var/lib/nginx/body --http-fastcgi-temp-path=/var/lib/nginx/fastcgi --http-proxy-temp-path=/var/lib/nginx/proxy --http-scgi-temp-path=/var/lib/nginx/scgi --http-uwsgi-temp-path=/var/lib/nginx/uwsgi --with-debug --with-pcre-jit --with-http_ssl_module --with-http_stub_status_module --with-http_realip_module --with-http_auth_request_module --with-http_v2_module --with-http_dav_module --with-http_slice_module --with-threads --with-http_addition_module --with-http_geoip_module=dynamic --with-http_gunzip_module --with-http_gzip_static_module --with-http_image_filter_module=dynamic --with-http_sub_module --with-http_xslt_module=dynamic --with-stream=dynamic --with-stream_ssl_module --with-mail=dynamic --with-mail_ssl_module

Description

I have nginx set up as a reverse proxy for a backend.

This backend serves recordings from my custom security/monitoring system. The files are just served via nginx with fancy_index at backend.dl.example.com.

Because I know the backend files never change, I've configured my proxy to cache them for up to 1 year (effectively infinity). However, the list of available files generated by fancy_index does change, so it needs a much shorter cache timeout.

This is all pretty much working.

proxy_cache_path /var/cache/nginx/dl use_temp_path=off keys_zone=dl:10m max_size=45g inactive=1y;

server {
  server_name dl.example.com;

  listen 443 ssl;
  listen [::]:443 ssl;
  ssl_certificate     /etc/letsencrypt/live/dl.example.com/fullchain.pem;
  ssl_certificate_key /etc/letsencrypt/live/dl.example.com/privkey.pem;

  proxy_set_header X-Real-IP $remote_addr;
  proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

  proxy_cache_use_stale updating;
  proxy_ignore_client_abort on;

  proxy_cache_lock on;
  proxy_cache_lock_age 25m;
  proxy_cache_lock_timeout 25m;

  proxy_ssl_verify on;
  proxy_ssl_trusted_certificate /etc/ssl/certs/ca-certificates.crt;
  proxy_ssl_server_name on;
  proxy_ssl_name backend.dl.example.com;
  proxy_set_header Host backend.dl.example.com;

  proxy_cache dl;

  location / {
    proxy_pass https://dl;
    proxy_cache_valid 5;
    proxy_cache_background_update on;
  }

  location ~ \.(mkv)$ {
    proxy_pass https://dl;
    proxy_cache_valid 1y;
    proxy_max_temp_file_size 10m;
  }
}

upstream dl {
  server backend.dl.example.com:443;
}

The problem I'm having is when the cache is full and another large video file is downloaded. With my choice of max_size=45g, I only have a couple gigs to spare.

When a client requests a new video, nginx dutifully starts fetching and caching the latest video on the caching server. I can see /var/cache/nginx/dl growing.

The problem is that it grows beyond the max_size=45g and uses other space on the server. If there is not enough space on the caching server, you get an error similar to this:

2020/04/03 09:09:53 [crit] 29148#29148: *230436 pwritev() "/var/cache/nginx/dl/c7939e13d7e0264add6082fb397a0923.0000000776" has written only 12288 of 16384 while reading upstream, client: IPv6, server: dl.example.com, request: "GET /recording.mkv HTTP/1.1", upstream: "https://[IPv6]:443/recording.mkv", host: "dl.example.com"

The client downloading the file gets an error, the partially downloaded file is removed from the cache, and nginx is effectively stuck: every later attempt to download the file through the cache fails the same way.

If you delete enough of the existing cache, it works as expected. This also reveals the problem.

Nginx only seems to check if the cache is over max_size at the end of any request.

IMHO, Nginx should do one of (or give options to enable):

  • Check that the expected download size (as given by headers) will fit in the current cache (not reliable)
  • If it does run out of space while downloading, clear the oldest cache, and try to continue gracefully

In any case, if there really is not enough space to cache the full response on the caching disk, the request from the initiating client should not fail, as the caching nginx should be able to just keep passing the data through.

Change History (3)

comment:1 by Maxim Dounin, 7 months ago

Resolution: invalid
Status: new → closed

The max_size= limit is maintained by the cache manager:

The special “cache manager” process monitors the maximum cache size set by the max_size parameter. When this size is exceeded, it removes the least recently used data.

That is, the cache can grow larger than the maximum size configured, and the size will be reduced only after this. This implies that you have to configure cache size in a way that reserves some space for additional cache files.

Note well that temporary files, even if you use use_temp_path=off, are not counted towards the maximum cache size limit. In your case there isn't enough room to store a temporary file during proxying, and this results in a fatal error during proxying. You are expected to reserve enough space for temporary files, or things won't work.

Summing the above, your configuration is expected to result in errors. You have to reserve more space for temporary files and possible cache growth over maximum size. While this might not be very convenient in some setups, this is how things are expected to work, and the behaviour you observe is not a bug, but rather a result of misconfiguration.
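
As a rough illustration of the advice above, a cache definition that leaves headroom might look like the following. The 35g and 5g figures are assumptions chosen for illustration, not values from this ticket; the right numbers depend on the size of the volume and on the largest files the backend serves. The min_free parameter shown in the commented-out variant exists only in nginx 1.19.1 and later, so it is not available in the 1.14.0 build from this report.

```nginx
# Sketch, not a drop-in fix. max_size=35g is an assumed value: choose it so
# that (volume size - max_size) covers the largest in-flight temporary files
# plus whatever the cache may grow by before the cache manager trims it.
proxy_cache_path /var/cache/nginx/dl use_temp_path=off keys_zone=dl:10m
                 max_size=35g inactive=1y;

# nginx 1.19.1+ only (5g is an assumed threshold): additionally ask the cache
# manager to evict least recently used entries when free space runs low.
# proxy_cache_path /var/cache/nginx/dl use_temp_path=off keys_zone=dl:10m
#                  max_size=35g min_free=5g inactive=1y;
```

In both variants the cache manager still works after the fact, so the reserve lowers the chance of running out of space rather than ruling it out.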

comment:2 by cinderblock@…, 7 months ago

So, by design, nginx is incapable of using a cache with a proxy when there is not enough storage space to hold a potentially large file? By that logic, nginx should basically never be used as a caching server. How can you guarantee in all cases that the proxied server never serves more than a certain size in any single request?

Essentially, I don't see how it's acceptable that the connection to the client fails when writing to disk fails. This should be similar to a case where buffers are skipped because the proxied request doesn't fit into the 8k buffer.

I get that if there isn't enough storage space, there is no way to cache the proxied request. That doesn't mean the proxied request should fail, imho.


I, frankly, think the temp files should be counted towards the used space, especially if use_temp_path=off is used. Is there no other way to limit the size of temp_path?

comment:3 by Maxim Dounin, 7 months ago

You are proxying your own backends, and you can estimate maximum response size before switching on caching and ensure appropriate space for temporary files. If you can't, you probably shouldn't switch on caching.

And no, you cannot limit maximum total size of temp path. When using proxying without caching or proxy_store, temporary file size is limited to proxy_max_temp_file_size *per request*, so you can estimate maximum size of all temporary files by multiplying this limit and the potential number of parallel requests. With caching or proxy_store, the proxy_max_temp_file_size limit does not apply, so you have to estimate total size based on typical file sizes in your system.
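
For the non-cached case described above, the bound is simple arithmetic. The sketch below uses a hypothetical /uncached/ location and an assumed peak of 200 parallel requests; neither comes from this ticket.

```nginx
# Hypothetical location (inside the server block) with caching switched off,
# so the per-request temporary file limit applies.
location ^~ /uncached/ {
    proxy_pass https://dl;
    proxy_cache off;
    proxy_max_temp_file_size 10m;
}
# Worst case for temporary files under this location, assuming at most
# 200 parallel requests: 200 * 10m = 2000m, roughly 2g.
```

For the cached .mkv location no such cap exists, so the estimate there has to start from the largest recording size times the number of recordings that may be fetched at once.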

As already said above, it is understood that this behaviour might not be very convenient in some setups like ones with very large files, yet this is how it works. And there are reasons why it works this way. If you think that current behaviour can be improved without introducing unreasonable complexity in the source code, consider submitting a patch.
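
One approach some setups use to keep individual temporary files small when caching very large responses is the slice module, which this build includes (--with-http_slice_module in the nginx -V output above). This is not something proposed in the ticket; it is a sketch following the pattern in the nginx slice module documentation, the 16m slice size is an assumed value, and it only works if the backend supports byte-range requests.

```nginx
location ~ \.(mkv)$ {
    # Fetch and cache the file in fixed-size ranges. Each range is a separate
    # cache entry, so a single temporary file is about one slice in size.
    slice 16m;
    proxy_cache_key $uri$is_args$args$slice_range;
    proxy_cache_valid 200 206 1y;

    # proxy_set_header at this level replaces the server-level headers,
    # so they are repeated here along with Range.
    proxy_set_header Range $slice_range;
    proxy_set_header Host backend.dl.example.com;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

    proxy_pass https://dl;
}
```

The total cache can still exceed max_size between cache manager passes, so some free space on the volume is needed either way.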
