#2384 closed defect (invalid)
http upstream cache and slicing leaks open files
Reported by: | Thibault VINCENT | Owned by: | |
---|---|---|---|
Priority: | minor | Milestone: | |
Component: | nginx-core | Version: | 1.23.x |
Keywords: | | Cc: | |
uname -a: | Linux 5.19.4-200.fc36.x86_64 | ||
nginx -V: | nginx version: nginx/1.23.1
built by gcc 12.2.1 20220819 (Red Hat 12.2.1-1) (GCC)
configure arguments: --without-http_rewrite_module --without-http_gzip_module --with-http_slice_module |
Description
Using proxy_cache_path and proxy_cache with slice causes file descriptor exhaustion while serving subrequests, which leads to a worker crash.
Test configuration:
    proxy_cache_path /mnt/cache levels=1:2 keys_zone=zone:100m max_size=500g min_free=10g inactive=4h use_temp_path=off;

    server {
        listen 80 default_server reuseport backlog=4096;

        location / {
            add_header Cache-Control "public, no-transform";
            expires 1d;

            # low to trigger bug
            slice 1k;
            proxy_cache zone;
            proxy_cache_key $uri$slice_range;
            proxy_set_header Range $slice_range;
            proxy_http_version 1.1;
            proxy_set_header Connection "";
            proxy_pass_request_body off;
            proxy_pass http://192.168.123.123;
        }
    }
Running with a limit of 1000 maximum open files, this proxy will crash when serving the 500th slice or so. It does not matter whether it's a 100% cache miss or hit situation.
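One way such a limit can be applied is nginx's own worker_rlimit_nofile directive; this is only a sketch, since the ticket does not say how the 1000-file limit was actually set (it could equally have been imposed via ulimit or the service manager):

    # Main context of nginx.conf: cap each worker at 1000 open files.
    # Assumption for illustration; the ticket does not show how the limit was applied.
    worker_rlimit_nofile 1000;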
I guess the root cause may be cache file descriptors left open till the main request is cleaned up.
[crit] 65993#0: *1 open() "/mnt/cache/f/5c/4cca8c6cb531b0aaa5a2fe16f43d45cf.0000000996" failed (24: Too many open files) while reading upstream, client: 127.0.0.1, server: server, request: "GET /test HTTP/1.1", subrequest: "/test", upstream: "http://192.168.123.123/test", host: "127.0.0.1:80"
Is it a design limitation of the slice module, or could it be improved?
Thanks
Change History (3)
follow-up: 2 comment:1 by Maxim Dounin, 2 years ago

Could you please clarify what you mean by "leads to a worker crash"? Once file descriptors are exhausted, nginx will report appropriate errors and continue to work on other requests. If you instead see a worker process crash, that is, if a worker exits abnormally, for example, due to a segmentation fault, this might be something to further dig into.

The slice module creates a subrequest for each slice. Each subrequest allocates resources, including file descriptors, which are normally freed with the main request. While theoretically it is possible to free some resources earlier, this requires noticeable additional effort for each resource and is not something usually done.

In general, this is something to consider while configuring the slice module. Too small a slice size can easily lead to significant resource usage, eliminating the advantages of using the slice module.
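A back-of-the-envelope sketch of that trade-off (illustrative values, not from the ticket): with slice 1k, a 1 MB object becomes roughly 1024 subrequests, each of which can keep a cache file descriptor open until the main request completes, so a 1000-descriptor limit is exhausted partway through a single large download. A larger slice keeps the per-request descriptor count proportionally small:

    # Illustrative values only (assumptions, not from the ticket).

    # Main context: per-worker descriptor budget.
    worker_rlimit_nofile 4096;

    # Location context: with 1 MB slices, a request holds on the order of one
    # cache file descriptor per megabyte in flight, instead of ~1024 per
    # megabyte with "slice 1k".
    slice 1m;
    proxy_cache zone;
    proxy_cache_key $uri$slice_range;
    proxy_set_header Range $slice_range;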
comment:2 by Thibault VINCENT, 2 years ago
Replying to Maxim Dounin:
> Could you please clarify what you mean by "leads to a worker crash"? Once file descriptors are exhausted, nginx will report appropriate errors and continue to work on other requests.
Apologies, I was overwhelmed by debug logs and may have over-interpreted the presence of process reaping messages:
    2022/08/30 11:54:10 [crit] 88242#0: *1 open() "/mnt/cache/3/7b/759e4ec9ae96efa161ca49ceed2c97b3.0000000996" failed (24: Too many open files) while reading upstream, client: 127.0.0.1, server: server, request: "GET /test HTTP/1.1", subrequest: "/test", upstream: "http://192.168.123.123:80/test", host: "127.0.0.1:80"
    [...]
    2022/08/30 11:54:51 [debug] 88248#0: http file cache loader time elapsed: 5
    2022/08/30 11:54:51 [notice] 88248#0: http file cache: /mnt/cache 167.258M, bsize: 4096
    2022/08/30 11:54:51 [notice] 88236#0: signal 17 (SIGCHLD) received from 88248
    2022/08/30 11:54:51 [notice] 88236#0: cache loader process 88248 exited with code 0
    2022/08/30 11:54:51 [debug] 88236#0: shmtx forced unlock
    2022/08/30 11:54:51 [debug] 88236#0: shmtx forced unlock
    2022/08/30 11:54:51 [debug] 88236#0: wake up, sigio 0
    2022/08/30 11:54:51 [debug] 88236#0: reap children
Nothing is suggesting an abnormal exit.
> While theoretically it is possible to free some resources earlier, this requires noticeable additional effort for each resource and is not something usually done.
It made sense looking at the code. Thank you for the confirmation.
> In general, this is something to consider while configuring the slice module. Too small a slice size can easily lead to significant resource usage, eliminating the advantages of using the slice module.
Indeed I've had to tweak slice size a few times lately when the dataset changed.
This module may not suit the use case anymore.
Thanks
comment:3 by Maxim Dounin, 2 years ago
Resolution: | → invalid |
---|---|
Status: | new → closed |
Replying to Thibault VINCENT:

> Nothing is suggesting an abnormal exit.

Exactly, thanks for confirming.

> Indeed I've had to tweak slice size a few times lately when the dataset changed. This module may not suit the use case anymore.
Note that the slice module is more about dealing with network throughput and disk space limits. It makes it possible to effectively cache huge files, notably when:
- Downloading files from the upstream server takes significant time, and therefore proxy_cache_lock is not effective in preventing multiple simultaneous downloads.
- Only small ranges are in fact requested by clients, and therefore downloading and caching the whole file wastes significant resources.
It is not, however, expected to help when a single client requests a file and is going to download the whole file. Rather, in such a scenario the slice module will certainly need more resources.
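A hedged sketch of that whole-file alternative (assumed values, not configuration from the ticket): cache the complete object and serialize concurrent cache fills with proxy_cache_lock, so a single-client whole-file download uses one upstream request and no per-slice subrequests:

    # Non-sliced alternative for whole-file downloads (assumed values).
    location / {
        proxy_cache              zone;
        proxy_cache_key          $uri;
        proxy_cache_lock         on;     # only one request populates a missing cache entry
        proxy_cache_lock_timeout 10s;    # hypothetical timeout
        proxy_http_version       1.1;
        proxy_set_header         Connection "";
        proxy_pass               http://192.168.123.123;
    }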
Closing this, as the reported behaviour is an expected downside of using multiple subrequests to load the file.