Opened 8 months ago

Last modified 7 months ago

#2254 new enhancement

cache loader ignores reopen signal

Reported by: mirek.chocholous.showmax.com@… Owned by:
Priority: minor Milestone:
Component: nginx-core Version: 1.19.x
Keywords: log reopen Cc:
uname -a: Linux redacted.showmax.com 4.19.0-17-amd64 #1 SMP Debian 4.19.194-3 (2021-07-18) x86_64 GNU/Linux
nginx -V: nginx version: nginx/1.20.0
built with OpenSSL 1.1.1d 10 Sep 2019
TLS SNI support enabled
configure arguments: --prefix=/opt/nginx/nginx --with-cc-opt='-O2 -g -O3 -fPIE -fstack-protector-strong -Wformat -Werror=format-security -fPIC -D_FORTIFY_SOURCE=2' --add-module=../ngx_devel_kit-0.3.1 --add-module=../echo-nginx-module-0.62 --add-module=../xss-nginx-module-0.06 --add-module=../ngx_coolkit-0.2 --add-module=../set-misc-nginx-module-0.32 --add-module=../form-input-nginx-module-0.12 --add-module=../encrypted-session-nginx-module-0.08 --add-module=../srcache-nginx-module-0.32 --add-module=../ngx_lua-0.10.20 --add-module=../ngx_lua_upstream-0.07 --add-module=../headers-more-nginx-module-0.33 --add-module=../array-var-nginx-module-0.05 --add-module=../memc-nginx-module-0.19 --add-module=../redis2-nginx-module-0.15 --add-module=../redis-nginx-module-0.3.7 --add-module=../rds-json-nginx-module-0.15 --add-module=../rds-csv-nginx-module-0.09 --add-module=../ngx_stream_lua-0.0.10 --with-ld-opt='-Wl,-rpath,/opt/nginx/luajit/lib -fPIE' --sbin-path=/opt/nginx/sbin/nginx --conf-path=/etc/nginx/nginx.conf --http-log-path=/var/log/nginx/access.log --error-log-path=/var/log/nginx/error.log --lock-path=/var/lock/nginx.lock --pid-path=/run/nginx.pid --modules-path=/opt/nginx/modules --http-client-body-temp-path=/opt/nginx/body --http-fastcgi-temp-path=/opt/nginx/fastcgi --http-proxy-temp-path=/opt/nginx/proxy --modules-path=/opt/nginx/modules --http-client-body-temp-path=/opt/nginx/body --http-fastcgi-temp-path=/opt/nginx/fastcgi --http-proxy-temp-path=/opt/nginx/proxy --with-threads --with-file-aio --with-http_v2_module --with-http_slice_module --with-http_realip_module --with-http_stub_status_module --add-module=/nginx/openresty/nginx-1.20.0.1/bundle/nginx-http-slice --add-module=/nginx/openresty/nginx-1.20.0.1/bundle/nginx-upstream-dynamic-servers --add-module=/nginx/openresty/nginx-1.20.0.1/bundle/nginx-module-vts --with-stream --with-stream_ssl_module --with-stream_ssl_preread_module --with-http_ssl_module

Description

Hi,
in our environment we use logrotate to rotate the logs regularly. We call nginx -s reopen after the logrotate and most of the times it works OK. The only exception is when nginx was restarted and cache loader is reading all the information about cached data from disk.

When cache loader is running it ignores the nginx -s reopen and keeps the file descriptor to the original log. When the rotated log file is deleted the disk space can't be released and that may lead to full log partition.

Steps to reproduce:

# cat /etc/logrotate.d/cdn_nginx_access
/var/log/nginx/access.log {

create 0640 www-data adm
daily
missingok
nocopytruncate
notifempty
rotate 0
sharedscripts
postrotate

nginx -s reopen>/dev/null 2>&1

endscript
prerotate

if [ -d /etc/logrotate.d/httpd-prerotate ]; then run-parts /etc/logrotate.d/httpd-prerotate; fi

endscript

}

# cat /etc/logrotate.d/cdn_nginx_other

/var/log/nginx/error.log {

create 0640 www-data adm
daily
missingok
nocompress
nocopytruncate
notifempty
rotate 0
sharedscripts
postrotate

nginx -s reopen>/dev/null 2>&1

endscript
prerotate

if [ -d /etc/logrotate.d/httpd-prerotate ]; then run-parts /etc/logrotate.d/httpd-prerotate; fi

endscript

}
# systemctl restart nginx
# lsof /var/log | grep delet
# logrotate -f /etc/logrotate.d/cdn_nginx_access
# logrotate -f /etc/logrotate.d/cdn_nginx_other
# lsof /var/log | grep delet
nginx 26823 www-data 2w REG 9,2 711665 1835015 /var/log/nginx/error.log.1 (deleted)
nginx 26823 www-data 4w REG 9,2 5831379042 1835014 /var/log/nginx/access.log.1 (deleted)
nginx 26823 www-data 5w REG 9,2 711665 1835015 /var/log/nginx/error.log.1 (deleted)

# ps -efww | grep 26823
www-data 26823 26626 0 13:16 ? 00:00:01 nginx: cache loader process

Change History (5)

comment:1 by mirek.chocholous.showmax.com@…, 8 months ago

I forgot to mention one important detail: we have got 32 cache devices in place defined like this:

proxy_cache_path /run/cache/cache_01/cache_01 levels=2:2 keys_zone=DASH_cache_28:1024m inactive=720d max_size=2500g loader_threshold=200 loader_files=500 loader_sleep=50 use_temp_path=off;

And each of them currently contains 2TB of data so the cache loader process takes approx week(s) to finish.

Last edited 8 months ago by mirek.chocholous.showmax.com@… (previous) (diff)

comment:2 by Maxim Dounin, 7 months ago

Priority: majorminor
Type: defectenhancement

As currently implemented, cache loading happens in one run, without return to the event loop. As a result, reopening log files is only handled when the cache is fully loaded. While reopening logs is in fact implemented, it does not work, since cache loader exits once the cache is loaded. This does not seem to be a major issue though: at most, cache loader will prevent removal of one set of log files, and will release them once the cache loading is finished.

Note that certainly cache loading is not expected to work for weeks: rather, it should be something like hours at most. Cache loading implies degraded performance, and, more importantly, non-working cache management. That is, loading cache for weeks might have much worse consequences than keeping log files from being freed. If it takes weeks in your setup, it might be a good idea to reconsider the configuration. In particular, testing larger loader_files might be a good idea.

Keeping this open as an enhancement, in case somebody is willing to work on rewriting cache loading to be more friendly to very large caches.

comment:3 by mirek.chocholous.showmax.com@…, 7 months ago

Generally I understand your arguments - nginx isn't removing any data from the cache unless all the files were already read which might have it's dark sides. In our use case we have space for that - we don't have many new objects to be cached so we are OK when the cache isn't pruned for weeks. On the other hand the system drive is limited and amount of requests is high (it's a CDN for video streaming) so the logs are massive.

We use workaround and do one logrotate just after nginx start when the log is small.

But I disagree it is just an enhancement. For loader_files the default value is 100, we are using 500 and increasing that value by orders of magnitude to really go below one day would mean high load on disks and worse performance during that day (there are tens of such caches on each server. The nice thing on cache loader is that it's not disturbing the clients' performance.

comment:4 by Maxim Dounin, 7 months ago

But I disagree it is just an enhancement

It's a matter of how effective disk usage is during loading of the cache. Much like you reserve space for caches itself, since they aren't properly managed before being loaded, you can reserve space for one additional set of logs which aren't removed while the cache is being loaded. If logs are huge and this is not something you can afford, consider rotating logs more often: this will make logs smaller at no cost.

While certainly cache loading can be made better and more effective, this doesn't look like a bug, but rather a limitation of current implementation.

For loader_files the default value is 100, we are using 500 and increasing that value by orders of magnitude to really go below one day would mean high load on disks and worse performance during that day

Default values certainly not tuned for large-scale installations. Depending on the disks used and the number of worker processes, arbitrary high loader_files might be good enough and won't cause any performance degradation - as suggested, it might be a good idea to test it. Either way, this was just a suggestion on how to improve your setup, feel free to ignore it.

comment:5 by mirek.chocholous.showmax.com@…, 7 months ago

You are right I'll try to tune the values. If I'll combine it with ionice it should be pretty safe to put pretty high numbers into the config.

Thank you for the advice. I appreciate it.

Note: See TracTickets for help on using tickets.