Opened 4 years ago

Closed 3 months ago

#1168 closed defect (fixed)

Nginx handles the max_size option of the proxy_cache_path directive incorrectly

Reported by: POPSuL@… Owned by:
Priority: minor Milestone:
Component: nginx-core Version: 1.11.x
Keywords: cache, max_size, nfs, bsize Cc: popsul1993@…
uname -a: Linux *** 3.10.0-514.2.2.el7.x86_64 #1 SMP Tue Dec 6 23:06:41 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
nginx -V: nginx version: nginx/1.11.7
built by gcc 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC)
built with OpenSSL 1.0.1e-fips 11 Feb 2013
TLS SNI support enabled
configure arguments: --user=nginx --group=nginx --prefix=/usr/share/nginx --sbin-path=/usr/sbin/nginx --conf-path=/etc/nginx/nginx.conf --error-log-path=/var/log/nginx/error.log --http-log-path=/var/log/nginx/access.log --http-client-body-temp-path=/var/lib/nginx/tmp/client_body --http-proxy-temp-path=/var/lib/nginx/tmp/proxy --http-fastcgi-temp-path=/var/lib/nginx/tmp/fastcgi --http-uwsgi-temp-path=/var/lib/nginx/tmp/uwsgi --http-scgi-temp-path=/var/lib/nginx/tmp/scgi --pid-path=/var/run/nginx.pid --lock-path=/var/lock/subsys/nginx --with-debug --with-http_ssl_module --with-http_v2_module --with-http_realip_module --with-http_addition_module --with-http_xslt_module --with-http_image_filter_module --with-http_geoip_module --with-http_sub_module --with-http_dav_module --with-http_flv_module --with-http_mp4_module --with-http_gunzip_module --with-http_gzip_static_module --with-http_auth_request_module --with-http_random_index_module --with-http_secure_link_module --with-http_degradation_module --with-http_stub_status_module --with-http_perl_module --with-poll_module --with-select_module --without-mail_smtp_module --without-mail_imap_module --without-mail_pop3_module --with-ipv6 --with-file-aio --with-pcre-jit --with-threads --add-module=/root/rpmbuild/BUILD/nginx-1.11.7/nginx-upload-progress-module --add-module=/root/rpmbuild/BUILD/nginx-1.11.7/mod_zip-1.1.6 --add-module=/root/rpmbuild/BUILD/nginx-1.11.7/ngx_http_auth_pam_module-1.2 --add-module=/root/rpmbuild/BUILD/nginx-1.11.7/ngx_upstream_status-master --add-module=/root/rpmbuild/BUILD/nginx-1.11.7/ngx_http_geoip2_module --add-module=/root/rpmbuild/BUILD/nginx-1.11.7/headers-more-nginx-module --add-module=/root/rpmbuild/BUILD/nginx-1.11.7/ngx_cache_purge --add-module=/root/rpmbuild/BUILD/nginx-1.11.7/ngx_cache_status_module --with-debug --with-cc-opt='-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic'

Description

For example, consider a configuration with a directive like this:

proxy_cache_path /var/lib/nginx/cache levels=1:2 keys_zone=images:64m inactive=7d max_size=12g;

where /var/lib/nginx/cache is a directory mounted over NFS. The directory is mounted with the options

rsize=1048576 wsize=1048576

We noticed that nginx keeps the number of files in the cache at about 12.5k, even though 12 thousand files (image thumbnails) is far too few for 12g.

A closer look showed that the effective cache size is 12g/bsize, where bsize is obtained for /var/lib/nginx/cache/... via statfs and equals the rsize/wsize values (https://github.com/nginx/nginx/blob/master/src/http/ngx_http_file_cache.c#L154). That is, 12884901888/1048576=12288.

When the number of files in the cache reaches 12288, forced eviction begins (https://github.com/nginx/nginx/blob/master/src/http/ngx_http_file_cache.c#L1950).
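
The arithmetic above can be reproduced with a short sketch. It is an illustration of the behaviour described in this ticket, not nginx code: the helper names and the 20 KiB thumbnail size are hypothetical, and it assumes that every cached file is rounded up to a whole number of bsize blocks.

```python
def max_size_in_blocks(max_size_bytes: int, bsize: int) -> int:
    """Convert a byte limit into a limit expressed in blocks of bsize bytes."""
    return max_size_bytes // bsize


def blocks_for_file(file_size: int, bsize: int) -> int:
    """Round a file size up to whole blocks (assumed accounting model)."""
    return (file_size + bsize - 1) // bsize


MAX_SIZE = 12 * 1024 ** 3   # max_size=12g from the directive above
NFS_BSIZE = 1048576         # bsize reported for the NFS mount (rsize/wsize)
THUMBNAIL = 20 * 1024       # hypothetical 20 KiB thumbnail

for bsize in (NFS_BSIZE, 4096):
    limit = max_size_in_blocks(MAX_SIZE, bsize)
    per_file = blocks_for_file(THUMBNAIL, bsize)
    # With a 1 MiB block every small file costs one full block,
    # so the block limit is effectively a limit on the file count.
    print(f"bsize={bsize}: limit={limit} blocks, "
          f"{per_file} block(s) per file, about {limit // per_file} files")
```

Under these assumptions a 1 MiB bsize caps the cache at 12288 small files, which matches the observed behaviour, while a 4096-byte bsize would allow far more files of the same size.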

On NFS, bsize is derived from the rsize and wsize parameters. It cannot be relied upon when calculating the maximum size of the files in the cache, because rsize and wsize matter only to the network stack and say nothing about the parameters of the physical storage.

Possible solutions:

  1. Use a constant bsize of 512/4096/8192;
  2. Allow bsize to be set explicitly with an additional parameter.

It would also be good to mention this "quirk" in the documentation.

Change History (5)

comment:1 by Maxim Dounin, 4 years ago

Status: new → accepted

Apparently statfs() on Linux is not logically consistent. Although the f_bsize field is present, the documentation says:

               __fsword_t f_bsize;   /* Optimal transfer block size */

instead of the traditional

     uint64_t f_bsize;               /* filesystem fragment size */

on BSD systems, where the corresponding system call originated. In other words, f_bsize is effectively used in place of f_iosize.

The right solution is probably to stop using statfs() on Linux.

comment:2 by POPSuL@…, 3 years ago

Are any fixes planned for this bug?

comment:3 by Maxim Dounin, 4 months ago

For the record, tested the details.

On Linux, NFS client mounts report the wsize mount option in both f_bsize and f_frsize, in both statfs() and statvfs() (no difference). The only solution seems to be to avoid rounding to f_bsize, at least when there are reasons to assume we are using NFS on Linux.
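
To check what a particular mount reports, the values can be read with a few lines of Python. This is a minimal sketch that relies on the standard os.statvfs() call; the helper name report_block_sizes is hypothetical, and "." is only a placeholder for the cache directory.

```python
import os


def report_block_sizes(path: str) -> dict:
    """Return the block sizes that statvfs() reports for the given path."""
    st = os.statvfs(path)
    return {"f_bsize": st.f_bsize, "f_frsize": st.f_frsize}


if __name__ == "__main__":
    # Replace "." with the cache directory, e.g. /var/lib/nginx/cache.
    print(report_block_sizes("."))
```

If both values equal the wsize mount option on an NFS mount, the cache is affected by the behaviour described in this ticket.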

Note that rounding to f_bsize is not really needed since 3899:e7cd13b7f759 (1.0.1) on Unix systems, since we use fstat() st_blocks instead. Rounding to block size is mostly present for Windows, where there is no equivalent to fstat() st_blocks.
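
The idea of sizing a file from st_blocks rather than from the file system block size can be sketched as follows. This is a simplified illustration, not the formula nginx uses: taking the larger of st_size and st_blocks * 512 is an assumption, as is the 512-byte unit for st_blocks (common on Unix systems, but not guaranteed everywhere). The helper name file_fs_size is hypothetical.

```python
import os
import tempfile


def file_fs_size(path: str) -> int:
    """Estimate on-disk usage as the larger of st_size and st_blocks * 512."""
    st = os.stat(path)
    # st_blocks is assumed to be counted in 512-byte units.
    return max(st.st_size, st.st_blocks * 512)


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile() as f:
        f.write(b"x" * 10000)
        f.flush()
        print(file_fs_size(f.name))
```

Because the estimate comes from what the file system actually allocated, a misleading f_bsize does not inflate it.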

Looking into block sizes on various filesystems on Linux suggests that most of the filesystems (at least ext2, ext3, ext4 and XFS) can only use block sizes up to pagesize (4096 bytes on most platforms). On the other hand, FAT can use block sizes up to at least 512k, but probably this isn't something we care about (and st_blocks will lead to good enough results anyway).

On other OSes block sizes at least up to 64k are known to be valid. In particular, UFS on FreeBSD works fine with blocksize and fragsize set to 64k, and such setups are known to be used in practice. Using larger block sizes requires changing MAXBSIZE and MAXPHYS. The same applies to OpenBSD and NetBSD.

For the record, NFS on FreeBSD reports block size 512 bytes. The same applies to ZFS.

comment:4 by Maxim Dounin <mdounin@…>, 3 months ago

In 7668:0a04e5e4c40b/nginx:

Large block sizes on Linux are now ignored (ticket #1168).

NFS on Linux is known to report wsize as a block size (in both f_bsize
and f_frsize, both in statfs() and statvfs()). On the other hand,
typical file system block sizes on Linux (ext2/ext3/ext4, XFS) are limited
to pagesize. (With FAT, block sizes can be at least up to 512k in
extreme cases, but this doesn't really matter, see below.)
To avoid too aggressive cache clearing on NFS volumes on Linux, block
sizes larger than pagesize are now ignored.

Note that it is safe to ignore large block sizes. Since 3899:e7cd13b7f759
(1.0.1) cache size is calculated based on fstat() st_blocks, and rounding
to file system block size is preserved mostly for Windows.

Note well that on other OSes valid block sizes seen are at least up
to 65536. In particular, UFS on FreeBSD is known to work well with block
and fragment sizes set to 65536.
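
The rule described in the commit message can be sketched like this. It is an illustration only, not the committed code: the function name effective_bsize is hypothetical, and the 512-byte fallback used when a reported block size is ignored is an assumption that the commit message does not state.

```python
def effective_bsize(reported_bsize: int, pagesize: int, fallback: int = 512) -> int:
    """Ignore block sizes larger than the page size (the Linux case).

    The fallback value of 512 bytes is an assumption for this sketch.
    """
    if reported_bsize > pagesize:
        return fallback
    return reported_bsize


if __name__ == "__main__":
    PAGESIZE = 4096  # typical page size, as mentioned in comment:3
    print(effective_bsize(1048576, PAGESIZE))  # NFS mount reporting wsize
    print(effective_bsize(4096, PAGESIZE))     # typical local file system
```

With a small effective block size, max_size=12g is no longer divided by 1048576, so the cache is not cleared after only 12288 files.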

comment:5 by Maxim Dounin, 3 months ago

Resolution: fixed
Status: accepted → closed

Should be fixed now.

Note though that it is generally a bad idea to store cache on NFS volumes, much like serving static files from NFS volumes, as access to files on an NFS file system can take a while, blocking nginx worker process from serving other requests.
