Subrequest using slice stuck in infinite loop
uname -a: Linux hostname 3.10.0-514.10.2.el7.x86_64 #1 SMP Fri Mar 3 00:04:05 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
nginx version: nginx/1.11.10
built by gcc 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC)
configure arguments: --prefix=/usr/local/nginx --sbin-path=/usr/local/nginx/sbin/nginx --conf-path=/usr/local/etc/nginx/nginx.conf --error-log-path=/var/log/nginx/error.log --http-log-path=/var/log/nginx/access.log --http-client-body-temp-path=/var/lib/nginx/tmp/client_body --http-proxy-temp-path=/var/lib/nginx/tmp/proxy --pid-path=/run/nginx.pid --lock-path=/run/lock/subsys/nginx --user=nginx --group=nginx --with-pcre --with-http_slice_module --with-threads --with-debug --with-cc-opt='-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -m64 -mtune=generic' --with-ld-opt='-Wl,-z,relro -specs=/usr/lib/rpm/redhat/redhat-hardened-ld -Wl,-E'
When nginx has to issue subrequests for one or more slices to a remote origin, a subrequest can get stuck in an infinite loop (i.e. the worker process runs at 100% CPU indefinitely) if all of the following hold: the config has an error_page pointing to a different location than the one serving slices, the error is cached via proxy_cache_valid, and the origin server stops responding so that nginx internally generates a 5xx error, e.g. 502. Such a 5xx error is then redirected to the error-specific location, which is cached or will be cached (in case the cache files are empty).
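To make the combination of conditions concrete, here is a rough sketch of the kind of configuration described above. It is not the attached file: the cache path, zone name, port, origin address, slice size and cache times are all placeholders, and the contents of the error location are a guess. It may or may not reproduce the loop as written.

```nginx
# Hypothetical sketch only; every path, name, address and time below is a placeholder.
events {}

http {
    proxy_cache_path /var/cache/nginx/slices keys_zone=slices:10m;

    server {
        listen 8080;

        # The error is redirected to a location other than the one serving slices.
        error_page 502 /another_uri_for_erros/;

        location / {
            # Fetch the response from the origin in 1m slices via subrequests.
            slice             1m;
            proxy_cache       slices;
            proxy_cache_key   $uri$is_args$args$slice_range;
            proxy_set_header  Range $slice_range;
            proxy_cache_valid 200 206 1h;
            # The 502 is cached for a non-zero time.
            proxy_cache_valid 502 10s;
            proxy_pass        http://127.0.0.1:9000;  # origin that stops responding
        }

        # Assumed shape of the error location: proxied and cached as well.
        location /another_uri_for_erros/ {
            proxy_cache       slices;
            proxy_cache_valid 502 10s;
            proxy_pass        http://127.0.0.1:9000;
        }
    }
}
```

With something like this in place, the scenario would be to request a resource large enough to span several slices and stop the origin while the slices are still being fetched.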
If proxy_cache_valid for the 5xx status (e.g. 502) is set to 0s, the loop does not occur: the subrequest keeps trying to connect to the origin until it comes back online, and the whole request (including subsequent subrequests) is then successfully satisfied.
If error_page 502 /another_uri_for_erros/ is not set (or that custom location is not present at the server level), the loop does not occur either.
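As a sketch, the two observations above correspond to either of the following changes. This is a fragment meant for the server block and the location that serves slices; the 1h time for successful responses is a placeholder, not a value taken from the attached config.

```nginx
# Option 1 (inside the location serving slices): do not cache the 502.
proxy_cache_valid 200 206 1h;   # placeholder time for successful slices
proxy_cache_valid 502 0s;       # with 0s the loop was not observed

# Option 2 (server level): leave the custom error page unset.
# error_page 502 /another_uri_for_erros/;
```

Either change only avoids the conditions under which the loop was observed; neither addresses the loop itself.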
This also happens with nginx-1.10.1 (although I have not reproduced it with the vanilla version, it seems not to be related to 3rd party modules).
Also, I have not tested this bug with other HTTP statuses, such as 4xx (I don't know whether they are treated differently internally by the subrequest module).
Attached are two files. One is the minimal nginx config file that I used to reproduce this bug with the vanilla nginx-1.11.10 (as per -V). The second one has a gdb backtrace taken inside the innermost part of the loop.
I hope this helps. If needed, I'll be glad to provide more debugging information or test possible patches.