Opened 4 years ago
Closed 4 years ago
#2018 closed defect (fixed)
Significant amount of "zero size buf in writer t:1 r:0 f:0... 0000000000000000 0-0 while sending to client"
Reported by: | Owned by: | ||
---|---|---|---|
Priority: | minor | Milestone: | |
Component: | nginx-core | Version: | 1.19.x |
Keywords: | Cc: | ||
uname -a: | Linux n0 5.5.8-arch1-1 #1 SMP PREEMPT Fri, 06 Mar 2020 00:57:33 +0000 x86_64 GNU/Linux | ||
nginx -V: |
nginx version: nginx/1.19.1
built with OpenSSL 1.1.1g 21 Apr 2020 TLS SNI support enabled configure arguments: --prefix=/etc/nginx --conf-path=/etc/nginx/nginx.conf --sbin-path=/usr/bin/nginx --pid-path=/run/nginx.pid --lock-path=/run/lock/nginx.lock --user=http --group=http --http-log-path=/var/log/nginx/access.log --error-log-path=stderr --http-client-body-temp-path=/var/lib/nginx/client-body --http-proxy-temp-path=/var/lib/nginx/proxy --http-fastcgi-temp-path=/var/lib/nginx/fastcgi --http-scgi-temp-path=/var/lib/nginx/scgi --http-uwsgi-temp-path=/var/lib/nginx/uwsgi --with-cc-opt='-march=x86-64 -mtune=generic -O2 -pipe -fno-plt -D_FORTIFY_SOURCE=2' --with-ld-opt=-Wl,-O1,--sort-common,--as-needed,-z,relro,-z,now --with-compat --with-debug --with-file-aio --with-http_addition_module --with-http_auth_request_module --with-http_dav_module --with-http_degradation_module --with-http_flv_module --with-http_geoip_module --with-http_gunzip_module --with-http_gzip_static_module --with-http_mp4_module --with-http_realip_module --with-http_secure_link_module --with-http_slice_module --with-http_ssl_module --with-http_stub_status_module --with-http_sub_module --with-http_v2_module --with-mail --with-mail_ssl_module --with-pcre-jit --with-stream --with-stream_geoip_module --with-stream_realip_module --with-stream_ssl_module --with-stream_ssl_preread_module --with-threads |
Description
I have a machine which is behind cloudflare, which limits any variation in clients, but ever since upgrading to 1.19.1 I'm seeing a significant amount of errors "zero size buf in writer". It's so bad, I've decided to rollback to 1.19.0
I am unable to reproduce this consistently, it seems pretty random. Here are my significant config options:
sendfile on; tcp_nopush on; tcp_nodelay on; client_max_body_size 32k; keepalive_timeout 65; http2_push_preload on; gzip on; gzip_types text/css application/javascript text/xml application/octet-stream application/wasm; gzip_vary on; fastcgi_buffers 256 16k; fastcgi_max_temp_file_size 0; fastcgi_read_timeout 10s; ssl_session_timeout 1d; ssl_session_cache shared:SSL:10m; ssl_session_tickets off; ssl_protocols TLSv1.2 TLSv1.3; ssl_ciphers "ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA256:TLS-CHACHA20-POLY1305-SHA256:TLS-AES-256-GCM-SHA384:TLS-AES-128-GCM-SHA256"; ssl_prefer_server_ciphers on; ssl_ecdh_curve secp384r1; # OCSP stapling ssl_stapling on; ssl_stapling_verify on; ssl_trusted_certificate /etc/ssl/certs/ca-certificates.crt; resolver 8.8.8.8 8.8.4.4;
Change History (11)
comment:1 by , 4 years ago
comment:2 by , 4 years ago
Component: | documentation → nginx-core |
---|
comment:3 by , 4 years ago
Given the changes in nginx 1.19.1, most likely reason is a bug in changes to detect incorrect behaviour of upstream servers and an incorrect behaviour of your upstream server. Unfortunately, you've provided no information about your upstream servers, including no information about the protocol used by the upstream server from the alert message you are complaining about.
Full message from the error log might be helpful here. An excerpt from the error log configured to log at least at the "warn" level might be helpful as well.
Ideally, please provide a debug log of a request which shows the alert. Note thought that obtaining a debug log on a production server might require significant resources.
comment:4 by , 4 years ago
Thanks for the pointers. I did not realize this message was caused by an upstream instead of an http client, but I'm happy to help debugging this. First of all this is a machine with a single fastcgi (php-fpm) backend, connected through a UNIX socket.
The only other pointer I currently have is there was a weird error after restarting nginx where systemd complained:
nginx.service: Can't open PID file /run/nginx.pid (yet?) after start: Operation not permitted
which is weird as everything is done as root. I've configured debug-logging but I currently do not experience the same avalanche of errors I had previously. I'll let it run for a few days and update this bug-report if I have new information.
comment:5 by , 4 years ago
update; the error apparently occurs when a 304 response is parsed:
*14179 http fastcgi record length: 2495 *14179 http fastcgi parser: 0 *14179 http fastcgi header: "Status: 304 Not Modified" *14179 http fastcgi parser: 0 *14179 http fastcgi header: "Strict-Transport-Security: max-age=31536000; includeSubDomains; preload" *14179 http fastcgi parser: 0 *14179 http fastcgi header: "X-Xss-Protection: 1; mode=block" *14179 http fastcgi parser: 0 *14179 http fastcgi header: "X-Content-Type-Options: nosniff" *14179 http fastcgi parser: 0 *14179 http fastcgi header: "Feature-Policy: geolocation 'none'" *14179 http fastcgi parser: 0 *14179 http fastcgi header: "X-Frame-Options: DENY" *14179 http fastcgi parser: 0 *14179 http fastcgi header: "Referrer-Policy: origin-when-cross-origin" *14179 http fastcgi parser: 0 *14179 http fastcgi header: "Content-Security-Policy: default-src 'none'; script-src 'self' cdn.jsdelivr.net blob:; manifest-src 'self'; frame-ancestors 'none'; worker-src 'self' cdn.jsdelivr.net blob:; connect-src 'self'; base-uri 'self'; form-action 'self'; img-s *14179 http fastcgi parser: 0 *14179 http fastcgi header: "Link: </s/c.66954657.css>; rel=preload; as=style" *14179 http fastcgi parser: 0 *14179 http fastcgi header: "Link: </s/c.6dd23fa2.js>; rel=preload; as=script" *14179 http fastcgi parser: 0 *14179 http fastcgi header: "Link: <https://cdn.jsdelivr.net/gh/ajaxorg/ace-builds@1.4.12/src-min-noconflict/worker-php.js>; rel=preload; as=worker" *14179 http fastcgi parser: 0 *14179 http fastcgi header: "Link: <https://cdn.jsdelivr.net/gh/ajaxorg/ace-builds@1.4.12/src-min-noconflict/ace.js>; rel=preload; as=script" *14179 http fastcgi parser: 0 *14179 http fastcgi header: "Link: <https://cdn.jsdelivr.net/gh/ajaxorg/ace-builds@1.4.12/src-min-noconflict/ext-language_tools.js>; rel=preload; as=script" *14179 http fastcgi parser: 0 *14179 http fastcgi header: "Link: <https://cdn.jsdelivr.net/gh/ajaxorg/ace-builds@1.4.12/src-min-noconflict/mode-php.js>; rel=preload; as=script" *14179 http fastcgi parser: 0 *14179 http fastcgi header: "Link: <https://cdn.jsdelivr.net/gh/ajaxorg/ace-builds@1.4.12/src-min-noconflict/theme-chrome.js>; rel=preload; as=script" *14179 http fastcgi parser: 0 *14179 http fastcgi header: "Link: <https://cdn.jsdelivr.net/gh/ajaxorg/ace-builds@1.4.12/src-min-noconflict/theme-chaos.js>; rel=preload; as=script" *14179 http fastcgi parser: 0 *14179 http fastcgi header: "Last-Modified: Fri, 17 Jun 2016 18:43:36 GMT" *14179 http fastcgi parser: 0 *14179 http fastcgi header: "Cache-Control: public, max-age=300" *14179 http fastcgi parser: 1 *14179 http fastcgi header done *14179 posix_memalign: 000055AC8E925810:4096 @16 *14179 HTTP/1.1 304 Not Modified *14179 write new buf t:1 f:0 000055AC8E925830, pos 000055AC8E925830, size: 1538 file: 0, size: 0 *14179 http write filter: l:0 f:0 s:1538 *14179 http write filter limit 0 *14179 malloc: 000055AC8E4541E0:16384 *14179 SSL buf copy: 1538 *14179 http write filter 0000000000000000 *14179 http cacheable: 0 *14179 http file cache free, fd: -1 *14179 http upstream process upstream *14179 pipe read upstream: 1 *14179 pipe preread: 1051 *14179 readv: eof:1, avail:0 *14179 readv: 1, last:1222 *14179 pipe recv chain: 0 *14179 pipe buf free s:0 t:1 f:0 000055AC8E4E91B0, pos 000055AC8E4E98CF, size: 1051 file: 0, size: 0 *14179 pipe length: -1 *14179 input buf #0 000055AC8E4E98CF 14179 upstream sent more data than specified in "Content-Length" header while reading upstream, client: xxx, server: xxx, request: "GET /xxx HTTP/1.1", upstream: "fastcgi://unix:/run/php-fpm/php-fpm.sock:", host: "xxx" *14179 input buf 000055AC8E4E98CF 0 *14179 pipe write downstream: 1 *14179 pipe write downstream flush in *14179 http output filter "/xxx" *14179 http copy filter: "/xxx" *14179 http postpone filter "/xxx" 000055AC8E8B38A8 *14179 write new buf t:1 f:0 000055AC8E4E91B0, pos 000055AC8E4E98CF, size: 0 file: 0, size: 0 *14179 zero size buf in writer t:1 r:0 f:0 000055AC8E4E91B0 000055AC8E4E98CF-000055AC8E4E98CF 0000000000000000 0-0 while sending to client, client: xxx, server: xxx, request: "GET /FWUea HTTP/1.1", upstream: "fastcgi://unix:/run/php-fpm/php-fpm.sock:"
the alert is prepended by a warning that seems incorrect, as the 304 does not contain a Content-Length header:
upstream sent more data than specified in "Content-Length" header while reading upstream
zero size buf in writer t:1 r:0 f:0 000055AC8E4E91B0 000055AC8E4E98CF-000055AC8E4E98CF 0000000000000000 0-0 while sending to client
What is also strange, but might be correct is that the http response header after the posix_memalign line contains a Windows enter ^M
comment:6 by , 4 years ago
Thanks, I was able to reproduce it.
The 304 status code implies no response body, so, while the warning uses not very correct words for your particular case, the upstream behaviour is still incorrect: it tries to send some response body, while it shouldn't. This is mostly identical to responses with "Content-Length: 0", followed by some unexpected response data.
There is a bug in handling of this incorrect behaviour of FastCGI upstream servers, which leads to an attempt to send an empty buffer, so the writer complains. The following patch should fix this:
# HG changeset patch # User Maxim Dounin <mdounin@mdounin.ru> # Date 1595378968 -10800 # Wed Jul 22 03:49:28 2020 +0300 # Node ID c58d6f0cb56873aad57293befa1c24007e938bbf # Parent 3b338dd8709933862c68b1a233af8bfa91ef75da FastCGI: fixed zero size buf alerts on extra data (ticket #2018). After 05e42236e95b (1.19.1) responses with extra data might result in zero size buffers being generated and "zero size buf" alerts in writer (if f->rest happened to be 0 when processing additional stdout data). diff --git a/src/http/modules/ngx_http_fastcgi_module.c b/src/http/modules/ngx_http_fastcgi_module.c --- a/src/http/modules/ngx_http_fastcgi_module.c +++ b/src/http/modules/ngx_http_fastcgi_module.c @@ -2306,6 +2306,18 @@ ngx_http_fastcgi_input_filter(ngx_event_ break; } + if (f->rest == -2) { + f->rest = r->upstream->headers_in.content_length_n; + } + + if (f->rest == 0) { + ngx_log_error(NGX_LOG_WARN, p->log, 0, + "upstream sent more data than specified in " + "\"Content-Length\" header"); + p->upstream_done = 1; + break; + } + cl = ngx_chain_get_free_buf(p->pool, &p->free); if (cl == NULL) { return NGX_ERROR; @@ -2349,11 +2361,7 @@ ngx_http_fastcgi_input_filter(ngx_event_ b->last = f->last; } - if (f->rest == -2) { - f->rest = r->upstream->headers_in.content_length_n; - } - - if (f->rest >= 0) { + if (f->rest > 0) { if (b->last - b->pos > f->rest) { ngx_log_error(NGX_LOG_WARN, p->log, 0, @@ -2564,6 +2572,14 @@ ngx_http_fastcgi_non_buffered_filter(voi break; } + if (f->rest == 0) { + ngx_log_error(NGX_LOG_WARN, r->connection->log, 0, + "upstream sent more data than specified in " + "\"Content-Length\" header"); + u->length = 0; + break; + } + cl = ngx_chain_get_free_buf(r->pool, &u->free_bufs); if (cl == NULL) { return NGX_ERROR; @@ -2594,7 +2610,7 @@ ngx_http_fastcgi_non_buffered_filter(voi b->last = f->last; } - if (f->rest >= 0) { + if (f->rest > 0) { if (b->last - b->pos > f->rest) { ngx_log_error(NGX_LOG_WARN, r->connection->log, 0,
Note though, that while the patch fixes handling of incorrect responses of your FastCGI application in nginx, your application returns unexpected response data and needs to be fixed. Note well that in older nginx versions all the unexpected response data were actually sent to clients, and might cause various hard to diagnose problems.
comment:7 by , 4 years ago
Thanks - I've tried debugging this on the application side but I'm pretty sure no actual response-body is sent. How do you draw that conclusion from the debug-log? The socat output is somewhat obfuscated, but I don't see any data in there:
43 61 63 68 65 2d 43 6f 6e 74 72 6f 6c 3a 20 70 Cache-Control: p 75 62 6c 69 63 2c 20 6d 61 78 2d 61 67 65 3d 34 ublic, max-age=4 35 0d 0a 5.. 0d 0a .. 00 00 00 00 00 00 01 03 00 01 00 08 00 00 00 00 ................
follow-up: 9 comment:8 by , 4 years ago
The FastCGI stdout record length is known to be 2.5k from the debug log snippet you've provided:
*14179 http fastcgi record length: 2495
On the other hand, it's about 1.5k headers parsed from the record. About 1k unexpected data remains, and that's why the warning is logged.
comment:9 by , 4 years ago
okay, I've switched the unix-socket to a tcp port to ease debugging and I can confirm there is indeed a bug in the application where it sends a response body after the 304 (without a Content-Length header). Thanks for the explanation!
comment:11 by , 4 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
Fix committed, thanks for reporting this.
the "component" for this ticket is obviously incorrect