Opened 5 months ago

Closed 4 months ago

#2018 closed defect (fixed)

Significant amount of "zero size buf in writer t:1 r:0 f:0... 0000000000000000 0-0 while sending to client"

Reported by: sjon.hortensius.net@… Owned by:
Priority: minor Milestone:
Component: nginx-core Version: 1.19.x
Keywords: Cc:
uname -a: Linux n0 5.5.8-arch1-1 #1 SMP PREEMPT Fri, 06 Mar 2020 00:57:33 +0000 x86_64 GNU/Linux
nginx -V: nginx version: nginx/1.19.1
built with OpenSSL 1.1.1g 21 Apr 2020
TLS SNI support enabled
configure arguments: --prefix=/etc/nginx --conf-path=/etc/nginx/nginx.conf --sbin-path=/usr/bin/nginx --pid-path=/run/nginx.pid --lock-path=/run/lock/nginx.lock --user=http --group=http --http-log-path=/var/log/nginx/access.log --error-log-path=stderr --http-client-body-temp-path=/var/lib/nginx/client-body --http-proxy-temp-path=/var/lib/nginx/proxy --http-fastcgi-temp-path=/var/lib/nginx/fastcgi --http-scgi-temp-path=/var/lib/nginx/scgi --http-uwsgi-temp-path=/var/lib/nginx/uwsgi --with-cc-opt='-march=x86-64 -mtune=generic -O2 -pipe -fno-plt -D_FORTIFY_SOURCE=2' --with-ld-opt=-Wl,-O1,--sort-common,--as-needed,-z,relro,-z,now --with-compat --with-debug --with-file-aio --with-http_addition_module --with-http_auth_request_module --with-http_dav_module --with-http_degradation_module --with-http_flv_module --with-http_geoip_module --with-http_gunzip_module --with-http_gzip_static_module --with-http_mp4_module --with-http_realip_module --with-http_secure_link_module --with-http_slice_module --with-http_ssl_module --with-http_stub_status_module --with-http_sub_module --with-http_v2_module --with-mail --with-mail_ssl_module --with-pcre-jit --with-stream --with-stream_geoip_module --with-stream_realip_module --with-stream_ssl_module --with-stream_ssl_preread_module --with-threads

Description

I have a machine which is behind cloudflare, which limits any variation in clients, but ever since upgrading to 1.19.1 I'm seeing a significant amount of errors "zero size buf in writer". It's so bad, I've decided to rollback to 1.19.0

I am unable to reproduce this consistently, it seems pretty random. Here are my significant config options:

	sendfile		on;
	tcp_nopush		on;
	tcp_nodelay		on;
	client_max_body_size 32k;

	keepalive_timeout 65;

	http2_push_preload on;

	gzip on;
	gzip_types text/css application/javascript text/xml application/octet-stream application/wasm;
	gzip_vary on;

	fastcgi_buffers				256 16k;
	fastcgi_max_temp_file_size	0;
	fastcgi_read_timeout		10s;

	ssl_session_timeout			1d;
	ssl_session_cache			shared:SSL:10m;
	ssl_session_tickets			off;

	ssl_protocols				TLSv1.2 TLSv1.3;
	ssl_ciphers					"ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA256:TLS-CHACHA20-POLY1305-SHA256:TLS-AES-256-GCM-SHA384:TLS-AES-128-GCM-SHA256";
	ssl_prefer_server_ciphers	on;

	ssl_ecdh_curve				secp384r1;

	# OCSP stapling
	ssl_stapling				on;
	ssl_stapling_verify			on;
	ssl_trusted_certificate		/etc/ssl/certs/ca-certificates.crt;
	resolver					8.8.8.8 8.8.4.4;

Change History (11)

comment:1 by sjon.hortensius.net@…, 5 months ago

the "component" for this ticket is obviously incorrect

comment:2 by Sergey Kandaurov, 5 months ago

Component: documentationnginx-core

comment:3 by Maxim Dounin, 5 months ago

Given the changes in nginx 1.19.1, most likely reason is a bug in changes to detect incorrect behaviour of upstream servers and an incorrect behaviour of your upstream server. Unfortunately, you've provided no information about your upstream servers, including no information about the protocol used by the upstream server from the alert message you are complaining about.

Full message from the error log might be helpful here. An excerpt from the error log configured to log at least at the "warn" level might be helpful as well.

Ideally, please provide a debug log of a request which shows the alert. Note thought that obtaining a debug log on a production server might require significant resources.

comment:4 by sjon.hortensius.net@…, 5 months ago

Thanks for the pointers. I did not realize this message was caused by an upstream instead of an http client, but I'm happy to help debugging this. First of all this is a machine with a single fastcgi (php-fpm) backend, connected through a UNIX socket.

The only other pointer I currently have is there was a weird error after restarting nginx where systemd complained:

nginx.service: Can't open PID file /run/nginx.pid (yet?) after start: Operation not permitted

which is weird as everything is done as root. I've configured debug-logging but I currently do not experience the same avalanche of errors I had previously. I'll let it run for a few days and update this bug-report if I have new information.

Last edited 5 months ago by sjon.hortensius.net@… (previous) (diff)

comment:5 by sjon.hortensius.net@…, 5 months ago

update; the error apparently occurs when a 304 response is parsed:

*14179 http fastcgi record length: 2495
*14179 http fastcgi parser: 0
*14179 http fastcgi header: "Status: 304 Not Modified"
*14179 http fastcgi parser: 0
*14179 http fastcgi header: "Strict-Transport-Security: max-age=31536000; includeSubDomains; preload"
*14179 http fastcgi parser: 0
*14179 http fastcgi header: "X-Xss-Protection: 1; mode=block"
*14179 http fastcgi parser: 0
*14179 http fastcgi header: "X-Content-Type-Options: nosniff"
*14179 http fastcgi parser: 0
*14179 http fastcgi header: "Feature-Policy: geolocation 'none'"
*14179 http fastcgi parser: 0
*14179 http fastcgi header: "X-Frame-Options: DENY"
*14179 http fastcgi parser: 0
*14179 http fastcgi header: "Referrer-Policy: origin-when-cross-origin"
*14179 http fastcgi parser: 0
*14179 http fastcgi header: "Content-Security-Policy: default-src 'none'; script-src 'self' cdn.jsdelivr.net blob:; manifest-src 'self'; frame-ancestors 'none'; worker-src 'self' cdn.jsdelivr.net blob:; connect-src 'self'; base-uri 'self'; form-action 'self'; img-s
*14179 http fastcgi parser: 0
*14179 http fastcgi header: "Link: </s/c.66954657.css>; rel=preload; as=style"
*14179 http fastcgi parser: 0
*14179 http fastcgi header: "Link: </s/c.6dd23fa2.js>; rel=preload; as=script"
*14179 http fastcgi parser: 0
*14179 http fastcgi header: "Link: <https://cdn.jsdelivr.net/gh/ajaxorg/ace-builds@1.4.12/src-min-noconflict/worker-php.js>; rel=preload; as=worker"
*14179 http fastcgi parser: 0
*14179 http fastcgi header: "Link: <https://cdn.jsdelivr.net/gh/ajaxorg/ace-builds@1.4.12/src-min-noconflict/ace.js>; rel=preload; as=script"
*14179 http fastcgi parser: 0
*14179 http fastcgi header: "Link: <https://cdn.jsdelivr.net/gh/ajaxorg/ace-builds@1.4.12/src-min-noconflict/ext-language_tools.js>; rel=preload; as=script"
*14179 http fastcgi parser: 0
*14179 http fastcgi header: "Link: <https://cdn.jsdelivr.net/gh/ajaxorg/ace-builds@1.4.12/src-min-noconflict/mode-php.js>; rel=preload; as=script"
*14179 http fastcgi parser: 0
*14179 http fastcgi header: "Link: <https://cdn.jsdelivr.net/gh/ajaxorg/ace-builds@1.4.12/src-min-noconflict/theme-chrome.js>; rel=preload; as=script"
*14179 http fastcgi parser: 0
*14179 http fastcgi header: "Link: <https://cdn.jsdelivr.net/gh/ajaxorg/ace-builds@1.4.12/src-min-noconflict/theme-chaos.js>; rel=preload; as=script"
*14179 http fastcgi parser: 0
*14179 http fastcgi header: "Last-Modified: Fri, 17 Jun 2016 18:43:36 GMT"
*14179 http fastcgi parser: 0
*14179 http fastcgi header: "Cache-Control: public, max-age=300"
*14179 http fastcgi parser: 1
*14179 http fastcgi header done
*14179 posix_memalign: 000055AC8E925810:4096 @16
*14179 HTTP/1.1 304 Not Modified
*14179 write new buf t:1 f:0 000055AC8E925830, pos 000055AC8E925830, size: 1538 file: 0, size: 0
*14179 http write filter: l:0 f:0 s:1538
*14179 http write filter limit 0
*14179 malloc: 000055AC8E4541E0:16384
*14179 SSL buf copy: 1538
*14179 http write filter 0000000000000000
*14179 http cacheable: 0
*14179 http file cache free, fd: -1
*14179 http upstream process upstream
*14179 pipe read upstream: 1
*14179 pipe preread: 1051
*14179 readv: eof:1, avail:0
*14179 readv: 1, last:1222
*14179 pipe recv chain: 0
*14179 pipe buf free s:0 t:1 f:0 000055AC8E4E91B0, pos 000055AC8E4E98CF, size: 1051 file: 0, size: 0
*14179 pipe length: -1
*14179 input buf #0 000055AC8E4E98CF
14179 upstream sent more data than specified in "Content-Length" header while reading upstream, client: xxx, server: xxx, request: "GET /xxx HTTP/1.1", upstream: "fastcgi://unix:/run/php-fpm/php-fpm.sock:", host: "xxx"
*14179 input buf 000055AC8E4E98CF 0
*14179 pipe write downstream: 1
*14179 pipe write downstream flush in
*14179 http output filter "/xxx"
*14179 http copy filter: "/xxx"
*14179 http postpone filter "/xxx" 000055AC8E8B38A8
*14179 write new buf t:1 f:0 000055AC8E4E91B0, pos 000055AC8E4E98CF, size: 0 file: 0, size: 0
*14179 zero size buf in writer t:1 r:0 f:0 000055AC8E4E91B0 000055AC8E4E98CF-000055AC8E4E98CF 0000000000000000 0-0 while sending to client, client: xxx, server: xxx, request: "GET /FWUea HTTP/1.1", upstream: "fastcgi://unix:/run/php-fpm/php-fpm.sock:"

the alert is prepended by a warning that seems incorrect, as the 304 does not contain a Content-Length header:

upstream sent more data than specified in "Content-Length" header while reading upstream

zero size buf in writer t:1 r:0 f:0 000055AC8E4E91B0 000055AC8E4E98CF-000055AC8E4E98CF 0000000000000000 0-0 while sending to client

What is also strange, but might be correct is that the http response header after the posix_memalign line contains a Windows enter M

Version 0, edited 5 months ago by sjon.hortensius.net@… (next)

comment:6 by Maxim Dounin, 5 months ago

Thanks, I was able to reproduce it.

The 304 status code implies no response body, so, while the warning uses not very correct words for your particular case, the upstream behaviour is still incorrect: it tries to send some response body, while it shouldn't. This is mostly identical to responses with "Content-Length: 0", followed by some unexpected response data.

There is a bug in handling of this incorrect behaviour of FastCGI upstream servers, which leads to an attempt to send an empty buffer, so the writer complains. The following patch should fix this:

# HG changeset patch
# User Maxim Dounin <mdounin@mdounin.ru>
# Date 1595378968 -10800
#      Wed Jul 22 03:49:28 2020 +0300
# Node ID c58d6f0cb56873aad57293befa1c24007e938bbf
# Parent  3b338dd8709933862c68b1a233af8bfa91ef75da
FastCGI: fixed zero size buf alerts on extra data (ticket #2018).

After 05e42236e95b (1.19.1) responses with extra data might result in
zero size buffers being generated and "zero size buf" alerts in writer
(if f->rest happened to be 0 when processing additional stdout data).

diff --git a/src/http/modules/ngx_http_fastcgi_module.c b/src/http/modules/ngx_http_fastcgi_module.c
--- a/src/http/modules/ngx_http_fastcgi_module.c
+++ b/src/http/modules/ngx_http_fastcgi_module.c
@@ -2306,6 +2306,18 @@ ngx_http_fastcgi_input_filter(ngx_event_
             break;
         }
 
+        if (f->rest == -2) {
+            f->rest = r->upstream->headers_in.content_length_n;
+        }
+
+        if (f->rest == 0) {
+            ngx_log_error(NGX_LOG_WARN, p->log, 0,
+                          "upstream sent more data than specified in "
+                          "\"Content-Length\" header");
+            p->upstream_done = 1;
+            break;
+        }
+
         cl = ngx_chain_get_free_buf(p->pool, &p->free);
         if (cl == NULL) {
             return NGX_ERROR;
@@ -2349,11 +2361,7 @@ ngx_http_fastcgi_input_filter(ngx_event_
             b->last = f->last;
         }
 
-        if (f->rest == -2) {
-            f->rest = r->upstream->headers_in.content_length_n;
-        }
-
-        if (f->rest >= 0) {
+        if (f->rest > 0) {
 
             if (b->last - b->pos > f->rest) {
                 ngx_log_error(NGX_LOG_WARN, p->log, 0,
@@ -2564,6 +2572,14 @@ ngx_http_fastcgi_non_buffered_filter(voi
             break;
         }
 
+        if (f->rest == 0) {
+            ngx_log_error(NGX_LOG_WARN, r->connection->log, 0,
+                          "upstream sent more data than specified in "
+                          "\"Content-Length\" header");
+            u->length = 0;
+            break;
+        }
+
         cl = ngx_chain_get_free_buf(r->pool, &u->free_bufs);
         if (cl == NULL) {
             return NGX_ERROR;
@@ -2594,7 +2610,7 @@ ngx_http_fastcgi_non_buffered_filter(voi
             b->last = f->last;
         }
 
-        if (f->rest >= 0) {
+        if (f->rest > 0) {
 
             if (b->last - b->pos > f->rest) {
                 ngx_log_error(NGX_LOG_WARN, r->connection->log, 0,

Note though, that while the patch fixes handling of incorrect responses of your FastCGI application in nginx, your application returns unexpected response data and needs to be fixed. Note well that in older nginx versions all the unexpected response data were actually sent to clients, and might cause various hard to diagnose problems.

comment:7 by sjon.hortensius.net@…, 4 months ago

Thanks - I've tried debugging this on the application side but I'm pretty sure no actual response-body is sent. How do you draw that conclusion from the debug-log? The socat output is somewhat obfuscated, but I don't see any data in there:

 43 61 63 68 65 2d 43 6f 6e 74 72 6f 6c 3a 20 70  Cache-Control: p
 75 62 6c 69 63 2c 20 6d 61 78 2d 61 67 65 3d 34  ublic, max-age=4
 35 0d 0a                                         5..
 0d 0a                                            ..
 00 00 00 00 00 00 01 03 00 01 00 08 00 00 00 00  ................

comment:8 by Maxim Dounin, 4 months ago

The FastCGI stdout record length is known to be 2.5k from the debug log snippet you've provided:

*14179 http fastcgi record length: 2495

On the other hand, it's about 1.5k headers parsed from the record. About 1k unexpected data remains, and that's why the warning is logged.

in reply to:  8 comment:9 by sjon.hortensius.net@…, 4 months ago

okay, I've switched the unix-socket to a tcp port to ease debugging and I can confirm there is indeed a bug in the application where it sends a response body after the 304 (without a Content-Length header). Thanks for the explanation!

comment:10 by Maxim Dounin <mdounin@…>, 4 months ago

In 7689:da8d758aabeb/nginx:

FastCGI: fixed zero size buf alerts on extra data (ticket #2018).

After 05e42236e95b (1.19.1) responses with extra data might result in
zero size buffers being generated and "zero size buf" alerts in writer
(if f->rest happened to be 0 when processing additional stdout data).

comment:11 by Maxim Dounin, 4 months ago

Resolution: fixed
Status: newclosed

Fix committed, thanks for reporting this.

Note: See TracTickets for help on using tickets.