Opened 5 years ago

Closed 3 years ago

#1659 closed enhancement (fixed)

nginx intermittent delay before sending response

Reported by: Ian Marsh
Owned by:
Priority: minor
Milestone:
Component: nginx-core
Version: 1.12.x
Keywords: uwsgi scgi
Cc:
uname -a: Linux webserver 3.13.0-116-generic #163-Ubuntu SMP Fri Mar 31 14:13:22 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
nginx -V: nginx version: nginx/1.12.2
built with OpenSSL 1.0.1f 6 Jan 2014
TLS SNI support enabled
configure arguments: --with-cc-opt='-g -O2 -fPIE -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -fPIC -D_FORTIFY_SOURCE=2' --with-ld-opt='-Wl,-Bsymbolic-functions -fPIE -pie -Wl,-z,relro -Wl,-z,now -fPIC' --prefix=/usr/share/nginx --conf-path=/etc/nginx/nginx.conf --http-log-path=/var/log/nginx/access.log --error-log-path=/var/log/nginx/error.log --lock-path=/var/lock/nginx.lock --pid-path=/run/nginx.pid --modules-path=/usr/lib/nginx/modules --http-client-body-temp-path=/var/lib/nginx/body --http-fastcgi-temp-path=/var/lib/nginx/fastcgi --http-proxy-temp-path=/var/lib/nginx/proxy --http-scgi-temp-path=/var/lib/nginx/scgi --http-uwsgi-temp-path=/var/lib/nginx/uwsgi --with-debug --with-pcre-jit --with-http_ssl_module --with-http_stub_status_module --with-http_realip_module --with-http_auth_request_module --with-http_v2_module --with-http_dav_module --with-http_slice_module --with-threads --with-http_addition_module --with-http_geoip_module=dynamic --with-http_gunzip_module --with-http_gzip_static_module --with-http_image_filter_module=dynamic --with-http_sub_module --with-http_xslt_module=dynamic --with-stream=dynamic --with-stream_ssl_module --with-stream_ssl_preread_module --with-mail=dynamic --with-mail_ssl_module --add-dynamic-module=/build/nginx-WaTJd7/nginx-1.12.2/debian/modules/nginx-auth-pam --add-dynamic-module=/build/nginx-WaTJd7/nginx-1.12.2/debian/modules/nginx-dav-ext-module --add-dynamic-module=/build/nginx-WaTJd7/nginx-1.12.2/debian/modules/nginx-echo --add-dynamic-module=/build/nginx-WaTJd7/nginx-1.12.2/debian/modules/nginx-upstream-fair --add-dynamic-module=/build/nginx-WaTJd7/nginx-1.12.2/debian/modules/ngx_http_substitutions_filter_module

Description

Hi,

We have a Flask Python application served by uWSGI through nginx using 'uwsgi_pass', behind a load balancer, hosted in AWS.

We are noticing that nginx intermittently (but frequently) seems to add a substantial delay (as much as 200 seconds, but more typically 5-10 seconds) to some requests.

I've spent a couple of weeks looking into this - logging changes and log analysis, configuration changes, packet traces, nginx debug logs, strace, and more - but have been unable to solve the problem.

This is a production service, so I'm limited to what I can change (particularly around uWSGI and the app), and I've been unable to replicate the issue on a test setup so far, presumably because the incoming requests and load aren't "realistic enough".

Our production servers have:

$ nginx -V
nginx version: nginx/1.4.6 (Ubuntu)
built by gcc 4.8.4 (Ubuntu 4.8.4-2ubuntu1~14.04.3)
TLS SNI support enabled
configure arguments: --with-cc-opt='-g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2' --with-ld-opt='-Wl,-Bsymbolic-functions -Wl,-z,relro' --prefix=/usr/share/nginx --conf-path=/etc/nginx/nginx.conf --http-log-path=/var/log/nginx/access.log --error-log-path=/var/log/nginx/error.log --lock-path=/var/lock/nginx.lock --pid-path=/run/nginx.pid --http-client-body-temp-path=/var/lib/nginx/body --http-fastcgi-temp-path=/var/lib/nginx/fastcgi --http-proxy-temp-path=/var/lib/nginx/proxy --http-scgi-temp-path=/var/lib/nginx/scgi --http-uwsgi-temp-path=/var/lib/nginx/uwsgi --with-debug --with-pcre-jit --with-ipv6 --with-http_ssl_module --with-http_stub_status_module --with-http_realip_module --with-http_addition_module --with-http_dav_module --with-http_geoip_module --with-http_gzip_static_module --with-http_image_filter_module --with-http_spdy_module --with-http_sub_module --with-http_xslt_module --with-mail --with-mail_ssl_module
$ uname -a
Linux webserver 3.13.0-116-generic #163-Ubuntu SMP Fri Mar 31 14:13:22 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
$ uwsgi --version
2.0.15

I did try upgrading nginx on one instance and passing some traffic through it (no improvement); it had:

$ nginx -V
nginx version: nginx/1.12.2
built with OpenSSL 1.0.1f 6 Jan 2014
TLS SNI support enabled
configure arguments: --with-cc-opt='-g -O2 -fPIE -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -fPIC -D_FORTIFY_SOURCE=2' --with-ld-opt='-Wl,-Bsymbolic-functions -fPIE -pie -Wl,-z,relro -Wl,-z,now -fPIC' --prefix=/usr/share/nginx --conf-path=/etc/nginx/nginx.conf --http-log-path=/var/log/nginx/access.log --error-log-path=/var/log/nginx/error.log --lock-path=/var/lock/nginx.lock --pid-path=/run/nginx.pid --modules-path=/usr/lib/nginx/modules --http-client-body-temp-path=/var/lib/nginx/body --http-fastcgi-temp-path=/var/lib/nginx/fastcgi --http-proxy-temp-path=/var/lib/nginx/proxy --http-scgi-temp-path=/var/lib/nginx/scgi --http-uwsgi-temp-path=/var/lib/nginx/uwsgi --with-debug --with-pcre-jit --with-http_ssl_module --with-http_stub_status_module --with-http_realip_module --with-http_auth_request_module --with-http_v2_module --with-http_dav_module --with-http_slice_module --with-threads --with-http_addition_module --with-http_geoip_module=dynamic --with-http_gunzip_module --with-http_gzip_static_module --with-http_image_filter_module=dynamic --with-http_sub_module --with-http_xslt_module=dynamic --with-stream=dynamic --with-stream_ssl_module --with-stream_ssl_preread_module --with-mail=dynamic --with-mail_ssl_module --add-dynamic-module=/build/nginx-WaTJd7/nginx-1.12.2/debian/modules/nginx-auth-pam --add-dynamic-module=/build/nginx-WaTJd7/nginx-1.12.2/debian/modules/nginx-dav-ext-module --add-dynamic-module=/build/nginx-WaTJd7/nginx-1.12.2/debian/modules/nginx-echo --add-dynamic-module=/build/nginx-WaTJd7/nginx-1.12.2/debian/modules/nginx-upstream-fair --add-dynamic-module=/build/nginx-WaTJd7/nginx-1.12.2/debian/modules/ngx_http_substitutions_filter_module

I also tried a standalone nginx instance and pointed it at the same uWSGI across the network (still no improvement):

$ nginx -V
nginx version: nginx/1.14.0 (Ubuntu)
built with OpenSSL 1.1.0g  2 Nov 2017
TLS SNI support enabled
configure arguments: --with-cc-opt='-g -O2 -fdebug-prefix-map=/build/nginx-mcUg8N/nginx-1.14.0=. -fstack-protector-strong -Wformat -Werror=format-security -fPIC -Wdate-time -D_FORTIFY_SOURCE=2' --with-ld-opt='-Wl,-Bsymbolic-functions -Wl,-z,relro -Wl,-z,now -fPIC' --prefix=/usr/share/nginx --conf-path=/etc/nginx/nginx.conf --http-log-path=/var/log/nginx/access.log --error-log-path=/var/log/nginx/error.log --lock-path=/var/lock/nginx.lock --pid-path=/run/nginx.pid --modules-path=/usr/lib/nginx/modules --http-client-body-temp-path=/var/lib/nginx/body --http-fastcgi-temp-path=/var/lib/nginx/fastcgi --http-proxy-temp-path=/var/lib/nginx/proxy --http-scgi-temp-path=/var/lib/nginx/scgi --http-uwsgi-temp-path=/var/lib/nginx/uwsgi --with-debug --with-pcre-jit --with-http_ssl_module --with-http_stub_status_module --with-http_realip_module --with-http_auth_request_module --with-http_v2_module --with-http_dav_module --with-http_slice_module --with-threads --with-http_addition_module --with-http_geoip_module=dynamic --with-http_gunzip_module --with-http_gzip_static_module --with-http_image_filter_module=dynamic --with-http_sub_module --with-http_xslt_module=dynamic --with-stream=dynamic --with-stream_ssl_module --with-mail=dynamic --with-mail_ssl_module
$ uname -a
Linux webserver 4.15.0-1019-aws #19-Ubuntu SMP Fri Aug 10 09:34:35 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

Config is pretty standard and I've tried enabling/disabling various bits with no improvement; key statements are:

events {
    use epoll;
}
http {
    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
}
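
For context, the proxying side (not shown above) is the usual 'uwsgi_pass' arrangement, roughly like this - the upstream name and socket path are illustrative, not copied from our real config:

upstream flask_app {
    server unix:/run/uwsgi/app.sock;   # normally a UNIX socket
    # server 127.0.0.1:8001;           # TCP variant used for the tcpdump capture below
}

server {
    listen 80;

    location / {
        include uwsgi_params;
        uwsgi_pass flask_app;
    }
}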

Here's log evidence of the problem (bits {LIKE_THIS} are redacted):

uWSGI:
[pid: 3923|app: -|req: -/-] {LB_IP} (-) {46 vars in 922 bytes} [Fri Oct 19 13:26:23 2018] POST {URL} => generated 82 bytes in 2474 msecs (HTTP/1.1 200) 4 headers in 253 bytes (1 switches on core 4) 1539955583.879-4615-438-18-{CLIENT_IP}

nginx:
{LB_IP} - - [19/Oct/2018:13:26:43 +0000] "POST {URL} HTTP/1.1" 200 82 "-" "-" 19.534 1539955583.879-4615-438-18-{CLIENT_IP}

The "1539955583.879-4615-438-18-{CLIENT_IP}" is an HTTP_X_TRACEID header I added with nginx and is passed to uWSGI and logged by both, so I can always tie the nginx log entry to the relevant uWSGI entry. It also contains the nginx PID and connection ID which is handy for finding related debug entries; see below.

Note the timings: uWSGI logs the request at 13:26:23 (when it was received, not when it completed) and reports 2474ms to generate the response, whereas nginx logs it at 13:26:43 (when the response was delivered).

Before anyone assumes this is client-side delay (between browser and nginx), note that I have also included the $upstream_response_time in the nginx log, which for this request was 19534ms, 17s longer than uWSGI claims the response took.

We normally use a UNIX socket for the nginx <-> uWSGI communication, but I changed one of our servers to use TCP (port 8001); firstly to see if it made a difference (it did not), and also so I could use tcpdump to capture what was happening between the two processes.
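
The capture itself came from something along these lines on the loopback interface (I don't have the exact flags recorded):

$ sudo tcpdump -i lo -nn 'tcp port 8001'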

I saw this:

13:26:23.929522 IP 127.0.0.1.48880 > 127.0.0.1.8001: Flags [S], seq 2215864258, win 43690, options [mss 65495,sackOK,TS val 413969 ecr 0,nop,wscale 7], length 0
13:26:23.929528 IP 127.0.0.1.8001 > 127.0.0.1.48880: Flags [S.], seq 3951451851, ack 2215864259, win 43690, options [mss 65495,sackOK,TS val 413969 ecr 413969,nop,wscale 7], length 0
13:26:23.929537 IP 127.0.0.1.48880 > 127.0.0.1.8001: Flags [.], ack 1, win 342, options [nop,nop,TS val 413969 ecr 413969], length 0
13:26:23.931409 IP 127.0.0.1.48880 > 127.0.0.1.8001: Flags [P.], seq 1:1991, ack 1, win 342, options [nop,nop,TS val 413970 ecr 413969], length 1990
13:26:23.931424 IP 127.0.0.1.8001 > 127.0.0.1.48880: Flags [.], ack 1991, win 1365, options [nop,nop,TS val 413970 ecr 413970], length 0
13:26:26.406311 IP 127.0.0.1.8001 > 127.0.0.1.48880: Flags [P.], seq 1:336, ack 1991, win 1365, options [nop,nop,TS val 414589 ecr 413970], length 335
13:26:26.406335 IP 127.0.0.1.48880 > 127.0.0.1.8001: Flags [.], ack 336, win 350, options [nop,nop,TS val 414589 ecr 414589], length 0
13:26:43.456382 IP 127.0.0.1.8001 > 127.0.0.1.48880: Flags [F.], seq 336, ack 1991, win 1365, options [nop,nop,TS val 418851 ecr 414589], length 0
13:26:43.459119 IP 127.0.0.1.48880 > 127.0.0.1.8001: Flags [F.], seq 1991, ack 337, win 350, options [nop,nop,TS val 418852 ecr 418851], length 0
13:26:43.459135 IP 127.0.0.1.8001 > 127.0.0.1.48880: Flags [.], ack 1992, win 1365, options [nop,nop,TS val 418852 ecr 418852], length 0

So, the 335-byte response (253 bytes of headers + 82 bytes of body = 335 bytes of payload) was sent (and ACKed) at 13:26:26, in line with uWSGI's claim of 2474ms.

However, nginx doesn't appear to send the response until 13:26:43, when uWSGI closes the connection (the FIN/ACK packet); this matches nginx's $upstream_response_time of 19534ms.

By the way, the response did contain a "Content-Length: 82" header.

So, on to the debug logs (sorry, a bit long, and again with {REDACTED_BITS}).
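
These come from nginx's standard debug logging; since the builds above include --with-debug, enabling it only needs something like:

error_log /var/log/nginx/error.log debug;

The relevant entries for this request were: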

2018/10/19 13:26:26 [debug] 4615#4615: *438 http upstream request: "{URL}"
2018/10/19 13:26:26 [debug] 4615#4615: *438 http upstream process header
2018/10/19 13:26:26 [debug] 4615#4615: *438 malloc: 00007F2FD2F10000:4096
2018/10/19 13:26:26 [debug] 4615#4615: *438 recv: eof:0, avail:1
2018/10/19 13:26:26 [debug] 4615#4615: *438 recv: fd:21 335 of 4096
2018/10/19 13:26:26 [debug] 4615#4615: *438 http uwsgi status 200 "200 OK"
2018/10/19 13:26:26 [debug] 4615#4615: *438 http uwsgi header: "Content-Type: application/json"
2018/10/19 13:26:26 [debug] 4615#4615: *438 http uwsgi header: "Content-Length: 82"
2018/10/19 13:26:26 [debug] 4615#4615: *438 http uwsgi header: "Cache-Control: no-cache"
2018/10/19 13:26:26 [debug] 4615#4615: *438 http uwsgi header: "Set-Cookie: session={SESSION_ID}; HttpOnly; Path=/"
2018/10/19 13:26:26 [debug] 4615#4615: *438 http uwsgi header done
2018/10/19 13:26:26 [debug] 4615#4615: *438 HTTP/1.1 200 OK
Server: nginx/1.12.2
Date: Fri, 19 Oct 2018 13:26:26 GMT
Content-Type: application/json
Content-Length: 82
Connection: keep-alive
Keep-Alive: timeout=120
Cache-Control: no-cache
Set-Cookie: session={SESSION_ID}; HttpOnly; Path=/
Accept-Ranges: bytes

2018/10/19 13:26:26 [debug] 4615#4615: *438 write new buf t:1 f:0 00007F2FD2F2F800, pos 00007F2FD2F2F800, size: 383 file: 0, size: 0
2018/10/19 13:26:26 [debug] 4615#4615: *438 http write filter: l:0 f:0 s:383
2018/10/19 13:26:26 [debug] 4615#4615: *438 http cacheable: 0
2018/10/19 13:26:26 [debug] 4615#4615: *438 http upstream process upstream
2018/10/19 13:26:26 [debug] 4615#4615: *438 pipe read upstream: 0
2018/10/19 13:26:26 [debug] 4615#4615: *438 pipe preread: 82
2018/10/19 13:26:26 [debug] 4615#4615: *438 pipe buf free s:0 t:1 f:0 00007F2FD2F10000, pos 00007F2FD2F100FD, size: 82 file: 0, size: 0
2018/10/19 13:26:26 [debug] 4615#4615: *438 pipe length: -1
2018/10/19 13:26:26 [debug] 4615#4615: *438 pipe write downstream: 1
2018/10/19 13:26:26 [debug] 4615#4615: *438 pipe write busy: 0
2018/10/19 13:26:26 [debug] 4615#4615: *438 pipe write: out:0000000000000000, f:0
2018/10/19 13:26:26 [debug] 4615#4615: *438 pipe read upstream: 0
2018/10/19 13:26:26 [debug] 4615#4615: *438 pipe buf free s:0 t:1 f:0 00007F2FD2F10000, pos 00007F2FD2F100FD, size: 82 file: 0, size: 0
2018/10/19 13:26:26 [debug] 4615#4615: *438 pipe length: -1
2018/10/19 13:26:26 [debug] 4615#4615: *438 event timer del: 21: 1539959183930
2018/10/19 13:26:26 [debug] 4615#4615: *438 event timer add: 21: 3600000:1539959186406
2018/10/19 13:26:26 [debug] 4615#4615: *438 http upstream request: "{URL}"
2018/10/19 13:26:26 [debug] 4615#4615: *438 http upstream dummy handler
[...other stuff happens here on other connections...]
2018/10/19 13:26:43 [debug] 4615#4615: *438 http upstream request: "{URL}"
2018/10/19 13:26:43 [debug] 4615#4615: *438 http upstream process upstream
2018/10/19 13:26:43 [debug] 4615#4615: *438 pipe read upstream: 1
2018/10/19 13:26:43 [debug] 4615#4615: *438 readv: eof:1, avail:1
2018/10/19 13:26:43 [debug] 4615#4615: *438 readv: 1, last:3761
2018/10/19 13:26:43 [debug] 4615#4615: *438 pipe recv chain: 0
2018/10/19 13:26:43 [debug] 4615#4615: *438 pipe buf free s:0 t:1 f:0 00007F2FD2F10000, pos 00007F2FD2F100FD, size: 82 file: 0, size: 0
2018/10/19 13:26:43 [debug] 4615#4615: *438 pipe length: -1
2018/10/19 13:26:43 [debug] 4615#4615: *438 input buf #0
2018/10/19 13:26:43 [debug] 4615#4615: *438 pipe write downstream: 1
2018/10/19 13:26:43 [debug] 4615#4615: *438 pipe write downstream flush in
2018/10/19 13:26:43 [debug] 4615#4615: *438 http output filter "{URL}"
2018/10/19 13:26:43 [debug] 4615#4615: *438 http copy filter: "{URL}"
2018/10/19 13:26:43 [debug] 4615#4615: *438 http postpone filter "{URL}" 00007F2FD2F2F4D0
2018/10/19 13:26:43 [debug] 4615#4615: *438 write old buf t:1 f:0 00007F2FD2F2F800, pos 00007F2FD2F2F800, size: 383 file: 0, size: 0
2018/10/19 13:26:43 [debug] 4615#4615: *438 write new buf t:1 f:0 00007F2FD2F10000, pos 00007F2FD2F100FD, size: 82 file: 0, size: 0
2018/10/19 13:26:43 [debug] 4615#4615: *438 http write filter: l:0 f:0 s:465
2018/10/19 13:26:43 [debug] 4615#4615: *438 http copy filter: 0 "{URL}"
2018/10/19 13:26:43 [debug] 4615#4615: *438 pipe write downstream done
2018/10/19 13:26:43 [debug] 4615#4615: *438 event timer del: 21: 1539959186406
2018/10/19 13:26:43 [debug] 4615#4615: *438 event timer add: 21: 3600000:1539959203456
2018/10/19 13:26:43 [debug] 4615#4615: *438 http upstream exit: 0000000000000000
2018/10/19 13:26:43 [debug] 4615#4615: *438 finalize http upstream request: 0
2018/10/19 13:26:43 [debug] 4615#4615: *438 finalize http uwsgi request
2018/10/19 13:26:43 [debug] 4615#4615: *438 free rr peer 1 0
2018/10/19 13:26:43 [debug] 4615#4615: *438 close http upstream connection: 21
etc.

The relevant bits of strace...
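
(For reference, output in this shape can be captured with something roughly like the command below, attached to the worker PID seen in the debug log; I haven't kept the exact invocation I used.)

$ sudo strace -f -tt -p 4615 -o /tmp/nginx.strace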

4615  13:26:26.151042 epoll_wait(72,  <unfinished ...>
4615  13:26:26.406395 <... epoll_wait resumed> {{EPOLLIN|EPOLLOUT, {u32=3521545648, u64=139843361736112}}}, 512, 97723) = 1
4615  13:26:26.406855 recvfrom(21, "HTTP/1.1 200 OK\r\nContent-Type: a"..., 4096, 0, NULL, NULL) = 335
4615  13:26:26.409079 epoll_wait(72, {{EPOLLIN|EPOLLOUT, {u32=3521545889, u64=139843361736353}}}, 512, 97467) = 1
[...lots of other unrelated activity including a lot of epoll_wait-ing...]
4615  13:26:41.180539 epoll_wait(72, {{EPOLLIN|EPOLLOUT|EPOLLRDHUP, {u32=3521545648, u64=139843361736112}}}, 512, 102058) = 1
4615  13:26:43.457042 readv(21, [{"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 3761}], 1) = 0
4615  13:26:43.459098 close(21)         = 0

At this point, I'm out of ideas; for some reason nginx appears to read all the data it needs promptly, but won't respond to the end user until uWSGI closes the connection.

Any thoughts, suggestions, and explanations welcome!

Change History (5)

comment:1 by Maxim Dounin, 5 years ago

Keywords: uwsgi scgi added
Status: new → accepted
Type: defect → enhancement

In the uwsgi protocol, the end of the response is indicated by a connection close. As such, even if Content-Length is available, nginx waits for the connection to be closed by the upstream server, and this is exactly what you observe. To avoid unneeded delays, consider closing the connection immediately after sending the response.

This can probably be improved to use Content-Length if available, similarly to how it is done in the proxy module since b66712cde67d and bba9a5ccc6cd.

comment:2 by Ian Marsh, 5 years ago

Thank you very much for looking into this and taking the time to respond.

Could I ask:

1) Do you have a source for "in the uwsgi protocol, the end of the response is indicated by a connection close"? I have failed to find this in hours of googling.

2) Likewise, I cannot find an option for "closing the connection immediately after sending the response" despite reading through all the uWSGI documentation. How can this be done?

3) Can you explain why this is intermittent, and only in a production environment? When I test on a development server - with identical configuration - nginx returns the response exactly when uWSGI finishes sending it, to within 1ms.

comment:3 by Maxim Dounin, 5 years ago (in reply to comment:2)

Replying to https://stackoverflow.com/users/10506504/ianm:

1) Do you have a source for "in the uwsgi protocol, the end of the response is indicated by a connection close"? I have failed to find this in hours of googling.

This is how it has worked since the uwsgi module was introduced by the authors of the protocol, and it is also how most similar protocols work, including HTTP/1.0. The specification itself does not seem to explain these details.

2) Likewise, I cannot find an option for "closing the connection immediately after sending the response" despite reading through all the uWSGI documentation. How can this be done?

No idea. This is something to look up in the uWSGI docs and code.

3) Can you explain why this is intermittent, and only in a production environment? When I test on a development server - with identical configuration - nginx returns the response exactly when uWSGI finishes sending it, to within 1ms.

That's mostly up to the behaviour of the backend: nginx will return the response once the connection is closed. Most likely there are some housekeeping tasks the backend performs after the response is returned, and these take a while in your production environment but not on the development server.

comment:4 by cbz@…, 5 years ago

Did you find a solution? I'm currently experiencing the same thing with uWSGI 2.0.18 and nginx 1.15.8, though in this case the delay is a few seconds (3-6s) rather than the 20-second delay above.

Also, I'm not sure how the connection-close signal would work when using a UNIX socket (which is my use case, and the original scenario above).

comment:5 by Maxim Dounin, 3 years ago

Resolution: fixed
Status: accepted → closed

Starting with 1.19.1 (bffcc5af1d72), nginx is able to recognize that the whole response body has been returned by a uwsgi backend, provided Content-Length is available.
