Opened 6 years ago

Closed 6 years ago

Last modified 4 years ago

#1484 closed defect (duplicate)

Timeouts when proxying to Apache and using Keepalive

Reported by: Veroslav Cenov
Owned by:
Priority: major
Milestone:
Component: other
Version: 1.12.x
Keywords: keepalive upstream timeout
Cc: veroslav.cenov@…
uname -a: Linux elastic 3.10.0-693.17.1.el7.x86_64 #1 SMP Thu Jan 25 20:13:58 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
nginx -V: nginx version: nginx/1.12.2
built by gcc 4.8.5 20150623 (Red Hat 4.8.5-16) (GCC)
built with OpenSSL 1.0.2k-fips 26 Jan 2017
TLS SNI support enabled
configure arguments: --prefix=/etc/nginx --sbin-path=/usr/sbin/nginx --modules-path=/usr/lib64/nginx/modules --conf-path=/etc/nginx/nginx.conf --error-log-path=/var/log/nginx/error.log --http-log-path=/var/log/nginx/access.log --pid-path=/var/run/nginx.pid --lock-path=/var/run/nginx.lock --http-client-body-temp-path=/var/cache/nginx/client_temp --http-proxy-temp-path=/var/cache/nginx/proxy_temp --http-fastcgi-temp-path=/var/cache/nginx/fastcgi_temp --http-uwsgi-temp-path=/var/cache/nginx/uwsgi_temp --http-scgi-temp-path=/var/cache/nginx/scgi_temp --user=nginx --group=nginx --with-compat --with-file-aio --with-threads --with-http_addition_module --with-http_auth_request_module --with-http_dav_module --with-http_flv_module --with-http_gunzip_module --with-http_gzip_static_module --with-http_mp4_module --with-http_random_index_module --with-http_realip_module --with-http_secure_link_module --with-http_slice_module --with-http_ssl_module --with-http_stub_status_module --with-http_sub_module --with-http_v2_module --with-mail --with-mail_ssl_module --with-stream --with-stream_realip_module --with-stream_ssl_module --with-stream_ssl_preread_module --with-cc-opt='-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -fPIC' --with-ld-opt='-Wl,-z,relro -Wl,-z,now -pie'

Description

Hello,

I have the following setup:

1) Apache backend
2) Nginx proxy

The request path goes like:
Browser --> Nginx [proxy_pass] --> Apache

In order to use KeepAlive between Nginx --> Apache, I'm doing the following:

1) Enable HTTP/1.1 and clear the Connection header:


server {
    ....

    location / {
        ....
        proxy_set_header Connection "";
        proxy_http_version 1.1;
        proxy_pass http://upstream_80;
        ....
    }
}

2) Add keepalive to my upstream


http {
    ....

    upstream upstream_80 {
        keepalive 100;
        server 127.0.0.1:8080 max_fails=0;
    }

    ....
}


The configuration above (plus enabling KeepAlive on the Apache side) leads to successful use of keepalive connections between Nginx and Apache.

But it also causes a very strange bug: some requests are dropped and never even processed by Apache.

I'm doing benchmarks with the following:

root@localhost# echo ss > test_post
root@localhost# /usr/local/apache/bin/ab -k -p test_post -c 100 -n 2000000 http://mydomain.com/non_existing_file.html

Where "non_existing_file.html" is non-existing file, which returns 404 , but the request itself makes it through Apache.
(It doesn't matter if file exists or not, I'm using non-existent, because the test goes faster)

For a 2,000,000-request benchmark I receive a bunch of timeouts, logged in the Nginx error log and looking like this:
-----
2018/02/16 19:04:37 [crit] 17155#17155: *231641 connect() to 127.0.0.1:8080 failed (99: Cannot assign requested address) while connecting to upstream, client: 127.0.0.1, server: static.superhosting.bg, request: "POST /ss.html HTTP/1.0", upstream: "http://127.0.0.1:8080/ss.html", host: "static.superhosting.bg"
2018/02/16 19:04:37 [crit] 17152#17152: *231633 connect() to 127.0.0.1:8080 failed (99: Cannot assign requested address) while connecting to upstream, client: 127.0.0.1, server: static.superhosting.bg, request: "POST /ss.html HTTP/1.0", upstream: "http://127.0.0.1:8080/ss.html", host: "static.superhosting.bg"
2018/02/16 19:04:37 [crit] 17152#17152: *232237 connect() to 127.0.0.1:8080 failed (99: Cannot assign requested address) while connecting to upstream, client: 127.0.0.1, server: static.superhosting.bg, request: "POST /ss.html HTTP/1.0", upstream: "http://127.0.0.1:8080/ss.html", host: "static.superhosting.bg"
2018/02/16 19:04:37 [crit] 17152#17152: *231387 connect() to 127.0.0.1:8080 failed (99: Cannot assign requested address) while connecting to upstream, client: 127.0.0.1, server: static.superhosting.bg, request: "POST /ss.html HTTP/1.0", upstream: "http://127.0.0.1:8080/ss.html", host: "static.superhosting.bg"
2018/02/16 19:04:37 [crit] 17152#17152: *232756 connect() to 127.0.0.1:8080 failed (99: Cannot assign requested address) while connecting to upstream, client: 127.0.0.1, server: static.superhosting.bg, request: "POST /ss.html HTTP/1.0", upstream: "http://127.0.0.1:8080/ss.html", host: "static.superhosting.bg"


Apache itself doesn't log anything (even with LogLevel +trace9).

If I disable the "keepalive" directive inside the upstream {} block (nginx.conf), the problem immediately goes away.

I was able to reproduce this behavior with the following Nginx versions:
1.12.2, 1.11.8, 1.13.7

The behaviour is also observed over both HTTP and HTTPS connections.

Change History (6)

comment:1 by Veroslav Cenov, 6 years ago

(Re-post of the description above, with the domain in the log lines anonymized to mydomain.com.)

comment:2 by Maxim Dounin, 6 years ago

The error message (connect() to 127.0.0.1:8080 failed (99: Cannot assign requested address)...) suggests you've run out of local ports, and nginx wasn't able to connect to the backend server due to this. Check netstat output when the error occurs to see which sockets occupy local ports with the 127.0.0.1:8080 destination.
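
A quick way to check this (an illustrative command, assuming a standard Linux netstat where the TCP state is the 6th column):

# count sockets towards the backend, grouped by TCP state; a large TIME_WAIT
# count here means the local port range is being exhausted
netstat -ant | grep '127.0.0.1:8080' | awk '{print $6}' | sort | uniq -c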

Using keepalive can easily result in TIME_WAIT sockets being moved from the backend side to the nginx side when a connection is finally closed, and this may cause similar errors once all local ports are occupied by TIME_WAIT sockets. On the other hand, just removing the keepalive directive from the upstream block won't fix it, but will rather make things worse, unless proxy_set_header Connection ""; is removed as well.

Note well that keepalive 100 as specified in your configuration means that nginx will cache up to 100 connections in each worker process, and your backend server must be capable of handling 100 * (nginx worker_processes) connections. If it isn't, a sort of deadlock may occur: all the connections your backend server can handle are cached in other nginx worker processes, nginx tries to open an additional connection, and the backend isn't able to respond.
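
As a rough sketch of that sizing (the numbers are assumptions for illustration: worker_processes 4 and an Apache limit of roughly 256 workers), keep the per-worker cache well below what the backend can handle divided by the number of nginx workers:

upstream upstream_80 {
    server 127.0.0.1:8080 max_fails=0;
    # with worker_processes 4, at most 4 * 32 = 128 cached connections,
    # leaving an Apache MaxRequestWorkers of ~256 headroom for new connections
    keepalive 32;
}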

comment:3 by Veroslav Cenov, 6 years ago

I'm sorry for the wrong log I pasted!

Actually, those lines are the result of removing keepalive, and they are not the important ones.

The real lines I was getting were:
---
2018/02/16 19:03:01 [error] 16956#16956: *19401 upstream prematurely closed connection while reading response header from upstream, client: 127.0.0.1, server: my-site.com, request: "POST /ss.html HTTP/1.0", upstream: "http://127.0.0.1:8080/ss.html", host: "my-site.com"
2018/02/16 19:03:01 [error] 16956#16956: *19439 upstream prematurely closed connection while reading response header from upstream, client: 127.0.0.1, server: my-site.com, request: "POST /ss.html HTTP/1.0", upstream: "http://127.0.0.1:8080/ss.html", host: "my-site.com"
2018/02/16 19:03:01 [error] 16955#16955: *19378 upstream prematurely closed connection while reading response header from upstream, client: 127.0.0.1, server: my-site.com, request: "POST /ss.html HTTP/1.0", upstream: "http://127.0.0.1:8080/ss.html", host: "my-site.com"
2018/02/16 19:03:01 [error] 16955#16955: *19325 upstream prematurely closed connection while reading response header from upstream, client: 127.0.0.1, server: my-site.com, request: "POST /ss.html HTTP/1.0", upstream: "http://127.0.0.1:8080/ss.html", host: "my-site.com"
2018/02/16 19:03:01 [error] 16956#16956: *19289 upstream prematurely closed connection while reading response header from upstream, client: 127.0.0.1, server: my-site.com, request: "POST /ss.html HTTP/1.0", upstream: "http://127.0.0.1:8080/ss.html", host: "my-site.com"
2018/02/16 19:03:01 [error] 16956#16956: *19357 upstream prematurely closed connection while reading response header from upstream, client: 127.0.0.1, server: my-site.com, request: "POST /ss.html HTTP/1.0", upstream: "http://127.0.0.1:8080/ss.html", host: "my-site.com"
2018/02/16 19:03:01 [error] 16956#16956: *19425 upstream prematurely closed connection while reading response header from upstream, client: 127.0.0.1, server: my-site.com, request: "POST /ss.html HTTP/1.0", upstream: "http://127.0.0.1:8080/ss.html", host: "my-site.com"


P.S. I don't know how to edit my original ticket to update the log lines (if you can do it, that would be nice).
Thanks

comment:4 by Maxim Dounin, 6 years ago

The "upstream prematurely closed connection" errors are likely a result of the well-known race condition with keepalive requests: when a server closes a keepalive connection, there is a chance that the client will start sending the request to the connection at the same time.

When nginx detects such a situation with a keepalive connection, it will retry sending the request to the upstream. In this particular case, however, the request is a POST, and hence cannot be retried unless proxy_next_upstream ... non_idempotent; is set; see proxy_next_upstream.
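
A minimal sketch of that in the location block from the report (non_idempotent is an existing proxy_next_upstream parameter; whether retrying POST requests is acceptable depends on the application):

location / {
    proxy_http_version 1.1;
    proxy_set_header Connection "";
    # allow retrying non-idempotent (e.g. POST) requests on the next upstream
    # attempt; only safe if the application tolerates a request arriving twice
    proxy_next_upstream error timeout non_idempotent;
    proxy_pass http://upstream_80;
}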

You may also consider configuring larger keepalive timeouts on the backend side to minimize the chance of the race in question; the default used by Apache is very low. Also, there are some preliminary patches for nginx introducing keepalive_timeout in the upstream block, so it can be set to a value smaller than the one on the backend side, eliminating the race.
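
A sketch of the two knobs involved (values are illustrative; the upstream keepalive_timeout directive is only available with the patches mentioned above or a later nginx release that includes them):

# Apache side (httpd.conf): raise the keepalive timeout well above the 5s default
KeepAlive On
KeepAliveTimeout 60

# nginx side: close cached connections before Apache does, so nginx never
# reuses a connection the backend is about to close
upstream upstream_80 {
    server 127.0.0.1:8080 max_fails=0;
    keepalive 100;
    keepalive_timeout 50s;
}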

comment:5 by Veroslav Cenov, 6 years ago

Thanks for your answer; I hadn't found this information even after a good deal of searching on the topic :)

Do you know if this patch is going to be included in an upcoming Nginx version?

As for the KeepAlive timeout, we already increased it, but even so, at some point the race condition still occurs and breaks a POST request (and these requests are usually important).

For now, our temporary fix is to disable keepalive between Nginx and Apache and to set net.ipv4.tcp_tw_reuse=1, which effectively solves the problem of too many connections in TIME_WAIT state exhausting the local port range for new connections (which is what was happening in the wrong log lines I sent).

After setting net.ipv4.tcp_tw_reuse=1, we no longer get any "(99: Cannot assign requested address)" errors.
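
For completeness, the sysctl in question (Linux-specific; net.ipv4.tcp_tw_reuse only affects outgoing connections, and persisting it in /etc/sysctl.conf is an assumption about this setup):

# apply at runtime
sysctl -w net.ipv4.tcp_tw_reuse=1
# optionally persist across reboots
echo 'net.ipv4.tcp_tw_reuse = 1' >> /etc/sysctl.conf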

comment:6 by Maxim Dounin, 6 years ago

Resolution: duplicate
Status: new → closed

Duplicate of #1170, fixed in d9029e113a05.
