Opened 4 years ago

Closed 4 years ago

#1035 closed defect (wontfix)

limit_conn breaks on linux when keepalive_requests closes a connection

Reported by: guruxu@… Owned by:
Priority: major Milestone:
Component: nginx-module Version: 1.9.x
Keywords: Cc:
uname -a: Linux ip-10-233-110-98 4.4.14-24.50.amzn1.x86_64 #1 SMP Fri Jun 24 19:56:04 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
nginx -V: nginx version: openresty/1.9.7.4
built by gcc 4.8.3 20140911 (Red Hat 4.8.3-9) (GCC)
built with OpenSSL 1.0.1k-fips 8 Jan 2015
TLS SNI support enabled
configure arguments: --prefix=/opt/openresty/nginx --with-debug --with-cc-opt='-DNGX_LUA_USE_ASSERT -DNGX_LUA_ABORT_AT_PANIC -O2' --add-module=../ngx_devel_kit-0.2.19 --add-module=../iconv-nginx-module-0.13 --add-module=../echo-nginx-module-0.58 --add-module=../xss-nginx-module-0.05 --add-module=../ngx_coolkit-0.2rc3 --add-module=../set-misc-nginx-module-0.30 --add-module=../form-input-nginx-module-0.11 --add-module=../encrypted-session-nginx-module-0.04 --add-module=../srcache-nginx-module-0.30 --add-module=../ngx_lua-0.10.2 --add-module=../ngx_lua_upstream-0.05 --add-module=../headers-more-nginx-module-0.29 --add-module=../array-var-nginx-module-0.05 --add-module=../memc-nginx-module-0.16 --add-module=../redis2-nginx-module-0.12 --add-module=../redis-nginx-module-0.3.7 --add-module=../rds-json-nginx-module-0.14 --add-module=../rds-csv-nginx-module-0.07 --with-ld-opt=-Wl,-rpath,/opt/openresty/luajit/lib --with-http_stub_status_module --with-http_realip_module --with-http_geoip_module --add-module=/tmp/rosapi_unpack/openresty/nginx-upstream-dynamic-servers-29e05c5de4d9e7042f66b99d22bef878fd761219 --with-http_ssl_module

Description

When I set limit_conn to 1, nginx raises false alarms whenever it creates a new connection after the number of requests on the same connection exceeds the value defined in keepalive_requests. Here's how I can reproduce it:

The test client is written in Python using the requests module. It uses a Session, which leverages urllib3's connection pooling, so a single connection is shared and requests are made sequentially:

#!/usr/bin/env python
import requests
import json

session = requests.Session()

headers = {"Content-Type": "application/json"}
data = {"content": "Foo Bar"}

results = {}
for x in range(0, 1000):
    r = session.post("http://hercules:9988/mytest", data=json.dumps(data), headers=headers)
    if r.status_code not in results:
        results[r.status_code] = 0
    results[r.status_code] += 1

for key, value in results.items():
    print(str(key) + ": " + str(value))

On the nginx side, it's just the typical limit_conn by IP. I tried various keepalive_requests values from 10 to 1000; only at 1000 did I get no limit_conn errors. For the other values, the log shows that whenever a new connection is created, the very first few (1 to 5ish) requests trigger limit_conn. Here is a snippet of the log, grepping for status code 429 (the return status I use for limit_conn; the log uses a custom format):

- - 2016-07-27T01:01:30-04:00 hercules "POST /mytest HTTP/1.1" req_time:0.000 req_len:280 conn:22206 conn_reqs:1 status:429 size:59  referer:- useragent:python-requests/2.2.1 CPython/2.7.11 Darwin/14.5.0
- - 2016-07-27T01:01:42-04:00 hercules "POST /mytest HTTP/1.1" req_time:0.000 req_len:280 conn:22405 conn_reqs:1 status:429 size:59  referer:- useragent:python-requests/2.2.1 CPython/2.7.11 Darwin/14.5.0
- - 2016-07-27T01:01:50-04:00 hercules "POST /mytest HTTP/1.1" req_time:0.000 req_len:280 conn:22604 conn_reqs:1 status:429 size:59  referer:- useragent:python-requests/2.2.1 CPython/2.7.11 Darwin/14.5.0
- - 2016-07-27T01:02:13-04:00 hercules "POST /mytest HTTP/1.1" req_time:0.000 req_len:280 conn:23004 conn_reqs:1 status:429 size:59  referer:- useragent:python-requests/2.2.1 CPython/2.7.11 Darwin/14.5.0
- - 2016-07-27T01:02:21-04:00 hercules "POST /mytest HTTP/1.1" req_time:0.000 req_len:280 conn:23203 conn_reqs:1 status:429 size:59  referer:- useragent:python-requests/2.2.1 CPython/2.7.11 Darwin/14.5.0
- - 2016-07-27T01:02:21-04:00 hercules "POST /mytest HTTP/1.1" req_time:0.000 req_len:280 conn:23203 conn_reqs:2 status:429 size:59  referer:- useragent:python-requests/2.2.1 CPython/2.7.11 Darwin/14.5.0
- - 2016-07-27T01:02:30-04:00 hercules "POST /mytest HTTP/1.1" req_time:0.000 req_len:280 conn:23400 conn_reqs:1 status:429 size:59  referer:- useragent:python-requests/2.2.1 CPython/2.7.11 Darwin/14.5.0
- - 2016-07-27T01:02:47-04:00 hercules "POST /mytest HTTP/1.1" req_time:0.000 req_len:280 conn:23599 conn_reqs:1 status:429 size:59  referer:- useragent:python-requests/2.2.1 CPython/2.7.11 Darwin/14.5.0
- - 2016-07-27T01:03:06-04:00 hercules "POST /mytest HTTP/1.1" req_time:0.000 req_len:280 conn:23798 conn_reqs:1 status:429 size:59  referer:- useragent:python-requests/2.2.1 CPython/2.7.11 Darwin/14.5.0

In the log I capture the connection serial number (conn:$connection) and the number of requests served on it (conn_reqs:$connection_requests). So it is pretty evident that when a new connection is created because keepalive_requests was exceeded, the old connection is somehow still lingering around for a short period of time, which triggers limit_conn?
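For reference, a minimal sketch of the kind of configuration described above. The zone name, port, and location are illustrative assumptions, not the exact config from this server:

```nginx
# Hypothetical minimal repro config; names and values are assumptions.
http {
    limit_conn_zone $binary_remote_addr zone=addr:10m;
    limit_conn_status 429;           # matches the 429s grepped in the log above

    server {
        listen 9988;
        keepalive_requests 100;      # values from 10 to 1000 were tried

        location /mytest {
            limit_conn addr 1;       # at most one connection per client IP
            return 200 "ok\n";
        }
    }
}
```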

Interestingly enough, when I test the same client against the same build of nginx on my Mac, there is no problem at all. The only difference I can gather is that Linux uses epoll for connection processing while the Mac uses kqueue.

Change History (1)

comment:1 by Maxim Dounin, 4 years ago

Resolution: wontfix
Status: new → closed

What happens is more or less expected with the code in question.

Once nginx closes a connection due to keepalive_requests, it does so by sending a response with a Connection: close header. After the response is fully sent and received by the client, the code in question will open a new connection and start sending a new request. On the other hand, nginx has to do some teardown work after the response is sent: log the request, free allocated structures, and so on. Moreover, in many cases it also waits for the client to close the connection (see lingering_close).

This creates a race: the client is already sending a new request while nginx is still closing the old one. If the client wins, it gets an error due to limit_conn.

The only solution is to avoid configuring such low limits if you want to allow usage patterns like the one in the test.
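On the client side, one way to tolerate this race (a sketch only, not something suggested in the ticket; `post_with_retry` and its parameters are hypothetical) is to retry a request that comes back with the limit_conn status:

```python
import time

def post_with_retry(do_post, retries=3, backoff=0.1, limited_status=429):
    """Retry a request when the server answers with the limit_conn status.

    do_post is a zero-argument callable returning an object with a
    .status_code attribute (e.g. a lambda wrapping session.post(...)).
    429 matches the custom limit_conn_status used in the ticket.
    """
    for attempt in range(retries + 1):
        r = do_post()
        if r.status_code != limited_status:
            return r
        if attempt < retries:
            # A short pause gives nginx time to finish tearing down
            # the old keepalive connection before we try again.
            time.sleep(backoff)
    return r
```

In the reproduction script above, the call inside the loop would become `post_with_retry(lambda: session.post(url, data=..., headers=...))`.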

Note: See TracTickets for help on using tickets.