Opened 8 years ago
Closed 8 years ago
#1035 closed defect (wontfix)
limit_conn breaks on linux when keepalive_requests closes a connection
| Reported by: |  | Owned by: |  |
|---|---|---|---|
| Priority: | major | Milestone: |  |
| Component: | nginx-module | Version: | 1.9.x |
| Keywords: |  | Cc: |  |

uname -a:
```
Linux ip-10-233-110-98 4.4.14-24.50.amzn1.x86_64 #1 SMP Fri Jun 24 19:56:04 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
```

nginx -V:
```
nginx version: openresty/1.9.7.4
built by gcc 4.8.3 20140911 (Red Hat 4.8.3-9) (GCC)
built with OpenSSL 1.0.1k-fips 8 Jan 2015
TLS SNI support enabled
configure arguments: --prefix=/opt/openresty/nginx --with-debug --with-cc-opt='-DNGX_LUA_USE_ASSERT -DNGX_LUA_ABORT_AT_PANIC -O2' --add-module=../ngx_devel_kit-0.2.19 --add-module=../iconv-nginx-module-0.13 --add-module=../echo-nginx-module-0.58 --add-module=../xss-nginx-module-0.05 --add-module=../ngx_coolkit-0.2rc3 --add-module=../set-misc-nginx-module-0.30 --add-module=../form-input-nginx-module-0.11 --add-module=../encrypted-session-nginx-module-0.04 --add-module=../srcache-nginx-module-0.30 --add-module=../ngx_lua-0.10.2 --add-module=../ngx_lua_upstream-0.05 --add-module=../headers-more-nginx-module-0.29 --add-module=../array-var-nginx-module-0.05 --add-module=../memc-nginx-module-0.16 --add-module=../redis2-nginx-module-0.12 --add-module=../redis-nginx-module-0.3.7 --add-module=../rds-json-nginx-module-0.14 --add-module=../rds-csv-nginx-module-0.07 --with-ld-opt=-Wl,-rpath,/opt/openresty/luajit/lib --with-http_stub_status_module --with-http_realip_module --with-http_geoip_module --add-module=/tmp/rosapi_unpack/openresty/nginx-upstream-dynamic-servers-29e05c5de4d9e7042f66b99d22bef878fd761219 --with-http_ssl_module
```
Description
When I set limit_conn to 1, I get false alarms whenever nginx forces a new connection because the number of requests on the same connection has exceeded the value of keepalive_requests. Here's how I can reproduce it:
The test client is written in Python using the requests module. It uses a Session, which leverages urllib3's connection pooling, so a single connection is shared and the calls are made sequentially:
```python
#!/usr/bin/env python
import requests
import json

session = requests.Session()
headers = {"Content-Type": "application/json"}
data = {"content": "Foo Bar"}
results = {}

for x in range(0, 1000):
    r = session.post("http://hercules:9988/mytest", data=json.dumps(data), headers=headers)
    if r.status_code not in results:
        results[r.status_code] = 0
    results[r.status_code] += 1

for key, value in results.items():
    print(str(key) + ": " + str(value))
```
On the nginx side there is just the typical limit_conn by IP (a hypothetical configuration along these lines is sketched at the end of this description). I tried keepalive_requests values from 10 to 1000; only at 1000 did I see no limit_conn errors. For all the other values, the log shows that whenever a new connection is created, the very first few requests (1 to 5 or so) trigger limit_conn. Here is a snippet of the log, grepping for status code 429 (which I use as my limit_conn return status; the log uses a custom format):
```
- - 2016-07-27T01:01:30-04:00 hercules "POST /mytest HTTP/1.1" req_time:0.000 req_len:280 conn:22206 conn_reqs:1 status:429 size:59 referer:- useragent:python-requests/2.2.1 CPython/2.7.11 Darwin/14.5.0
- - 2016-07-27T01:01:42-04:00 hercules "POST /mytest HTTP/1.1" req_time:0.000 req_len:280 conn:22405 conn_reqs:1 status:429 size:59 referer:- useragent:python-requests/2.2.1 CPython/2.7.11 Darwin/14.5.0
- - 2016-07-27T01:01:50-04:00 hercules "POST /mytest HTTP/1.1" req_time:0.000 req_len:280 conn:22604 conn_reqs:1 status:429 size:59 referer:- useragent:python-requests/2.2.1 CPython/2.7.11 Darwin/14.5.0
- - 2016-07-27T01:02:13-04:00 hercules "POST /mytest HTTP/1.1" req_time:0.000 req_len:280 conn:23004 conn_reqs:1 status:429 size:59 referer:- useragent:python-requests/2.2.1 CPython/2.7.11 Darwin/14.5.0
- - 2016-07-27T01:02:21-04:00 hercules "POST /mytest HTTP/1.1" req_time:0.000 req_len:280 conn:23203 conn_reqs:1 status:429 size:59 referer:- useragent:python-requests/2.2.1 CPython/2.7.11 Darwin/14.5.0
- - 2016-07-27T01:02:21-04:00 hercules "POST /mytest HTTP/1.1" req_time:0.000 req_len:280 conn:23203 conn_reqs:2 status:429 size:59 referer:- useragent:python-requests/2.2.1 CPython/2.7.11 Darwin/14.5.0
- - 2016-07-27T01:02:30-04:00 hercules "POST /mytest HTTP/1.1" req_time:0.000 req_len:280 conn:23400 conn_reqs:1 status:429 size:59 referer:- useragent:python-requests/2.2.1 CPython/2.7.11 Darwin/14.5.0
- - 2016-07-27T01:02:47-04:00 hercules "POST /mytest HTTP/1.1" req_time:0.000 req_len:280 conn:23599 conn_reqs:1 status:429 size:59 referer:- useragent:python-requests/2.2.1 CPython/2.7.11 Darwin/14.5.0
- - 2016-07-27T01:03:06-04:00 hercules "POST /mytest HTTP/1.1" req_time:0.000 req_len:280 conn:23798 conn_reqs:1 status:429 size:59 referer:- useragent:python-requests/2.2.1 CPython/2.7.11 Darwin/14.5.0
```
In the log I capture the connection serial number (conn:$connection) and the number of requests served on that connection (conn_reqs:$connection_requests). So it seems pretty evident that when a new connection is created after keepalive_requests is exceeded, the old connection is somehow still lingering around for a short period of time, which triggers limit_conn?
Interestingly enough, when I test the same client against the same build of nginx on my Mac, there is no problem at all. The only difference I can gather is that Linux uses epoll for connection processing while the Mac uses kqueue.
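For reference, a minimal nginx configuration along the lines described above might look like the sketch below. This is a reconstruction, not the reporter's actual config: the zone name, zone size, log path, and location body are assumptions, and the log_format is only approximated from the sample log lines.

```nginx
http {
    # One shared-memory zone keyed by client IP; the name "addr" and size 10m are assumed.
    limit_conn_zone $binary_remote_addr zone=addr:10m;

    # Custom log format approximating the entries shown above:
    # $connection is the connection serial number, $connection_requests
    # the number of requests served on that connection.
    log_format conn_log '- - $time_iso8601 $host "$request" '
                        'req_time:$request_time req_len:$request_length '
                        'conn:$connection conn_reqs:$connection_requests '
                        'status:$status size:$body_bytes_sent '
                        'referer:$http_referer useragent:$http_user_agent';

    server {
        listen 9988;
        access_log logs/access.log conn_log;

        keepalive_requests 100;   # values from 10 to 1000 were tried
        limit_conn addr 1;        # at most one concurrent connection per IP
        limit_conn_status 429;    # report rejections as 429 instead of the default 503

        location /mytest {
            return 200;           # placeholder for the real endpoint
        }
    }
}
```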
What happens is more or less expected with the code in question.
Once nginx closes a connection due to keepalive_requests, it does so by sending a response with a Connection: close header. After that response has been fully sent and received, the client code in question opens a new connection and starts sending a new request. On the other hand, nginx still has some teardown work to do after the response is sent: log the request, free allocated structures, and so on. Moreover, in many cases it also waits for the client to close the connection (see lingering_close). This creates a race: the client is already sending a new request while nginx is still closing the old connection. If the client wins, it gets an error due to limit_conn.
The only solution is to avoid configuring such a low limit if you want to allow a usage pattern like the one in the test.
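As a concrete illustration of that advice (the number below is an arbitrary example, not a value suggested in this ticket), giving each client address a little headroom absorbs the brief overlap between the old connection being torn down and the new one being opened:

```nginx
# Give each client IP some headroom instead of a hard limit of 1, so the
# replacement keepalive connection does not collide with the one that nginx
# is still tearing down; "5" is an arbitrary example value.
limit_conn_zone $binary_remote_addr zone=addr:10m;
limit_conn addr 5;
```

Alternatively, raising keepalive_requests (the reporter saw no 429s at 1000) makes the race less frequent, but does not eliminate it.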