Opened 12 years ago

Closed 11 years ago

Last modified 11 years ago

#353 closed defect (fixed)

nginx 1.4.[0|1] empty reply from server

Reported by: Maximilian Weber Owned by:
Priority: minor Milestone:
Component: nginx-core Version: 1.3.x
Keywords: empty reply from server Cc:
uname -a: Linux app519.blah.com 2.6.32-220.el6.x86_64 #1 SMP Sat Dec 10 17:04:11 CST 2011 x86_64 x86_64 x86_64 GNU/Linux
nginx -V: nginx version: nginx/1.4.1
built by gcc 4.4.7 20120313 (Red Hat 4.4.7-3) (GCC)
TLS SNI support enabled
configure arguments: --user=nginx --group=nginx --prefix=/usr/share/nginx --sbin-path=/usr/sbin/nginx --conf-path=/etc/nginx/nginx.conf --error-log-path=/var/log/nginx/error.log --http-log-path=/var/log/nginx/access.log --http-client-body-temp-path=/var/lib/nginx/tmp/client_body --http-proxy-temp-path=/var/lib/nginx/tmp/proxy --http-fastcgi-temp-path=/var/lib/nginx/tmp/fastcgi --pid-path=/var/run/nginx.pid --lock-path=/var/lock/subsys/nginx --with-http_stub_status_module --with-http_realip_module --with-http_ssl_module --with-http_gzip_static_module --with-mail --with-mail_ssl_module --with-ipv6 --add-module=/usr/lib/ruby/gems/1.8/gems/passenger-enterprise-server-4.0.0.rc4/ext/nginx

Description

Problem:

curl: random curl: (52) Empty reply from server
chrome: random Error 324 (net::ERR_EMPTY_RESPONSE): The server closed the connection without sending any data.

Upon upgrading to the new nginx version we saw random error messages when doing our healthchecks to static files on the server. this isn't utilizing passenger at this point but plain serving static files

  • this is only happening when during high traffic.
  • there is no loadbalancer involved.
  • error.log with debug doesn't show anything.
  • error shows with disabled iptables
  • no selinux

empty reply:

14:57:07.231413 IP 178.210.115.51.59960 > app515.blah.com.http: Flags [S], seq 2421292407, win 14600, options [mss 1460,sackOK,TS val 92501540 ecr 0,nop,wscale 7], length 0
14:57:07.231447 IP app515.blah.com.http > 178.210.115.51.59960: Flags [S.], seq 1123747008, ack 2421292408, win 14480, options [mss 1460,sackOK,TS val 1552315767 ecr 92501540,nop,wscale 7], length 0
14:57:07.250915 IP 178.210.115.51.59960 > app515.blah.com.http: Flags ., ack 1, win 115, options [nop,nop,TS val 92501545 ecr 1552315767], length 0
14:57:07.250973 IP app515.blah.com.http > 178.210.115.51.59960: Flags [F.], seq 1, ack 1, win 115, options [nop,nop,TS val 1552315855 ecr 92501545], length 0
14:57:07.251429 IP 178.210.115.51.59960 > app515.blah.com.http: Flags [P.], seq 1:95, ack 1, win 115, options [nop,nop,TS val 92501545 ecr 1552315767], length 94
14:57:07.251448 IP app515.blah.com.http > 178.210.115.51.59960: Flags [R], seq 1123747009, win 0, length 0
14:57:07.270451 IP 178.210.115.51.59960 > app515.blah.com.http: Flags [F.], seq 95, ack 2, win 115, options [nop,nop,TS val 92501549 ecr 1552315855], length 0
14:57:07.270474 IP app515.blah.com.http > 178.210.115.51.59960: Flags [R], seq 1123747010, win 0, length 0
14:57:07.670518 IP 178.210.115.51.54293 > app515.blah.com.http: Flags [F.], seq 13924, ack 2174, win 16276, length 0
14:57:07.674907 IP app515.blah.com.http > 178.210.115.51.54293: Flags [F.], seq 2174, ack 13925, win 334, length 0
14:57:07.702982 IP 178.210.115.51.54293 > app515.blah.com.http: Flags ., ack 2175, win 16276, length 0
14:57:09.293112 IP 178.210.115.51.53615 > app515.blah.com.http: Flags [F.], seq 17172, ack 4741, win 16425, length 0
14:57:09.293153 IP app515.blah.com.http > 178.210.115.51.53615: Flags [F.], seq 4741, ack 17173, win 390, length 0
14:57:09.338779 IP 178.210.115.51.53615 > app515.blah.com.http: Flags ., ack 4742, win 16425, length 0

good reply:
14:57:54.336213 IP 178.210.115.51.59975 > app515.blah.com.http: Flags [S], seq 1550226949, win 14600, options [mss 1460,sackOK,TS val 92513316 ecr 0,nop,wscale 7], length 0
14:57:54.336236 IP app515.blah.com.http > 178.210.115.51.59975: Flags [S.], seq 1532621207, ack 1550226950, win 14480, options [mss 1460,sackOK,TS val 1552362940 ecr 92513316,nop,wscale 7], length 0
14:57:54.544906 IP 178.210.115.51.59975 > app515.blah.com.http: Flags ., ack 1, win 115, options [nop,nop,TS val 92513368 ecr 1552362940], length 0
14:57:54.545233 IP 178.210.115.51.59975 > app515.blah.com.http: Flags [P.], seq 1:95, ack 1, win 115, options [nop,nop,TS val 92513368 ecr 1552362940], length 94
14:57:54.545257 IP app515.blah.com.http > 178.210.115.51.59975: Flags ., ack 95, win 114, options [nop,nop,TS val 1552363149 ecr 92513368], length 0
14:57:54.546194 IP app515.blah.com.http > 178.210.115.51.59975: Flags [P.], seq 1:231, ack 95, win 114, options [nop,nop,TS val 1552363150 ecr 92513368], length 230
14:57:54.749511 IP 178.210.115.51.59975 > app515.blah.com.http: Flags ., ack 231, win 123, options [nop,nop,TS val 92513419 ecr 1552363150], length 0
14:57:54.750097 IP 178.210.115.51.59975 > app515.blah.com.http: Flags [F.], seq 95, ack 231, win 123, options [nop,nop,TS val 92513419 ecr 1552363150], length 0
14:57:54.750135 IP app515.blah.com.http > 178.210.115.51.59975: Flags [F.], seq 231, ack 96, win 114, options [nop,nop,TS val 1552363354 ecr 92513419], length 0
14:57:54.857239 IP 178.210.115.51.59975 > app515.blah.com.http: Flags ., ack 232, win 123, options [nop,nop,TS val 92513446 ecr 1552363354], length 0

Change History (16)

comment:1 by Maximilian Weber, 12 years ago

Here is a example test with nginx-1.4.1 running on the server

while true; do date -R; curl -m 10 'http://app519.blah.com/watchdog'; done

mysql: OK<br>cache: OK<br>Wed, 08 May 2013 09:56:27 +0200
mysql: OK<br>cache: OK<br>Wed, 08 May 2013 09:56:27 +0200
mysql: OK<br>cache: OK<br>Wed, 08 May 2013 09:56:28 +0200
mysql: OK<br>cache: OK<br>Wed, 08 May 2013 09:56:28 +0200
mysql: OK<br>cache: OK<br>Wed, 08 May 2013 09:56:28 +0200
curl: (52) Empty reply from server
Wed, 08 May 2013 09:56:28 +0200
mysql: OK<br>cache: OK<br>Wed, 08 May 2013 09:56:28 +0200
mysql: OK<br>cache: OK<br>Wed, 08 May 2013 09:56:28 +0200
curl: (52) Empty reply from server
Wed, 08 May 2013 09:56:28 +0200
mysql: OK<br>cache: OK<br>Wed, 08 May 2013 09:56:28 +0200
mysql: OK<br>cache: OK<br>Wed, 08 May 2013 09:56:28 +0200
curl: (52) Empty reply from server
Wed, 08 May 2013 09:56:28 +0200
curl: (52) Empty reply from server
Wed, 08 May 2013 09:56:28 +0200
mysql: OK<br>cache: OK<br>Wed, 08 May 2013 09:56:29 +0200

comment:2 by Maximilian Weber, 12 years ago

2013/05/08 07:44:04 [info] 15847#0: *45649 client timed out in deferred accept (110: Connection timed out) while waiting for request, client: 78.46.35.x, server: 0.0.0.0:80
[10:08:58 AM]

This info message seems to correlate with the curl: (52) Empty reply from server
I found the error message here: http://nginx.org/patches/spdy/patch.spdy.txt

+#if (NGX_HAVE_DEFERRED_ACCEPT && defined TCP_DEFER_ACCEPT)
+ if (c->listening->deferred_accept) {
+ ngx_log_error(NGX_LOG_INFO, c->log, NGX_ETIMEDOUT,
+ "client timed out in deferred accept");
+ ngx_http_close_connection(c);
+ return;
+ }
+#endif

comment:3 by Valentin V. Bartenev, 12 years ago

Could you please confirm the correlation? Please note, that those messages are usually related to Chrome and were not a problem by themselves.

comment:4 by Maximilian Weber, 12 years ago

Yes, I can confirm this the correlation between the debug message and the empty reply.
This problem is only happening via http requests, https works great. spdy is not activated explicitly:

server {

listen 80;
listen 443 ssl;

...

request loop:

Wed May 8 14:07:51 CEST 2013
mysql: OK<br>cache: OK<br>

Wed May 8 14:07:51 CEST 2013
curl: (52) Empty reply from server

Wed May 8 14:07:51 CEST 2013
mysql: OK<br>cache: OK<br>

logfile (debug):

2013/05/08 12:07:51 [info] 18283#0: *1665613 client timed out in deferred accept (110: Connection timed out) while waiting for request, client: 178.210.115.xX, server: 0.0.0.0:80

in reply to:  4 comment:5 by Valentin V. Bartenev, 12 years ago

Replying to Maximilian Weber <maximilian.weber@gmail.com>:

Yes, I can confirm this the correlation between the debug message and the empty reply.
This problem is only happening via http requests, https works great. spdy is not activated explicitly:

server {

listen 80;
listen 443 ssl;

...

Do you actually have the "deferred" option somewhere in one of your "listen" directives?

comment:6 by Maximilian Weber, 12 years ago

When using deferred the error is gone.

in reply to:  6 comment:7 by Valentin V. Bartenev, 12 years ago

Replying to Maximilian Weber <maximilian.weber@gmail.com>:

When using deferred the error is gone.

Well, the problem is that somehow nginx thinks that you have enabled TCP_DEFER_ACCEPT, but you have not.

Could you please confirm that the problem is gone with vanilla nginx, without 3-rd party modules like "passenger"?

comment:8 by Maximilian Weber, 12 years ago

I get the same empty reply from server with vanilla nginx. I think the issue is my config file for this vhost. I basically have two different server configurations in the same file, pointing to the same root directory. I need different errors sites for my api domain.

server {

listen 80;
listen 443 ssl;
servername iframe.blah.com;

.
.
}

server {

listen 80 deferred;
listen 443 ssl deferred;
servername api.blah.com

.
.
}

Splitting the two domains in two config files seems to fix the issue?!?

comment:9 by Maxim Dounin, 11 years ago

Just a quick note: the d8ee8ef5dcde changeset may be related. If a listening socket with a "deferred" parameter was added to a configuration during a binary upgrade, it may have deferred_accept flag set, but setsockopt(TCP_DEFER_ACCEPT) not actually called for it. This case is expected to be fixed by the changeset in question.

comment:10 by Alex Vorona, 11 years ago

I got similar problem, not related to online upgrades. Cut from error_log

2013/11/27 11:33:36 [info] 9521#0: *954 client timed out in deferred accept (110: Connection timed out) while waiting for request, client: 95.56.35.229, server: 0.0.0.0:80
2013/11/27 11:33:37 [info] 9521#0: *955 client timed out in deferred accept (110: Connection timed out) while waiting for request, client: 88.147.174.169, server: 0.0.0.0:80
2013/11/27 11:33:37 [info] 9521#0: *956 client timed out in deferred accept (110: Connection timed out) while waiting for request, client: 195.189.17.77, server: 0.0.0.0:80
2013/11/27 11:33:37 [info] 9521#0: *957 client timed out in deferred accept (110: Connection timed out) while waiting for request, client: 193.151.12.69, server: 0.0.0.0:80
2013/11/27 11:33:37 [info] 9521#0: *958 client timed out in deferred accept (110: Connection timed out) while waiting for request, client: 95.56.35.229, server: 0.0.0.0:80
2013/11/27 11:33:37 [info] 9521#0: *959 client timed out in deferred accept (110: Connection timed out) while waiting for request, client: 95.56.35.229, server: 0.0.0.0:80
2013/11/27 11:33:37 [info] 9521#0: *960 client timed out in deferred accept (110: Connection timed out) while waiting for request, client: 193.151.12.69, server: 0.0.0.0:80
2013/11/27 11:33:37 [info] 9521#0: *961 client timed out in deferred accept (110: Connection timed out) while waiting for request, client: 95.56.35.229, server: 0.0.0.0:80
2013/11/27 11:33:38 [info] 9523#0: *965 client timed out in deferred accept (110: Connection timed out) while waiting for request, client: 2.61.70.127, server: 0.0.0.0:80
2013/11/27 11:33:39 [info] 9518#0: *966 client timed out in deferred accept (110: Connection timed out) while waiting for request, client: 2.61.70.127, server: 0.0.0.0:80

nginx from nginx repo.

nginx -V
nginx version: nginx/1.4.4
built by gcc 4.4.7 20120313 (Red Hat 4.4.7-3) (GCC) 
TLS SNI support enabled
configure arguments: --prefix=/etc/nginx --sbin-path=/usr/sbin/nginx --conf-path=/etc/nginx/nginx.conf --error-log-path=/var/log/nginx/error.log --http-log-path=/var/log/nginx/access.log --pid-path=/var/run/nginx.pid --lock-path=/var/run/nginx.lock --http-client-body-temp-path=/var/cache/nginx/client_temp --http-proxy-temp-path=/var/cache/nginx/proxy_temp --http-fastcgi-temp-path=/var/cache/nginx/fastcgi_temp --http-uwsgi-temp-path=/var/cache/nginx/uwsgi_temp --http-scgi-temp-path=/var/cache/nginx/scgi_temp --user=nginx --group=nginx --with-http_ssl_module --with-http_realip_module --with-http_addition_module --with-http_sub_module --with-http_dav_module --with-http_flv_module --with-http_mp4_module --with-http_gunzip_module --with-http_gzip_static_module --with-http_random_index_module --with-http_secure_link_module --with-http_stub_status_module --with-mail --with-mail_ssl_module --with-file-aio --with-ipv6 --with-cc-opt='-O2 -g -pipe -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic'

uname -a
2.6.32-431.el6.x86_64 #1 SMP Sun Nov 10 22:19:54 EST 2013 x86_64 x86_64 x86_64 GNU/Linux

I have 2 listen in configuration

nginx.conf:		listen      80 default sndbuf=1024k backlog=128 deferred;
vhosts/cdn.conf:        listen      80;

I can stop nginx ensuring it's stopped and start it - problem is still here. And this is not a Chrome issue, I got same with curl.
At the same time I got in dmesg

possible SYN flooding on port 80. Sending cookies.
possible SYN flooding on port 80. Sending cookies.
possible SYN flooding on port 80. Sending cookies.

While there is no synflood or any abnormal activity.

I'm changing listen option by removing "deferred" and after reload problem is gone. Enabling "deferred" and reloading again - problem is here again. It's not 100% reproduce - nginx can work for some time with "deferred" enabled from start without problems, but if I hit this problem - removing "deferred" fixes it, while stop and start did not help.

Here are logs when problem "begins". Seems like socket backlog overflow triggers this problem.
/var/log/messages

Nov 27 08:23:01 lenomp487 rhsmd: This system is registered to RHN Classic.
Nov 27 09:06:07 lenomp487 auditd[1710]: Audit daemon rotating log files
Nov 27 11:08:08 lenomp487 kernel: possible SYN flooding on port 80. Sending cookies.
Nov 27 11:09:08 lenomp487 kernel: possible SYN flooding on port 80. Sending cookies.
Nov 27 11:10:08 lenomp487 kernel: possible SYN flooding on port 80. Sending cookies

/var/log/nginx/error_log

2013/11/27 11:04:10 [info] 4711#0: *37626 client closed connection while waiting for request, client: 193.34.160.61, server: 0.0.0.0:80
2013/11/27 11:04:18 [info] 4715#0: *37629 client closed connection while waiting for request, client: 95.31.137.207, server: 0.0.0.0:80
2013/11/27 11:04:54 [info] 4711#0: *37675 client closed connection while waiting for request, client: 89.254.245.27, server: 0.0.0.0:80
2013/11/27 11:05:02 [info] 4715#0: *37684 client closed connection while waiting for request, client: 213.184.254.14, server: 0.0.0.0:80
2013/11/27 11:05:02 [info] 4711#0: *37685 client closed connection while waiting for request, client: 213.184.254.14, server: 0.0.0.0:80
2013/11/27 11:05:02 [info] 4711#0: *37686 client closed connection while waiting for request, client: 213.184.254.14, server: 0.0.0.0:80
2013/11/27 11:05:57 [info] 4710#0: *37740 client closed connection while waiting for request, client: 109.172.58.130, server: 0.0.0.0:80
2013/11/27 11:06:14 [info] 4715#0: *37758 client closed connection while waiting for request, client: 212.152.53.188, server: 0.0.0.0:80
2013/11/27 11:07:04 [info] 4711#0: *37885 client closed connection while waiting for request, client: 176.195.46.229, server: 0.0.0.0:80
2013/11/27 11:07:33 [info] 4712#0: *37933 client closed connection while waiting for request, client: 130.255.61.76, server: 0.0.0.0:80
2013/11/27 11:07:41 [info] 4715#0: *37953 client timed out in deferred accept (110: Connection timed out) while waiting for request, client: 178.158.134.225, server: 0.0.0.0:80
2013/11/27 11:07:48 [info] 4714#0: *37972 client closed connection while waiting for request, client: 95.154.76.138, server: 0.0.0.0:80
2013/11/27 11:08:08 [info] 4710#0: *37997 client closed connection while waiting for request, client: 95.154.76.138, server: 0.0.0.0:80
2013/11/27 11:08:12 [info] 4712#0: *37998 client closed connection while waiting for request, client: 178.172.229.243, server: 0.0.0.0:80
2013/11/27 11:08:15 [info] 4713#0: *37999 client timed out in deferred accept (110: Connection timed out) while waiting for request, client: 178.185.66.131, server: 0.0.0.0:80
2013/11/27 11:08:15 [info] 4713#0: *38000 client timed out in deferred accept (110: Connection timed out) while waiting for request, client: 178.185.66.131, server: 0.0.0.0:80
2013/11/27 11:08:19 [info] 4712#0: *38001 client timed out in deferred accept (110: Connection timed out) while waiting for request, client: 93.84.47.127, server: 0.0.0.0:80
2013/11/27 11:08:19 [info] 4712#0: *38002 client timed out in deferred accept (110: Connection timed out) while waiting for request, client: 93.84.47.127, server: 0.0.0.0:80
2013/11/27 11:08:20 [info] 4710#0: *38003 client timed out in deferred accept (110: Connection timed out) while waiting for request, client: 5.105.60.12, server: 0.0.0.0:80
2013/11/27 11:08:21 [info] 4712#0: *38004 client timed out in deferred accept (110: Connection timed out) while waiting for request, client: 5.105.60.12, server: 0.0.0.0:80
2013/11/27 11:08:21 [info] 4712#0: *38005 client timed out in deferred accept (110: Connection timed out) while waiting for request, client: 46.48.78.35, server: 0.0.0.0:80
2013/11/27 11:08:21 [info] 4712#0: *38006 client timed out in deferred accept (110: Connection timed out) while waiting for request, client: 46.48.78.35, server: 0.0.0.0:80
2013/11/27 11:08:22 [info] 4710#0: *38007 client timed out in deferred accept (110: Connection timed out) while waiting for request, client: 93.84.47.127, server: 0.0.0.0:80
2013/11/27 11:08:26 [info] 4711#0: *38008 client timed out in deferred accept (110: Connection timed out) while waiting for request, client: 178.172.229.243, server: 0.0.0.0:80
2013/11/27 11:08:26 [info] 4711#0: *38009 client timed out in deferred accept (110: Connection timed out) while waiting for request, client: 178.172.229.243, server: 0.0.0.0:80
2013/11/27 11:08:26 [info] 4714#0: *38010 client timed out in deferred accept (110: Connection timed out) while waiting for request, client: 86.57.190.217, server: 0.0.0.0:80
2013/11/27 11:08:27 [info] 4714#0: *38011 client timed out in deferred accept (110: Connection timed out) while waiting for request, client: 86.57.190.217, server: 0.0.0.0:80
2013/11/27 11:08:27 [info] 4712#0: *38012 client timed out in deferred accept (110: Connection timed out) while waiting for request, client: 178.185.66.131, server: 0.0.0.0:80

comment:11 by Maxim Dounin, 11 years ago

I don't think that this is related to original issue reported, but nevertheless it looks like a problem.

The "possible SYN flood" messages are likely related to too many connections sitting in the accept queue, due to accept deferred. Looks like the kernel starts sending syncookies in this case, and probably connections established using syncookies bypass deferred accept queue. This would explain why normal connections are dropped immediately in such situation - they are passed to nginx immediately rather than after deferred accept timeout.

Could you please try to test if the above is what actually happens?

comment:12 by Alex Vorona, 11 years ago

Hello,

Now I can trigger the problem for some time by lowering listen backlog to something like 10 and overflow it with deferred connections

for i in `seq 1 30`;do nc 127.0.0.1 80 & done

After that I got "client timed out in deferred accept" in error_log for new requests.

Could you please try to test if the above is what actually happens?

It looks like you are right - overflowed and thus syncookie'd connections are passed to nginx immediately.

Anyway it's wrong to use deferred with small backlog values.

comment:13 by Maxim Dounin, 11 years ago

Backlog size doesn't really matters, the actual problem is how kernel handles the deferred queue overflow case. It turns out that current handling doesn't allow us to assume anything about connections accepted with TCP_DEFER_ACCEPT, and changeset 05a56ebb084a should be reconsidered.

comment:14 by Maxim Dounin <mdounin@…>, 11 years ago

In fdb67cfc957d110ea887961cc8c08a590df5f62c/nginx:

Fixed TCP_DEFER_ACCEPT handling (ticket #353).

Backed out 05a56ebb084a, as it turns out that kernel can return connections
without any delay if syncookies are used. This basically means we can't
assume anything about connections returned with deferred accept set.

To solve original problem the 05a56ebb084a tried to solve, i.e. to don't
wait longer than needed if a connection was accepted after deferred accept
timeout, this patch changes a timeout set with setsockopt(TCP_DEFER_ACCEPT)
to 1 second, unconditionally. This is believed to be enough for speed
improvements, and doesn't imply major changes to timeouts used.

Note that before 2.6.32 connections were dropped after a timeout. Though
it is believed that 1s is still appropriate for kernels before 2.6.32,
as previously tcp_synack_retries controlled the actual timeout and 1s results
in more than 1 minute actual timeout by default.

comment:15 by Maxim Dounin, 11 years ago

Resolution: fixed
Status: newclosed

Fix committed, thanks.

comment:16 by Maxim Dounin <mdounin@…>, 11 years ago

In 462ae7eedc681205070d53bb3d46a9bc4eed9744/nginx:

Fixed TCP_DEFER_ACCEPT handling (ticket #353).

Backed out 05a56ebb084a, as it turns out that kernel can return connections
without any delay if syncookies are used. This basically means we can't
assume anything about connections returned with deferred accept set.

To solve original problem the 05a56ebb084a tried to solve, i.e. to don't
wait longer than needed if a connection was accepted after deferred accept
timeout, this patch changes a timeout set with setsockopt(TCP_DEFER_ACCEPT)
to 1 second, unconditionally. This is believed to be enough for speed
improvements, and doesn't imply major changes to timeouts used.

Note that before 2.6.32 connections were dropped after a timeout. Though
it is believed that 1s is still appropriate for kernels before 2.6.32,
as previously tcp_synack_retries controlled the actual timeout and 1s results
in more than 1 minute actual timeout by default.

Note: See TracTickets for help on using tickets.