Opened 5 years ago

Closed 3 years ago

#776 closed defect (fixed)

Hang after NGX_AGAIN

Reported by: tguyard.wordpress.com Owned by:
Priority: minor Milestone:
Component: nginx-core Version: 1.9.x
Keywords: module, filter, delayed, post_action Cc:
uname -a: Linux nginx03 3.13.0-55-generic #94-Ubuntu SMP Thu Jun 18 00:27:10 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
nginx -V: nginx version: nginx/1.8.0
built with OpenSSL 1.0.1f 6 Jan 2014
TLS SNI support enabled
configure arguments: --with-cc-opt='-g -O2 -fPIE -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2' --with-ld-opt='-Wl,-Bsymbolic-functions -fPIE -pie -Wl,-z,relro -Wl,-z,now' --prefix=/usr/share/nginx --conf-path=/etc/nginx/nginx.conf --http-log-path=/var/log/nginx/access.log --error-log-path=/var/log/nginx/error.log --lock-path=/var/lock/nginx.lock --pid-path=/run/nginx.pid --http-client-body-temp-path=/var/lib/nginx/body --http-fastcgi-temp-path=/var/lib/nginx/fastcgi --http-proxy-temp-path=/var/lib/nginx/proxy --http-scgi-temp-path=/var/lib/nginx/scgi --http-uwsgi-temp-path=/var/lib/nginx/uwsgi --with-debug --with-pcre-jit --with-ipv6 --with-http_ssl_module --with-http_stub_status_module --with-http_realip_module --with-http_auth_request_module --with-http_addition_module --with-http_dav_module --with-http_gunzip_module --with-http_gzip_static_module --with-http_spdy_module --with-http_sub_module --add-module=/nginx-1.8.0/debian/modules/nginx-aes-module --add-module=/nginx-1.8.0/debian/modules/nginx-upload-module --add-module=/nginx-1.8.0/debian/modules/ngx_http_substitutions_filter_module

Description

I've been trying to develop a filter module.

In some rare cases, ngx_http_write_filter return NGX_AGAIN (Mostly when using the limit_rate feature). My filter then just return NGX_AGAIN and wait to be called again later.
This works well most of the time.

The error occurs when my filter has finished to send all the data (buf->last_buf = 1;), and gets a NGX_AGAIN from the next filter. I then return NGX_AGAIN, but NEVER GET CALLED again.

Maybe I did something wrong, but when trying to fix the problem, I found something that seems weird in the source code:
ngx_http_finalize_request does not always reset the c->write->delayed value.

After adding c->write->delayed = 0; in ngx_http_finalize_request on line 2428, the error was fixed. (And everything seems to work ok)

The nginx I use come from the Ubuntu14/04 repository.

Attachments (1)

ngx_http_aes_module.c (15.1 KB ) - added by tguyard.wordpress.com 5 years ago.
The module I'm currently developping

Download all attachments as: .zip

Change History (12)

by tguyard.wordpress.com, 5 years ago

Attachment: ngx_http_aes_module.c added

The module I'm currently developping

comment:1 by Maxim Dounin, 5 years ago

The ngx_http_finalize_request() is not expected to reset c->write->delayed unless it's removing the timer. This flag indicates that processing of a request was delayed, and an appropriate timer was set. Processing of a request will be continued once the timer expires.

What you describe sounds like a problem in your filter. If you think the problem is in nginx - please try reducing your code to a bare minimum which demonstrates the problem, and provide corresponding debug logs. The code you've posted unfortunately is too complicated to easily analyse.

comment:2 by tguyard.wordpress.com, 5 years ago

Thank you for your reply.
I will dig a little deeper and try to find exactly what is going on.

comment:3 by tguyard.wordpress.com, 5 years ago

I've been able to reproduce the bug without my custom filter.

I'm using

  • x-accel-redirect from an fcgi (php) upstream.
  • x-accel-limit-rate = 2560000
  • a post_action (that can take some time)
  • keepalive is activated

(I will try to reproduce the bug without php)

What happens
When doing 2 requests on the same connection (keepalive), the first one is done correctly, the second one hangs, with no response.

There is no issue without keepalive (keepalive_timeout 0;).
There is no issue without the post_action.
There is no issue without the x-accel-limit-rate.
There is no issue with a very small value of x-accel-limit-rate.

What I gather so far

  • At the end of the first request, due to the x-accel-limit-rate, c->write->delayed = 1 and a timer is launched (I observed a very short timer ~7ms).
  • When doing ngx_http_finalize_request(), the timer is not canceled, because there is a post_action to do first.
  • During the post_action request, the timer expire, c->write->timer_set is set to 0 but c->write->delayed STAYS AT 1.
  • Then the post_action finishes and ngx_http_finalize_request() is called again. timer_set being = 0, the delayed value is not reset to 0.
  • The second request get processed with a delayed value of 1, then wait for a timer event that will never be fired.

comment:4 by tguyard.wordpress.com, 5 years ago

I was able to reproduce the bug with these parameters:

nginx.conf:
(you need the echo module)

worker_processes 4;
pid /run/nginx.pid;
daemon off;

events {
    worker_connections 1024;
}

http {
    sendfile on;
    keepalive_timeout 65;
    root /tmp/;
    
    server {
        listen 8888;

        location / {
            try_files /testfile =404;
            limit_rate 2560000;
            post_action @postaction;
        }

        location @postaction {
            internal;
            echo_sleep 1;
            try_files /testfile =404;
        }
    }
}

the /tmp/testfile I used was a random file of at least 2321 Bytes (no bug with smaller files)

> ll /tmp/testfile 
-rw-rw-r-- 1 thomas thomas 2321 Jul 30 15:55 /tmp/testfile

curl command:

curl 'http://localhost:8888' -o /dev/null 'http://localhost:8888' -o /dev/null --dump-header -

comment:5 by Maxim Dounin, 5 years ago

Thanks for the details. What you describe sounds like a bug in post_action. Not really a surprise, as post_action was always considered to be a dirty hack and it is not documented on purpose. Do you see any problems when not using post_action?

in reply to:  5 comment:6 by tguyard.wordpress.com, 5 years ago

There is no problem without post_action.

comment:7 by tguyard.wordpress.com, 5 years ago

This patch fix the bug for me :
patch -p 1 src/http/ngx_http_request.c < ngx_http_request.c.patch

with ngx_http_request.c.patch :

2416,2419d2415
<     if (ngx_http_post_action(r) == NGX_OK) {
<         return;
<     }
< 
2426a2423,2426
>     }
> 
>     if (ngx_http_post_action(r) == NGX_OK) {
>         return;

comment:8 by Maxim Dounin, 4 years ago

Keywords: post_action added
Status: newaccepted

comment:9 by Maxim Dounin, 3 years ago

See also #1228.

comment:10 by Maxim Dounin <mdounin@…>, 3 years ago

In 6961:903fb1ddc07f/nginx:

Moved handling of wev->delayed to the connection event handler.

With post_action or subrequests, it is possible that the timer set for
wev->delayed will expire while the active subrequest write event handler
is not ready to handle this. This results in request hangs as observed
with limit_rate / sendfile_max_chunk and post_action (ticket #776) or
subrequests (ticket #1228).

Moving the handling to the connection event handler fixes the hangs observed,
and also slightly simplifies the code.

comment:11 by Maxim Dounin, 3 years ago

Resolution: fixed
Status: acceptedclosed

Fix committed.

Note: See TracTickets for help on using tickets.