Opened 6 years ago

Closed 6 years ago

#612 closed defect (invalid)

multiple users can still do "accept" when "accept_mutex" is "on", in a specific scenario

Reported by: Shmulik Biran Owned by:
Priority: minor Milestone:
Component: nginx-core Version: 1.4.x
Keywords: accept_mutex, ngx_accept_disabled Cc:
uname -a: Linux localhost 2.6.32-431.el6.x86_64 #1 SMP Fri Nov 22 03:15:09 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
nginx -V: nginx version: nginx/1.4.1
built by gcc 4.4.7 20120313 (Red Hat 4.4.7-4) (GCC)
configure arguments: --add-module=src/falcon --without-http_auth_basic_module --without-http_autoindex_module --without-http_empty_gif_module --without-http_fastcgi_module --wi--without-http_gzip_module --without-http_memcached_module --without-http_referer_module --without-http_scgi_module --without-http_split_clients_module --without-http_ssi_moduream_ip_hash_module --without-http_userid_module --without-http_uwsgi_module --with-http_stub_status_module --with-file-aio --with-http_flv_module --with-ipv6 --with-debug

Description

Background:

  • When using "accept_mutex on", only one worker should accept new connections at a time.
  • There is a backoff mechanism, that reduces the chance a worker will take the accept mutex if it's reaching the limit of his "max_connections", that uses this logic:
    • After a worker accepts a connection, it sets the "ngx_accept_disabled" var like this (ngx_event_accept.c:141):
        ngx_accept_disabled = ngx_cycle->connection_n / 8
                              - ngx_cycle->free_connection_n;
  • Then, in "ngx_process_events_and_timers" method (ngx_event.c:202), the worker will try to take the accept mutex only if ngx_accept_disabled is not positive:
    if (ngx_use_accept_mutex) {
        if (ngx_accept_disabled > 0) {
            ngx_accept_disabled--;

        } else {
            if (ngx_trylock_accept_mutex(cycle) == NGX_ERROR) {
                return;
            }
        ...
  • Inside the function "ngx_trylock_accept_mutex" (ngx_event_accept.c:378), the worker will either enable accept events if it gets the accept mutex, or disable accept events if it fails.

The problem:

  • Once ngx_accept_disabled becomes a positive number (after accepting a connection that crosses the 7/8 limit as shown in the code above), the worker will indeed not try to acquire the mutex on the next "ngx_process_events_and_timers" call. However, it will not disable the accept events (which were enabled the last time it acquired the mutex)
  • This means that:
  1. Instead of reducing the chances that the worker connections count will grow, we increase it (and each accepted connection in this state will re-trigger the same behavior).
  2. We will have more than 1 "accepting" worker at a time (in this scenario).

I believe that the code should be something like this (logically):

    if (ngx_use_accept_mutex) {
        if (ngx_accept_disabled > 0) {
            ngx_accept_disabled--;

            if (ngx_accept_mutex_held) {
                if (ngx_disable_accept_events(cycle) == NGX_ERROR) {
                    return;
                }

                ngx_accept_mutex_held = 0;
            }
        } else {
            if (ngx_trylock_accept_mutex(cycle) == NGX_ERROR) {
                return;
            }
        ...

Notes:

  • I've seen this in version 1.4.1, however from a look at the sources at github it seems that the issue still exists in 1.7 as well.

Thanks,
Shmulik Bibi

Change History (5)

comment:1 by Valentin V. Bartenev, 6 years ago

I don't see a problem here. This mechanism is only needed to prevent one worker with an exhausted connections limit from holding the lock. It doesn't try to reduce chances of growth connections number, and there's no problem that the worker still accepts new connections.

Your suggestion can reduce the effective limit of worker connections up to 7/8 of it.

comment:2 by Shmulik Biran, 6 years ago

I see.
Then i miss understood the intention.
I thought the reason behind this feature is to reduce the chance to reach the connection limit, as long as there are other free workers - since a worker will accept a connection and drop it, if it has reached the connection limit.

However i still see a problem with this:

  • When a worker reaches the 7/8 connection threshold, its likelihood to accept new connections raises (since it accepts without holding the mutex).
  • In addition, a worker with no more free connections, will still accept a connection and then close it
  • Meaning we increase the chance for connections being dropped, when one worker's connection limit is getting close.

That said, the alternative i suggested will increase the connect time for all connections, when all workers reach 7/8 of their connection limit (since none will take the mutex for some time). I'm not sure this is better.

Perhaps the right approach is on a different direction - prevent a worker from accepting connections when it reached his max limit (*requires different handling when all workers are maxed out, if we want to reset connections when no one can handle them).

On a side note - when holding the accept mutex, the worker does minimal work (waiting on epoll, queuing events that were received, and handling only the "accept" events). So with that in mind, why do we need to prevent a worker with an exhausted connections limit from holding the lock?

in reply to:  2 comment:3 by Valentin V. Bartenev, 6 years ago

The main purpose of accept mutex is to solve "Thundering herd problem". Under high load the mutex contention shouldn't ever happen.

It's an abnormal situation when a worker process reaches connections limit, and usually it indicates misconfiguration or some dos attack. So it's expected that behaviour under this condition can be suboptimal.

Replying to שמוליק ביבי <shmulik.bibi@gmail.com>:

why do we need to prevent a worker with an exhausted connections limit from holding the lock?

The worker that is near to its limit shouldn't interrupt other workers from accepting connections.

comment:4 by Shmulik Biran, 6 years ago

OK, thanks for the detailed explanation.

comment:5 by Maxim Dounin, 6 years ago

Resolution: invalid
Status: newclosed

Closing this.

Note: See TracTickets for help on using tickets.