Opened 7 months ago

Closed 6 months ago

#2285 closed defect (fixed)

strange "worker_connections are not enough, reusing connections" warning

Reported by: gabriel.dodan@…
Owned by:
Priority: minor
Milestone:
Component: nginx-core
Version: 1.19.x
Keywords:
Cc:
uname -a: Linux 4.18.0-348.2.1.el8_5.x86_64 #1 SMP Tue Nov 16 14:42:35 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
nginx -V: nginx version: nginx/1.20.2
built by gcc 8.5.0 20210514 (Red Hat 8.5.0-4) (GCC)
built with OpenSSL 1.1.1k FIPS 25 Mar 2021
TLS SNI support enabled
configure arguments: --prefix=/etc/nginx --sbin-path=/usr/sbin/nginx --modules-path=/usr/lib64/nginx/modules --conf-path=/etc/nginx/nginx.conf --error-log-path=/var/log/nginx/error.log --http-log-path=/var/log/nginx/access.log --pid-path=/var/run/nginx.pid --lock-path=/var/run/nginx.lock --http-client-body-temp-path=/var/cache/nginx/client_temp --http-proxy-temp-path=/var/cache/nginx/proxy_temp --http-fastcgi-temp-path=/var/cache/nginx/fastcgi_temp --http-uwsgi-temp-path=/var/cache/nginx/uwsgi_temp --http-scgi-temp-path=/var/cache/nginx/scgi_temp --user=nginx --group=nginx --with-compat --with-file-aio --with-threads --with-http_addition_module --with-http_auth_request_module --with-http_dav_module --with-http_flv_module --with-http_gunzip_module --with-http_gzip_static_module --with-http_mp4_module --with-http_random_index_module --with-http_realip_module --with-http_secure_link_module --with-http_slice_module --with-http_ssl_module --with-http_stub_status_module --with-http_sub_module --with-http_v2_module --with-mail --with-mail_ssl_module --with-stream --with-stream_realip_module --with-stream_ssl_module --with-stream_ssl_preread_module --with-cc-opt='-O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions -fstack-protector-strong -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -fPIC' --with-ld-opt='-Wl,-z,relro -Wl,-z,now -pie'

Description

Hi,

I keep getting this warning, although according to my nginx configuration I shouldn't. This is my configuration:

user  nginx;
worker_processes  auto;

error_log  /var/log/nginx/error.log warn;
pid        /var/run/nginx.pid;


worker_rlimit_nofile 120000;
events {
    worker_connections  10000;
}


http {
    include       /etc/nginx/mime.types;
    default_type  application/octet-stream;

    log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
                      '$status $body_bytes_sent "$http_referer" '
                      '"$http_user_agent" "$http_x_forwarded_for"';

    access_log  /var/log/nginx/access.log  main;

    sendfile        on;
    #tcp_nopush     on;

    keepalive_timeout  65;

    #gzip  on;

    include /etc/nginx/conf.d/*.conf;
}

The server has 12 cores, so according to this configuration nginx should handle up to 120000 connections. I have enabled stub_status and it shows 6190 active connections. So I should not get this warning, because the number of active connections is far below what nginx is configured to handle.

This is the exact warning I get:

2021/11/26 13:50:23 [warn] 114001#114001: 10000 worker_connections are not enough, reusing connections

I use nginx mainly as a proxy; the proxied servers are websocket servers.

Any idea?

Thanks and regards!

Change History (7)

comment:1 by Maxim Dounin, 7 months ago

The stub_status counters show the number of client connections, while worker_connections are also used for listening sockets and, most notably, for upstream connections. The message suggests that one of your worker processes used all or almost all of its 10000 worker_connections and started to reuse keepalive connections.
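The accounting described above can be sketched with a little arithmetic. This is only an illustration: it assumes that every proxied client holds exactly one upstream connection (plausible for websocket proxying, but not stated in the ticket) and it ignores listening sockets.

```python
WORKER_CONNECTIONS = 10000  # per-worker limit from the reporter's events block
WORKERS = 12                # worker_processes auto on the 12-core server
ACTIVE_CLIENTS = 6190       # stub_status active connections from the report

# Assumption (not from the ticket): one upstream connection per proxied client.
SLOTS_PER_CLIENT = 2


def slots_used(clients, slots_per_client=SLOTS_PER_CLIENT):
    """Connection slots one worker needs to serve `clients` proxied clients."""
    return clients * slots_per_client


def max_clients_per_worker(limit=WORKER_CONNECTIONS,
                           slots_per_client=SLOTS_PER_CLIENT):
    """Proxied clients a single worker can hold before hitting its limit."""
    return limit // slots_per_client


# Even spread: ceil(6190 / 12) = 516 clients on the busiest worker.
even = slots_used(-(-ACTIVE_CLIENTS // WORKERS))
# Worst case: every client lands on the same worker.
skewed = slots_used(ACTIVE_CLIENTS)

print(max_clients_per_worker())  # 5000
print(even)                      # 1032
print(skewed)                    # 12380
```

Under these assumptions a single worker runs out of slots at about 5000 proxied clients, so 6190 clients concentrated on one worker would exceed its limit even though the server-wide total of 120000 is nowhere near reached.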

Given the total number of worker processes, this suggests poor distribution of client connections between worker processes. Recent reports suggest that the distribution might be poor on recent Linux kernels when using the default settings, see this thread. There is ongoing work to further analyse and improve the default configuration. Meanwhile, configuring accept_mutex on;, or using listen ... reuseport, or recompiling nginx with --with-cc-opt="-DNGX_HAVE_EPOLLEXCLUSIVE=0" should improve the distribution.
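For reference, the two configuration-level workarounds mentioned above would look roughly like the fragment below. It is a sketch, not the reporter's configuration: the server block and port 80 are placeholders, and the two options are alternatives, so normally only one of them would be applied.

```nginx
events {
    worker_connections  10000;
    accept_mutex        on;     # option 1: workers take turns accepting new connections
}

http {
    server {
        # option 2 (instead of accept_mutex): a separate listening socket per worker
        listen 80 reuseport;    # port 80 is a placeholder
    }
}
```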

comment:2 by gabriel.dodan@…, 7 months ago

Thank you very much for the reply! I have turned on accept_mutex and will watch what happens.

Is there any way to find how many connections are open on each worker process?

Regards!

comment:3 by Maxim Dounin, 7 months ago

> Is there any way to find how many connections are open on each worker process?

Sure, by using normal system tools. On Linux, you can run ss -ntp as root, which prints the owning processes for each socket, or use something like lsof -n -i (optionally with -p <pid> to see sockets from a specific worker process).
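As an illustration of the ss approach, the sketch below tallies sockets per nginx process from ss -ntp output. The helper names count_per_pid and live_counts are hypothetical, the sample text is made up for the example, and the parsing assumes the usual users:(("name",pid=N,fd=M)) process column printed by ss.

```python
import re
import subprocess
from collections import Counter

# Made-up sample in the shape of `ss -ntp` output (not taken from the ticket).
SAMPLE = """\
State Recv-Q Send-Q Local Address:Port Peer Address:Port Process
ESTAB 0 0 10.0.0.1:443 198.51.100.7:51234 users:(("nginx",pid=114001,fd=12))
ESTAB 0 0 10.0.0.1:443 198.51.100.8:40022 users:(("nginx",pid=114001,fd=13))
ESTAB 0 0 10.0.0.1:53312 10.0.0.5:8080 users:(("nginx",pid=114002,fd=9))
ESTAB 0 0 10.0.0.1:22 198.51.100.9:60000 users:(("sshd",pid=900,fd=4))
"""


def count_per_pid(ss_output, name="nginx"):
    """Return {pid: socket count} for processes called `name`."""
    pattern = re.compile(r'\("%s",pid=(\d+),fd=\d+\)' % re.escape(name))
    counts = Counter()
    for line in ss_output.splitlines():
        # A socket can list several owners; count it once per process.
        for pid in set(pattern.findall(line)):
            counts[int(pid)] += 1
    return dict(counts)


def live_counts(name="nginx"):
    """Run `ss -ntp` (as root, to see all processes) and tally its output."""
    result = subprocess.run(["ss", "-ntp"], capture_output=True,
                            text=True, check=True)
    return count_per_pid(result.stdout, name)


print(count_per_pid(SAMPLE))  # {114001: 2, 114002: 1}
```

On a live server, calling live_counts() as root would give the per-worker totals. These include both client and upstream sockets, which is the quantity worker_connections limits (listening sockets aside).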

comment:4 by gabriel.dodan@…, 7 months ago

Works fine so far, no more warnings. Thanks for the suggestion!

Best!

comment:5 by Maxim Dounin, 7 months ago

Thanks for the feedback. Keeping this open for now, to make sure the underlying problem is not forgotten.

Just in case, it would be great to check if recompiling with --with-cc-opt="-DNGX_HAVE_EPOLLEXCLUSIVE=0" also fixes things in your case, without using accept_mutex on;. If it does, this is probably what we should do by default.

comment:6 by Maxim Dounin <mdounin@…>, 6 months ago

In 7992:e2d07e4ec636/nginx:

Events: fixed balancing between workers with EPOLLEXCLUSIVE.

Linux with EPOLLEXCLUSIVE usually notifies only the process which was first
to add the listening socket to the epoll instance. As a result most of the
connections are handled by the first worker process (ticket #2285). To fix
this, we re-add the socket periodically, so other workers will get a chance
to accept connections.

comment:7 by Maxim Dounin, 6 months ago

Resolution: fixed
Status: new → closed

The underlying problem is fixed in e2d07e4ec636; the distribution of client connections between worker processes should now be even with the default settings.
