Opened 10 years ago
Closed 10 years ago
#539 closed enhancement (wontfix)
On certain kernel versions eventpoll_release_file() hangs. Suggest manually calling EPOLL_CTL_DEL before close()
Reported by: | Zhe Yang | Owned by: | |
---|---|---|---|
Priority: | major | Milestone: | |
Component: | nginx-core | Version: | 1.4.x |
Keywords: | epoll del_connection | Cc: | |
uname -a: | Linux localhost 3.13.5-1-ARCH #1 SMP PREEMPT Wed Dec 4 21:45:42 CET 2013 x86_64 GNU/Linux | | |
nginx -V:
nginx version: nginx/1.4.4
TLS SNI support enabled
configure arguments: --prefix=/etc/nginx --conf-path=/etc/nginx/nginx.conf --sbin-path=/usr/sbin/nginx --pid-path=/var/run/nginx.pid --lock-path=/var/lock/nginx.lock --user=nobody --group=nobody --http-log-path=/var/log/nginx/access.log --error-log-path=/var/log/nginx/error.log --http-client-body-temp-path=/var/lib/nginx/client-body --http-proxy-temp-path=/var/lib/nginx/proxy --http-fastcgi-temp-path=/var/lib/nginx/fastcgi --http-scgi-temp-path=/var/lib/nginx/scgi --http-uwsgi-temp-path=/var/lib/nginx/uwsgi --with-pcre-jit --with-file-aio --with-ipv6 --with-debug --with-http_geoip_module --with-http_gzip_static_module --with-http_realip_module --with-http_ssl_module --with-http_spdy_module --with-google_perftools_module --with-http_stub_status_module --add-module=../../../../src/ngx_modules/ngx_cache_purge --add-module=../../../../src/ngx_modules/ngx_devel_kit --add-module=../../../../src/ngx_modules/lua-nginx-module --add-module=../../../../src/ngx_modules/echo-nginx-module --add-module=../../../../src/ngx_modules/ngx_http_consistent_hash
Description
Hello,
On kernels 3.13.5 and 3.13.8, nginx often hangs in the D (uninterruptible sleep) state. Its wchan while hung is eventpoll_release_file(). The kernel calls eventpoll_release_file() when an fd that is still being watched by epoll is closed.
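The automatic cleanup described above can be observed from user space. Below is a minimal C sketch (Linux only; the function name `events_after_close_without_del()` is hypothetical) that registers the read end of a pipe with epoll, makes it readable, and then closes it without EPOLL_CTL_DEL. Because that close() drops the last reference to the file, the kernel removes the registration on its own (the eventpoll_release_file() path), so a non-blocking epoll_wait() afterwards should report no events.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

/*
 * Registers the read end of a pipe with epoll, makes it readable, then
 * close()s it WITHOUT EPOLL_CTL_DEL. Returns how many events epoll_wait()
 * still reports afterwards, or -1 if setup fails.
 */
int events_after_close_without_del(void)
{
    struct epoll_event ev = { .events = EPOLLIN };
    struct epoll_event out[4];
    int pfd[2];
    int ep, n = -1;

    ep = epoll_create1(0);
    if (ep == -1)
        return -1;
    if (pipe(pfd) == -1) {
        close(ep);
        return -1;
    }

    ev.data.fd = pfd[0];
    if (epoll_ctl(ep, EPOLL_CTL_ADD, pfd[0], &ev) == 0
        && write(pfd[1], "x", 1) == 1)
    {
        /* This drops the last reference to the pipe's read side, so the
         * kernel removes it from the epoll interest list by itself. */
        close(pfd[0]);
        pfd[0] = -1;

        /* Timeout 0: poll once without blocking. */
        n = epoll_wait(ep, out, 4, 0);
    }

    if (pfd[0] != -1)
        close(pfd[0]);
    close(pfd[1]);
    close(ep);
    return n;
}
```

On a Linux system the call is expected to return 0. The removal only happens once the last reference to the underlying open file is released: if the fd had been dup()ed, the registration would survive close() (see epoll(7)).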
This seems to be a kernel bug, and I have also filed it in the kernel Bugzilla at https://bugzilla.kernel.org/show_bug.cgi?id=73711. However, I suggest disabling the shortcut at event/modules/ngx_epoll_module.c:540 (skipping EPOLL_CTL_DEL for a descriptor that is about to be closed), because saving this one system call before close() may not be worth it. Relying on close() instead may also keep the kernel busy locking the global epmutex and, in some situations (e.g. a kernel bug), crash the system.
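The suggestion comes down to always issuing EPOLL_CTL_DEL before close() instead of letting close() drop the registration. A hedged C sketch of both options follows; `sketch_del_connection()`, `sketch_close_connection()` and `SKETCH_CLOSE_EVENT` are hypothetical names loosely modeled on the NGX_CLOSE_EVENT shortcut in nginx's ngx_epoll_del_connection(), not nginx's actual code.

```c
#include <assert.h>
#include <errno.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical flag, loosely modeled on nginx's NGX_CLOSE_EVENT:
 * the caller is going to close() the descriptor right afterwards. */
#define SKETCH_CLOSE_EVENT 0x1

/*
 * Hypothetical helper: stop watching fd in the epoll instance ep.
 * If skip_on_close is nonzero and the fd is about to be closed, skip the
 * system call and let close() drop the registration (the current shortcut).
 * Otherwise issue an explicit EPOLL_CTL_DEL (the ticket's suggestion).
 * Returns 0 on success, -1 with errno set on failure.
 */
int sketch_del_connection(int ep, int fd, unsigned flags, int skip_on_close)
{
    /* Kernels before 2.6.9 require a non-NULL event pointer for
     * EPOLL_CTL_DEL even though it is ignored, so pass a dummy one. */
    struct epoll_event ee = { 0 };

    if ((flags & SKETCH_CLOSE_EVENT) && skip_on_close)
        return 0;   /* cleanup left to the kernel at close() time */

    return epoll_ctl(ep, EPOLL_CTL_DEL, fd, &ee);
}

/* Hypothetical close path: deregister (or not), then close the fd. */
int sketch_close_connection(int ep, int fd, int skip_on_close)
{
    int rc = sketch_del_connection(ep, fd, SKETCH_CLOSE_EVENT, skip_on_close);

    if (close(fd) == -1)
        return -1;
    return rc;
}
```

With `skip_on_close` set to 0, the fd has already left every interest list when close() runs, so the kernel has nothing to clean up in eventpoll_release_file(); the cost is one extra epoll_ctl() per connection.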
Change History (3)
comment:1 by , 10 years ago
comment:2 by , 10 years ago
Interesting: the way this global mutex is used seems to have changed significantly from kernel to kernel.
In older kernels, epmutex is held only for:
- ep_free() (closing an epoll fd; doesn't affect nginx)
- eventpoll_release_file() (affects nginx)
In February 2011, around 2.6.38, the commit "epoll: prevent creating circular epoll structures" added one more case in which the mutex is held:
- ep_free()
- eventpoll_release_file()
- epoll_ctl(epoll_fd, EPOLL_CTL_ADD, epoll_fd2) (adding one epoll descriptor into another; doesn't affect nginx)
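The nesting case can be tried directly. The C sketch below (the function name `circular_add_errno()` is hypothetical) nests one epoll instance inside another, which is allowed, and then tries to add the outer instance back into the inner one. epoll_ctl(2) documents that such a circular add fails with ELOOP; the loop check that rejects it is what the 2011 commit introduced, and per the list above it runs with epmutex held.

```c
#include <assert.h>
#include <errno.h>
#include <sys/epoll.h>
#include <unistd.h>

/*
 * Nests "inner" inside "outer", then tries to nest "outer" inside "inner".
 * Returns the errno of the circular add, 0 if it unexpectedly succeeded,
 * or -1 if setup failed.
 */
int circular_add_errno(void)
{
    struct epoll_event ev = { .events = EPOLLIN };
    int outer = epoll_create1(0);
    int inner = epoll_create1(0);
    int result = -1;

    if (outer == -1 || inner == -1)
        goto done;

    /* Adding one epoll descriptor into another is allowed. */
    ev.data.fd = inner;
    if (epoll_ctl(outer, EPOLL_CTL_ADD, inner, &ev) == -1)
        goto done;

    /* Adding outer back into inner would form a loop; the kernel's
     * loop check rejects it. Capture errno before any close(). */
    ev.data.fd = outer;
    if (epoll_ctl(inner, EPOLL_CTL_ADD, outer, &ev) == -1)
        result = errno;
    else
        result = 0;

done:
    if (inner != -1)
        close(inner);
    if (outer != -1)
        close(outer);
    return result;
}
```

On kernels with the loop check, the call is expected to return ELOOP.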
Later, in January 2012 (Linux 3.3), "epoll: limit paths" was committed, which should lead to much more epmutex contention:
- ep_free()
- eventpoll_release_file()
- EPOLL_CTL_ADD of any fd (affects nginx)
- EPOLL_CTL_DEL of any fd (also affects nginx)
This remained unchanged until November 2013 (kernel 3.13), when two changes were committed, "epoll: optimize EPOLL_CTL_DEL using rcu" and "epoll: do not take global 'epmutex' for simple topologies", which improved the situation:
- ep_free()
- eventpoll_release_file()
- epoll_ctl(epoll_fd, EPOLL_CTL_ADD, epoll_fd2)
- EPOLL_CTL_ADD of an fd into more than one epoll instance simultaneously (may affect nginx while it registers listening sockets, but apparently not by much)
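The timeline above can be condensed into a lookup table. The C sketch below encodes which of the listed operations take epmutex on a given kernel version. The enum names and `epmutex_taken()` are hypothetical, the version boundaries are the approximate ones given in this comment (2.6.38, 3.3, 3.13), and OP_ADD_FD_SHARED is one reading of "EPOLL_CTL_ADD into more than one epoll simultaneously". It summarizes this comment rather than the kernel source.

```c
#include <assert.h>

/* Epoll operations discussed above (names are hypothetical). */
enum ep_op {
    OP_EP_FREE,        /* closing an epoll fd itself (ep_free()) */
    OP_RELEASE_FILE,   /* eventpoll_release_file(): closing a watched fd */
    OP_ADD_EPOLL_FD,   /* EPOLL_CTL_ADD of an epoll fd into another epoll */
    OP_ADD_FD,         /* EPOLL_CTL_ADD of an ordinary fd */
    OP_ADD_FD_SHARED,  /* EPOLL_CTL_ADD of an fd into more than one epoll */
    OP_DEL_FD          /* EPOLL_CTL_DEL of any fd */
};

/* Same packing as the kernel's KERNEL_VERSION() macro. */
static unsigned kver(unsigned a, unsigned b, unsigned c)
{
    return (a << 16) + (b << 8) + c;
}

/* Returns 1 if, per the timeline in this comment, epmutex is taken for
 * op on kernel major.minor.patch; 0 otherwise. */
int epmutex_taken(unsigned major, unsigned minor, unsigned patch, enum ep_op op)
{
    unsigned v = kver(major, minor, patch);

    if (op == OP_EP_FREE || op == OP_RELEASE_FILE)
        return 1;                      /* held in every version listed */
    if (v < kver(2, 6, 38))
        return 0;                      /* "older" kernels */
    if (op == OP_ADD_EPOLL_FD)
        return 1;                      /* circular-structure check */
    if (v < kver(3, 3, 0))
        return 0;
    if (v < kver(3, 13, 0))
        return 1;                      /* "limit paths": any ADD or DEL */
    return op == OP_ADD_FD_SHARED;     /* 3.13+: non-simple topologies only */
}
```

For example, `epmutex_taken(3, 10, 0, OP_DEL_FD)` returns 1 while `epmutex_taken(3, 14, 0, OP_DEL_FD)` returns 0: by this table, an explicit EPOLL_CTL_DEL adds epmutex acquisitions only on 3.3 to 3.12 kernels.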
comment:3 by , 10 years ago
Resolution: | → wontfix |
---|---|
Status: | new → closed |
According to performance tests by Konstantin Pavlov, removing the optimization results in a measurable (about 2%) performance degradation with kernels > 3.3 and < 3.13, and no measurable difference with Linux kernel 3.14. Removing the optimization therefore doesn't look like a good idea.
It would be interesting to see some performance numbers. Locking epmutex in the close() code path doesn't look good, and an extra epoll_ctl() may actually be worth it.