Opened 6 years ago

Closed 6 years ago

Last modified 6 years ago

#1633 closed defect (invalid)

Zone configuration in upstream gives signal 11 on heavy loaded server

Reported by: https://stackoverflow.com/users/6797850/strangedata Owned by:
Priority: major Milestone:
Component: nginx-core Version: 1.15.x
Keywords: Cc:
uname -a: Linux frontend01-prd 3.13.0-117-generic #164-Ubuntu SMP Fri Apr 7 11:05:26 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
nginx -V: nginx version: nginx/1.15.3
built by gcc 4.8.4 (Ubuntu 4.8.4-2ubuntu1~14.04.3)
built with OpenSSL 1.0.1f 6 Jan 2014
TLS SNI support enabled
configure arguments: --prefix=/etc/nginx --sbin-path=/usr/sbin/nginx --modules-path=/usr/lib/nginx/modules --conf-path=/etc/nginx/nginx.conf --error-log-path=/var/log/nginx/error.log --http-log-path=/var/log/nginx/access.log --pid-path=/var/run/nginx.pid --lock-path=/var/run/nginx.lock --http-client-body-temp-path=/var/cache/nginx/client_temp --http-proxy-temp-path=/var/cache/nginx/proxy_temp --http-fastcgi-temp-path=/var/cache/nginx/fastcgi_temp --http-uwsgi-temp-path=/var/cache/nginx/uwsgi_temp --http-scgi-temp-path=/var/cache/nginx/scgi_temp --user=nginx --group=nginx --with-compat --with-file-aio --with-threads --with-http_addition_module --with-http_auth_request_module --with-http_dav_module --with-http_flv_module --with-http_gunzip_module --with-http_gzip_static_module --with-http_mp4_module --with-http_random_index_module --with-http_realip_module --with-http_secure_link_module --with-http_slice_module --with-http_ssl_module --with-http_stub_status_module --with-http_sub_module --with-http_v2_module --with-mail --with-mail_ssl_module --with-stream --with-stream_realip_module --with-stream_ssl_module --with-stream_ssl_preread_module --with-cc-opt='-g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fPIC' --with-ld-opt='-Wl,-Bsymbolic-functions -Wl,-z,relro -Wl,-z,now -Wl,--as-needed -pie' --add-module=/opt/nginx-sources/nginx-upload-progress-module-master --add-module=/opt/nginx-sources/nginx-http-slice-master --add-module=/opt/nginx-sources/nginx-upload-module-2.255 --add-module=/opt/nginx-sources/nginx-goodies-nginx-sticky-module-ng-08a395c66e42 --add-module=/opt/nginx-sources/ngx_http_substitutions_filter_module-master --add-module=/opt/nginx-sources/ngx_dynamic_upstream-master

Description

I've been testing a shared memory zone for an upstream configuration in a test environment for a few days, without issues.

When the same configuration was applied to our production servers, nginx worker processes started dying almost immediately with signal 11 (SIGSEGV).

My upstream configuration looks like this:

upstream backend_servers {
    zone backend_upstreams_zone 1m;
    sticky name=route path=/ hash=index expires=30m;
    server backend01-tst:28080;
    server backend02-tst:28080;
    server backend03-tst:28080;
    server backend04-tst:28080;
    server backend05-tst:28080;
    server backend06-tst:28080;
    server backend07-tst:28080;
    server backend08-tst:28080;
    server backend09-tst:28080;
}

I cloned the VM to another network to test the zone configuration in an identical setup, but without production load the problem does not occur.

I also tried increasing the zone size, up to 1024m, without success.

My environment (uname -a and nginx -V output) is listed in the ticket fields above. Here is the gdb session for the core dump:

GNU gdb (Ubuntu 7.7.1-0ubuntu5~14.04.2) 7.7.1
Reading symbols from /usr/sbin/nginx...Reading symbols from /usr/lib/debug/.build-id/00/6543010a42d5e472fb38a40895ce2f44b3e279.debug...done.
done.
BFD: Warning: /dados/coredumps/core-2018-09-11 is truncated: expected core file size >= 5505687552, found: 1073741824.
[New LWP 2364]
[New LWP 2365]
[New LWP 2366]
Failed to read a valid object file image from memory.
Core was generated by `nginx: worker process '.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00007feeebef6a99 in ngx_event_connect_peer (pc=0x7feeece3e9b8, pc@entry=<error reading variable: Cannot access memory at address 0x7fff742d4338>)
    at src/event/ngx_event_connect.c:41
41 src/event/ngx_event_connect.c: No such file or directory.
(gdb) bt full
#0 0x00007feeebef6a99 in ngx_event_connect_peer (pc=0x7feeece3e9b8, pc@entry=<error reading variable: Cannot access memory at address 0x7fff742d4338>)
    at src/event/ngx_event_connect.c:41
        rc = <optimized out>
        type = 1
        port = <optimized out>
        err = <optimized out>
        level = <optimized out>
        s = <optimized out>
        rev = <optimized out>
        wev = <optimized out>
        c = <optimized out>
Cannot access memory at address 0x7fff742d4338
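Separately from the crash itself, the gdb session above reads a core file that was cut off at exactly 1073741824 bytes (1 GiB, 2^30), while BFD expected at least 5505687552 bytes (about 5.1 GiB). A cut-off at an exact power of two suggests a core-size limit was in effect, although the ticket does not say where it came from; the missing memory is consistent with the "Cannot access memory" errors in the backtrace. The minimal C sketch below checks and raises RLIMIT_CORE for the calling process; the function names are illustrative only and are not part of nginx.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Nonzero when the core file on disk is smaller than the size gdb/BFD
   expects, i.e. the dump was cut short and part of memory is missing. */
int core_is_truncated(unsigned long long expected, unsigned long long found)
{
    return found < expected;
}

/* Prints one rlimit value, spelling out RLIM_INFINITY. */
static void print_limit(const char *label, rlim_t value)
{
    if (value == RLIM_INFINITY)
        printf("%s: unlimited\n", label);
    else
        printf("%s: %llu bytes\n", label, (unsigned long long) value);
}

/* Prints the soft and hard RLIMIT_CORE of the calling process.
   Returns 0 on success, -1 if getrlimit() fails. */
int print_core_limit(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0) {
        perror("getrlimit(RLIMIT_CORE)");
        return -1;
    }
    print_limit("core soft limit", rl.rlim_cur);
    print_limit("core hard limit", rl.rlim_max);
    return 0;
}

/* Raises the soft core limit up to the hard limit, which an unprivileged
   process is allowed to do. Returns 0 on success, -1 on failure. */
int raise_core_limit(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;
    rl.rlim_cur = rl.rlim_max;
    return setrlimit(RLIMIT_CORE, &rl);
}
```

For the sizes in the BFD warning, core_is_truncated(5505687552ULL, 1073741824ULL) is true. A limit raised this way only applies to the calling process and its children; for nginx workers the corresponding setting would be the worker_rlimit_core directive (with working_directory pointing somewhere the worker can write), assuming the cap came from a resource limit at all.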

Unfortunately, I cannot disable the third-party modules for testing in our production environment, and I cannot reproduce the issue without the heavy load we get in production. Even a JMeter test with 1000 threads is not enough to trigger it.

Change History (3)

comment:1 by https://stackoverflow.com/users/6797850/strangedata, 6 years ago

I forgot to add the error.log messages:

2018/09/11 12:57:22 [alert] 11028#11028: *1694380 open socket #666 left in connection 611
2018/09/11 12:57:22 [alert] 11030#11030: *1695373 open socket #1173 left in connection 1168
2018/09/11 12:57:22 [alert] 11030#11030: *1692687 open socket #1215 left in connection 1172
2018/09/11 12:57:22 [alert] 11030#11030: *1693428 open socket #1308 left in connection 1301
2018/09/11 12:57:22 [alert] 11028#11028: aborting
2018/09/11 12:57:22 [alert] 11030#11030: aborting
2018/09/11 12:57:22 [alert] 11025#11025: *1687552 open socket #153 left in connection 119
2018/09/11 12:57:22 [alert] 11025#11025: aborting
2018/09/11 12:57:22 [alert] 9794#9794: *1530843 open socket #53 left in connection 49
2018/09/11 12:57:22 [alert] 9794#9794: aborting
2018/09/11 12:57:22 [alert] 11026#11026: *1692200 open socket #341 left in connection 315
2018/09/11 12:57:22 [alert] 11026#11026: aborting
2018/09/11 12:57:22 [alert] 11031#11031: *1688028 open socket #651 left in connection 407
2018/09/11 12:57:22 [alert] 11031#11031: aborting
2018/09/11 12:59:42 [alert] 12106#12106: worker process 12111 exited on signal 11
2018/09/11 12:59:42 [alert] 12106#12106: worker process 12110 exited on signal 11
2018/09/11 12:59:42 [alert] 12106#12106: worker process 12109 exited on signal 11
2018/09/11 12:59:42 [alert] 12106#12106: worker process 12108 exited on signal 11
2018/09/11 12:59:42 [alert] 12106#12106: worker process 12112 exited on signal 11
2018/09/11 12:59:42 [alert] 12106#12106: worker process 12107 exited on signal 11
2018/09/11 12:59:42 [alert] 12106#12106: worker process 12298 exited on signal 11
2018/09/11 12:59:42 [alert] 12106#12106: worker process 12289 exited on signal 11
2018/09/11 12:59:42 [alert] 12106#12106: worker process 12295 exited on signal 11
2018/09/11 12:59:42 [alert] 12106#12106: worker process 12286 exited on signal 11
2018/09/11 12:59:42 [alert] 12106#12106: worker process 12301 exited on signal 11
2018/09/11 12:59:42 [alert] 12106#12106: worker process 12307 exited on signal 11
2018/09/11 12:59:42 [alert] 12106#12106: worker process 12292 exited on signal 11
2018/09/11 12:59:42 [alert] 12106#12106: worker process 12114 exited on signal 11

comment:2 by Maxim Dounin, 6 years ago

Resolution: invalid
Status: new → closed

To work with upstreams in a shared memory zone, all upstream-related modules involved need to be compatible with such upstreams: use only standard structures, do proper locking, and so on. As per the configuration and nginx -V output, you are using a 3rd party module, nginx-goodies-nginx-sticky-module-ng-08a395c66e42, which isn't compatible.
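The locking requirement exists because, with a zone, upstream peer state lives in shared memory that every worker process reads and writes. The following is a minimal, self-contained C sketch of that situation, not nginx code: shared_peer_t, peer_create() and run_workers() are hypothetical names, anonymous MAP_SHARED memory stands in for the zone, forked children stand in for workers, and a C11 atomic_flag spinlock stands in for nginx's own lock.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for per-peer state kept in a shared memory zone.
   It is NOT nginx's ngx_http_upstream_rr_peer_t. */
typedef struct {
    atomic_flag lock;   /* spinlock word living in shared memory */
    long        conns;  /* counter that every "worker" updates */
} shared_peer_t;

static void peer_lock(shared_peer_t *p)
{
    while (atomic_flag_test_and_set_explicit(&p->lock, memory_order_acquire)) {
        /* spin until the current holder clears the flag */
    }
}

static void peer_unlock(shared_peer_t *p)
{
    atomic_flag_clear_explicit(&p->lock, memory_order_release);
}

/* Allocates the peer in anonymous shared memory, so forked children see
   the same object, much as all workers see one upstream zone. */
shared_peer_t *peer_create(void)
{
    shared_peer_t *p = mmap(NULL, sizeof(*p), PROT_READ | PROT_WRITE,
                            MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    atomic_flag_clear(&p->lock);
    p->conns = 0;
    return p;
}

/* Forks nworkers processes; each adds 1 to p->conns iters times while
   holding the lock. Returns 0 on success, -1 if a fork fails. */
int run_workers(shared_peer_t *p, int nworkers, int iters)
{
    int rc = 0;

    assert(p != NULL && nworkers > 0 && iters >= 0);

    for (int w = 0; w < nworkers; w++) {
        pid_t pid = fork();
        if (pid < 0) {          /* fork failed: stop spawning, still reap */
            rc = -1;
            break;
        }
        if (pid == 0) {         /* child: plays the role of one worker */
            for (int i = 0; i < iters; i++) {
                peer_lock(p);
                p->conns++;     /* read-modify-write, safe only under the lock */
                peer_unlock(p);
            }
            _exit(0);
        }
    }

    while (wait(NULL) > 0) {
        /* reap every child before the caller reads p->conns */
    }
    return rc;
}
```

Calling run_workers(p, 4, 10000) on a fresh peer_create() result should leave p->conns at 40000; without the peer_lock()/peer_unlock() pair, concurrent increments can be lost and the total can come out lower. Inside nginx, the round-robin balancer protects zone-backed peers with its own macros (to my knowledge ngx_http_upstream_rr_peers_wlock(), ngx_http_upstream_rr_peer_lock() and related ones in src/http/ngx_http_upstream_round_robin.h); a balancer module that bypasses them or keeps peer data outside the standard structures can corrupt shared state.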

comment:3 by https://stackoverflow.com/users/6797850/strangedata, 6 years ago

Thank you for pointing that out. I'll talk to the sticky module developers.
