Opened 8 years ago

Closed 8 years ago

#912 closed defect (invalid)

Worker process crashes

Reported by: elmicha@…
Owned by:
Priority: minor
Milestone:
Component: nginx-core
Version: 1.9.x
Keywords: gcc arm
Cc:
uname -a: Linux rpi 4.1.15-v7+ #7 SMP PREEMPT Sun Dec 20 02:23:23 CET 2015 armv7l GNU/Linux
nginx -V: nginx version: nginx/1.9.12
built by gcc 4.9.2 (Raspbian 4.9.2-10)
built with OpenSSL 1.0.1k 8 Jan 2015
TLS SNI support enabled
configure arguments: --prefix=/etc/nginx --sbin-path=/usr/sbin/nginx --modules-path=/etc/nginx/modules --conf-path=/etc/nginx/nginx.conf --error-log-path=/var/log/nginx/error.log --http-log-path=/var/log/nginx/access.log --pid-path=/var/run/nginx.pid --lock-path=/var/run/nginx.lock --http-client-body-temp-path=/var/cache/nginx/client_temp --http-proxy-temp-path=/var/cache/nginx/proxy_temp --http-fastcgi-temp-path=/var/cache/nginx/fastcgi_temp --http-uwsgi-temp-path=/var/cache/nginx/uwsgi_temp --http-scgi-temp-path=/var/cache/nginx/scgi_temp --user=nginx --group=nginx --with-http_ssl_module --with-http_realip_module --with-http_addition_module --with-http_sub_module --with-http_dav_module --with-http_flv_module --with-http_mp4_module --with-http_gunzip_module --with-http_gzip_static_module --with-http_random_index_module --with-http_secure_link_module --with-http_stub_status_module --with-http_auth_request_module --with-http_xslt_module=dynamic --with-http_image_filter_module=dynamic --with-http_geoip_module=dynamic --with-threads --with-stream --with-stream_ssl_module --with-http_slice_module --with-mail --with-mail_ssl_module --with-file-aio --with-http_v2_module --with-cc-opt='-g -O2 -fstack-protector-strong -Wformat -Werror=format-security' --with-ld-opt=-Wl,-z,relro --with-ipv6

Description

I'm using nginx on a Raspberry Pi to proxy a WebSocket connection to Mosquitto. This worked up to nginx 1.9.10; with 1.9.12 the worker process crashes (I skipped 1.9.11).

  location /mqtt/ {
    proxy_pass https://127.0.0.1:9002;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
    proxy_buffering off;
  }

I built the package from http://nginx.org/packages/mainline/debian/ with "apt-get -b source nginx".

root@rpi:/tmp/nginx/cores# gdb /usr/sbin/nginx /tmp/nginx/cores/core
GNU gdb (Raspbian 7.7.1+dfsg-5) 7.7.1
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "arm-linux-gnueabihf".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/sbin/nginx...Reading symbols from /usr/lib/debug//usr/sbin/nginx...done.
done.
[New LWP 10218]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/arm-linux-gnueabihf/libthread_db.so.1".
Core was generated by `nginx: worker process                   '.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x01459288 in ?? ()
(gdb) bt
#0  0x01459288 in ?? ()
#1  0x00066a84 in ngx_http_upstream_process_header (r=0x145c9a0, u=0x145d3a0) at src/http/ngx_http_upstream.c:2133
#2  0x00064a60 in ngx_http_upstream_handler (ev=<optimized out>) at src/http/ngx_http_upstream.c:1110
#3  0x00041928 in ngx_epoll_process_events (cycle=<optimized out>, timer=<optimized out>, flags=<optimized out>)
    at src/event/modules/ngx_epoll_module.c:822
#4  0x000397e4 in ngx_process_events_and_timers (cycle=cycle@entry=0x13c82e0) at src/event/ngx_event.c:242
#5  0x0003fa70 in ngx_worker_process_cycle (cycle=0x13c82e0, data=<optimized out>) at src/os/unix/ngx_process_cycle.c:753
#6  0x0003e488 in ngx_spawn_process (cycle=cycle@entry=0x13c82e0, proc=0x7e0, data=0x6, name=0xbef60 "worker process", 
    respawn=respawn@entry=0) at src/os/unix/ngx_process.c:198
#7  0x00040bcc in ngx_reap_children (cycle=0x13c82e0) at src/os/unix/ngx_process_cycle.c:621
#8  ngx_master_process_cycle (cycle=cycle@entry=0x13c82e0) at src/os/unix/ngx_process_cycle.c:174
#9  0x0001f1a8 in main (argc=<optimized out>, argv=<optimized out>) at src/core/nginx.c:367
(gdb) q

Change History (8)

comment:1 by Maxim Dounin, 8 years ago

There were previous reports about compilation issues on Raspberry Pi, see ticket #748. Likely you are facing something similar.

You can try tracing this further with:

  • check if the problem still appears when you compile nginx yourself without any additional arguments (if the problem disappears, it's probably related to the package build system);
  • check if the problem disappears when compiling nginx 1.9.10, which previously worked for you (if the problem appears when you compile 1.9.10 too, it is probably caused by toolchain changes on your system rather than nginx changes);
  • check which nginx version (or, better yet, which particular changeset) introduced the problem.

Note also this comment, which suggests that the problem can be solved by using the -fPIE -pie flags. This might work for you as well.
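The first two checks in the list above could be scripted roughly as follows. This is only a sketch and not part of the original report: it prints the commands instead of running them (a dry run), the helper name `plan_default_build` is hypothetical, and it assumes the release tarballs are fetched from nginx.org/download and that wget is available.

```shell
#!/bin/sh
# Dry run: print the steps for building a given nginx release with a bare
# ./configure (no extra arguments). Nothing is downloaded or compiled here.
# plan_default_build is a hypothetical helper name used for illustration.
plan_default_build() {
    version="$1"
    echo "wget http://nginx.org/download/nginx-${version}.tar.gz"
    echo "tar xzf nginx-${version}.tar.gz"
    echo "cd nginx-${version} && ./configure && make"
}

# 1.9.12 is the version that crashes; 1.9.10 is the last one reported to work.
for v in 1.9.12 1.9.10; do
    plan_default_build "$v"
done
```

After a build like this the binary is left in objs/nginx inside the source tree, so it can be tested without replacing the packaged one.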

comment:2 by elmicha@…, 8 years ago

Thank you for your help!

The problem disappeared when I followed your first suggestion and ran configure without any arguments. I guess I will re-add the configure switches and see where it breaks again.

For the record, I could not reproduce the problem on x64 or i386, only on the Pi and in an ARM chroot (https://wiki.debian.org/RaspberryPi/qemu-user-static).

comment:3 by elmicha@…, 8 years ago

The original configure line from the package with -fPIE -pie added works:

./configure --prefix=/etc/nginx --sbin-path=/usr/sbin/nginx --modules-path=/etc/nginx/modules --conf-path=/etc/nginx/nginx.conf --error-log-path=/var/log/nginx/error.log --http-log-path=/var/log/nginx/access.log --pid-path=/var/run/nginx.pid --lock-path=/var/run/nginx.lock --http-client-body-temp-path=/var/cache/nginx/client_temp --http-proxy-temp-path=/var/cache/nginx/proxy_temp --http-fastcgi-temp-path=/var/cache/nginx/fastcgi_temp --http-uwsgi-temp-path=/var/cache/nginx/uwsgi_temp --http-scgi-temp-path=/var/cache/nginx/scgi_temp --user=nginx --group=nginx --with-http_ssl_module --with-http_realip_module --with-http_addition_module --with-http_sub_module --with-http_dav_module --with-http_flv_module --with-http_mp4_module --with-http_gunzip_module --with-http_gzip_static_module --with-http_random_index_module --with-http_secure_link_module --with-http_stub_status_module --with-http_auth_request_module --with-http_xslt_module=dynamic --with-http_image_filter_module=dynamic --with-http_geoip_module=dynamic --with-threads --with-stream --with-stream_ssl_module --with-http_slice_module --with-mail --with-mail_ssl_module --with-file-aio --with-http_v2_module --with-cc-opt='-g -O2 -fPIE -pie -fstack-protector-strong -Wformat -Werror=format-security' --with-ld-opt=-Wl,-z,relro --with-ipv6

comment:4 by Maxim Dounin, 8 years ago

Thanks for the feedback. It would be interesting to see what exactly breaks it. I would suspect the following, in no particular order:

  • --with-cc-opt="-O2"
  • other --with-cc-opt and --with-ld-opt

Overall it looks like an ARM-related gcc bug triggered by a particular optimization level or other compiler options and/or particular code, something like this one. Trying some other gcc versions might also help to diagnose the problem.

comment:5 by elmicha@…, 8 years ago

-O1 instead of -O2 works.
-O2 without -fstack-protector-strong fails.

gcc version is 4.9.2 (Raspbian 4.9.2-10). There are apparently only older gccs in Raspbian.

I'm trying to build gcc-5.3.0, but it takes a long time (even in a qemu chroot https://wiki.debian.org/RaspberryPi/qemu-user-static) and the first attempts failed.

comment:6 by Maxim Dounin, 8 years ago

So it looks like something switched on by -O2 generates broken code. It might also be helpful to trace the specific option. The full list of options that -O2 implies, as per the gcc docs:

          -fthread-jumps 
          -falign-functions  -falign-jumps 
          -falign-loops  -falign-labels 
          -fcaller-saves 
          -fcrossjumping 
          -fcse-follow-jumps  -fcse-skip-blocks 
          -fdelete-null-pointer-checks 
          -fdevirtualize -fdevirtualize-speculatively 
          -fexpensive-optimizations 
          -fgcse  -fgcse-lm  
          -fhoist-adjacent-loads 
          -finline-small-functions 
          -findirect-inlining 
          -fipa-sra 
          -fisolate-erroneous-paths-dereference 
          -foptimize-sibling-calls 
          -fpartial-inlining 
          -fpeephole2 
          -freorder-blocks  -freorder-functions 
          -frerun-cse-after-loop  
          -fsched-interblock  -fsched-spec 
          -fschedule-insns  -fschedule-insns2 
          -fstrict-aliasing -fstrict-overflow 
          -ftree-switch-conversion -ftree-tail-merge 
          -ftree-pre 
          -ftree-vrp

In particular it would be interesting to test if -fstrict-aliasing is a problem or not (that is, if -O2 -fno-strict-aliasing fixes things).
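One way to trace the specific option is to rebuild with -O2 while switching off a single optimization at a time. The sketch below only prints the candidate --with-cc-opt values, one per flag from the list above; it does not run any build. The variable and function names are made up for illustration, and the approach assumes the broken code comes from one of the listed flags, which may not hold if the optimization involved has no flag of its own.

```shell
#!/bin/sh
# Flags implied by -O2, copied from the list quoted above.
O2_FLAGS="-fthread-jumps -falign-functions -falign-jumps -falign-loops
-falign-labels -fcaller-saves -fcrossjumping -fcse-follow-jumps
-fcse-skip-blocks -fdelete-null-pointer-checks -fdevirtualize
-fdevirtualize-speculatively -fexpensive-optimizations -fgcse -fgcse-lm
-fhoist-adjacent-loads -finline-small-functions -findirect-inlining
-fipa-sra -fisolate-erroneous-paths-dereference -foptimize-sibling-calls
-fpartial-inlining -fpeephole2 -freorder-blocks -freorder-functions
-frerun-cse-after-loop -fsched-interblock -fsched-spec -fschedule-insns
-fschedule-insns2 -fstrict-aliasing -fstrict-overflow
-ftree-switch-conversion -ftree-tail-merge -ftree-pre -ftree-vrp"

# Turn -ffoo into -fno-foo (hypothetical helper).
negate_flag() {
    printf '%s\n' "$1" | sed 's/^-f/-fno-/'
}

# Print one candidate value per flag: -O2 with exactly one optimization
# disabled. A build that no longer crashes points at the flag to look into.
for flag in $O2_FLAGS; do
    echo "--with-cc-opt='-g -O2 $(negate_flag "$flag")'"
done
```

Each printed value would replace the --with-cc-opt argument in the package's configure line, followed by a rebuild and a retest of the WebSocket proxy.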

comment:7 by elmicha@…, 8 years ago

The original package works if it is compiled with gcc-5.3.0.

Using built-in specs.
COLLECT_GCC=/root/gcc-5.3.0/bin/gcc-5.3
COLLECT_LTO_WRAPPER=/root/gcc-5.3.0/libexec/gcc/arm-linux-gnueabihf/5.3.0/lto-wrapper
Target: arm-linux-gnueabihf
Configured with: ../gcc-5.3.0/configure --prefix=/root/gcc-5.3.0 --enable-languages=c,c++ --program-suffix=-5.3 --enable-shared --enable-linker-build-id --without-included-gettext --enable-threads=posix --with-gxx-include-dir=/usr/include/c++/5.3 --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-gnu-unique-object --disable-libitm --disable-libquadmath --enable-plugin --with-system-zlib --disable-browser-plugin --with-arch-directory=arm --enable-multiarch --disable-sjlj-exceptions --with-arch=armv6 --with-fpu=vfp --with-float=hard --enable-checking=release --build=arm-linux-gnueabihf --host=arm-linux-gnueabihf --target=arm-linux-gnueabihf
Thread model: posix
gcc version 5.3.0 (GCC)

nginx version: nginx/1.9.12
built by gcc 5.3.0 (GCC)
built with OpenSSL 1.0.1k 8 Jan 2015
TLS SNI support enabled
configure arguments: --prefix=/etc/nginx --sbin-path=/usr/sbin/nginx --modules-path=/etc/nginx/modules --conf-path=/etc/nginx/nginx.conf --error-log-path=/var/log/nginx/error.log --http-log-path=/var/log/nginx/access.log --pid-path=/var/run/nginx.pid --lock-path=/var/run/nginx.lock --http-client-body-temp-path=/var/cache/nginx/client_temp --http-proxy-temp-path=/var/cache/nginx/proxy_temp --http-fastcgi-temp-path=/var/cache/nginx/fastcgi_temp --http-uwsgi-temp-path=/var/cache/nginx/uwsgi_temp --http-scgi-temp-path=/var/cache/nginx/scgi_temp --user=nginx --group=nginx --with-http_ssl_module --with-http_realip_module --with-http_addition_module --with-http_sub_module --with-http_dav_module --with-http_flv_module --with-http_mp4_module --with-http_gunzip_module --with-http_gzip_static_module --with-http_random_index_module --with-http_secure_link_module --with-http_stub_status_module --with-http_auth_request_module --with-http_xslt_module=dynamic --with-http_image_filter_module=dynamic --with-http_geoip_module=dynamic --with-threads --with-stream --with-stream_ssl_module --with-http_slice_module --with-mail --with-mail_ssl_module --with-file-aio --with-http_v2_module --with-cc-opt='-g -O2 -fstack-protector-strong -Wformat -Werror=format-security' --with-ld-opt=-Wl,-z,relro --with-ipv6

With gcc-4.9.2 again:

-O2 -fno-strict-aliasing: fails

Now I'm trying to find which -O2 -fno-... is the culprit.


comment:8 by Maxim Dounin, 8 years ago

Keywords: gcc arm added
Resolution: invalid
Status: new → closed

OK, thanks for testing. So it clearly looks like a GCC bug. The most relevant report related to -foptimize-sibling-calls that I was able to find is this bug.

Given that it's already fixed in recent GCC versions and various workarounds are available (including not using -O2, or using the -fno-optimize-sibling-calls option you've identified), it probably doesn't make sense to debug any further.
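For anyone stuck on gcc 4.9.2, the second workaround amounts to adding one flag to the package's cc-opt value. A minimal sketch, assuming the cc-opt string from the original configure line; the helper name `add_workaround` is hypothetical and the script only prints the resulting argument:

```shell
#!/bin/sh
# cc-opt value taken from the configure arguments in the report.
CC_OPT='-g -O2 -fstack-protector-strong -Wformat -Werror=format-security'

# Insert -fno-optimize-sibling-calls right after -O2 and leave the
# remaining options untouched. add_workaround is a hypothetical helper.
add_workaround() {
    printf '%s\n' "$1" | sed 's/-O2/-O2 -fno-optimize-sibling-calls/'
}

# Print the argument to pass to ./configure in place of the original one.
echo "--with-cc-opt='$(add_workaround "$CC_OPT")'"
```

The flag is placed after -O2 so that it overrides the optimization that -O2 would otherwise enable.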

Closing this, as this clearly isn't an nginx bug, and the compile options nginx uses by default don't trigger it either. Thank you for testing.
