#1816 closed defect (invalid)
Worker process loops infinitely in ngx_rbtree_min
| Reported by: |       | Owned by:  |        |
|---|---|---|---|
| Priority:    | minor | Milestone: |        |
| Component:   | other | Version:   | 1.17.x |
| Keywords:    |       | Cc:        |        |

uname -a:

```
Linux csp-gateway-66bb475765-jxlpq 4.9.0-9-amd64 #1 SMP Debian 4.9.168-1 (2019-04-12) x86_64 x86_64 x86_64 GNU/Linux
```

nginx -V:

```
nginx version: nginx/1.17.1
built by gcc 4.8.5 20150623 (Red Hat 4.8.5-36) (GCC)
built with OpenSSL 1.0.2k-fips 26 Jan 2017
TLS SNI support enabled
configure arguments: --prefix=/opt/nginx --user=nobody --group=nobody --error-log-path=/dev/stderr --http-log-path=/dev/stdout --pid-path=/var/run/nginx/nginx.pid --with-http_ssl_module --with-http_realip_module --with-http_addition_module --with-http_sub_module --with-http_dav_module --with-http_flv_module --with-http_mp4_module --with-http_gunzip_module --with-http_gzip_static_module --with-http_random_index_module --with-http_secure_link_module --with-http_stub_status_module --with-http_auth_request_module --with-threads --with-stream --with-stream_ssl_module --with-http_slice_module --with-mail --with-mail_ssl_module --with-file-aio --with-http_v2_module --with-ipv6 --without-http_fastcgi_module --without-http_memcached_module --without-http_scgi_module --without-http_uwsgi_module --without-http_ssi_module --add-module=/root/csp-gateway
```
Description
I found an nginx worker on one of our production sites pegged at 100% CPU. I dumped the core and ended up with this backtrace of the looping thread:
```
#0  ngx_rbtree_min (sentinel=0x70c840 <ngx_event_timer_sentinel>, node=0x1c20778) at src/core/ngx_rbtree.h:77
#1  ngx_event_find_timer () at src/event/ngx_event_timer.c:45
#2  0x00000000004328ff in ngx_process_events_and_timers (cycle=cycle@entry=0x1be0d20) at src/event/ngx_event.c:204
#3  0x0000000000439dc4 in ngx_worker_process_cycle (cycle=0x1be0d20, data=<optimized out>) at src/os/unix/ngx_process_cycle.c:750
#4  0x0000000000438502 in ngx_spawn_process (cycle=cycle@entry=0x1be0d20, proc=proc@entry=0x439d53 <ngx_worker_process_cycle>, data=data@entry=0x0, name=name@entry=0x4c4855 "worker process", respawn=respawn@entry=-3) at src/os/unix/ngx_process.c:199
#5  0x000000000043908b in ngx_start_worker_processes (cycle=cycle@entry=0x1be0d20, n=1, type=type@entry=-3) at src/os/unix/ngx_process_cycle.c:359
#6  0x000000000043a4c1 in ngx_master_process_cycle (cycle=cycle@entry=0x1be0d20) at src/os/unix/ngx_process_cycle.c:131
#7  0x0000000000414d95 in main (argc=<optimized out>, argv=<optimized out>) at src/core/nginx.c:382
```
It looks like the sentinel it is searching for is never reached: following the left pointers leads into a cycle that never passes through ngx_event_timer_sentinel, so the loop never terminates.
```
(gdb) p sentinel
$1 = (ngx_rbtree_node_t *) 0x70c840 <ngx_event_timer_sentinel>
(gdb) p node
$2 = (ngx_rbtree_node_t *) 0x1c20778
(gdb) p node->left
$3 = (ngx_rbtree_node_t *) 0x1c21318
(gdb) p node->left->left
$4 = (ngx_rbtree_node_t *) 0x1c20dd8
(gdb) p node->left->left->left
$5 = (ngx_rbtree_node_t *) 0x1c20778
(gdb) p node->left->left->left->left
$6 = (ngx_rbtree_node_t *) 0x1c21318
```

```
(gdb) p *sentinel
$7 = {key = 0, left = 0x0, right = 0x0, parent = 0x1c20b38, color = 0 '\000', data = 0 '\000'}
(gdb) p *node
$8 = {key = 100311128, left = 0x1c21318, right = 0x1c20e98, parent = 0x1c20dd8, color = 0 '\000', data = 0 '\000'}
(gdb) p *node->left
$9 = {key = 100313700, left = 0x1c20dd8, right = 0x1c20d78, parent = 0x1c20778, color = 1 '\001', data = 0 '\000'}
(gdb) p *node->left->left
$10 = {key = 100276631, left = 0x1c20778, right = 0x1c21498, parent = 0x1c21258, color = 0 '\000', data = 0 '\000'}
(gdb) p *node->left->left->left
$11 = {key = 100311128, left = 0x1c21318, right = 0x1c20e98, parent = 0x1c20dd8, color = 0 '\000', data = 0 '\000'}
```
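For reference, the frame at src/core/ngx_rbtree.h:77 is the descent loop in ngx_rbtree_min(), which only terminates when it reaches the sentinel node. The function is essentially:

```c
/* src/core/ngx_rbtree.h: walk left until the sentinel is reached */
static ngx_inline ngx_rbtree_node_t *
ngx_rbtree_min(ngx_rbtree_node_t *node, ngx_rbtree_node_t *sentinel)
{
    while (node->left != sentinel) {
        node = node->left;
    }

    return node;
}
```

Since the left pointers dumped above form a cycle (node->left->left->left points back at node itself) that never passes through ngx_event_timer_sentinel, the while loop can never exit, which matches the 100% CPU symptom.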
Please let me know if there's any more information you need. So far I haven't been able to reproduce this behaviour in our test environments.
Change History (4)
comment:1 by , 5 years ago
comment:2 by , 5 years ago
No, I haven't seen this before. It's a bit tricky to test because we rely on the third-party module for our application deployment, so we can't run without it in the environment where this is happening.
The third-party module's code seemed to be running in a different thread, which wasn't itself looping. Could it have affected this part of nginx?
If you have any theories on how it could have corrupted the timer tree, I can pass them on to the vendor; they're quite responsive.
comment:3 by , 5 years ago
| Resolution: | → invalid    |
|---|---|
| Status:     | new → closed |
> The third-party module code seemed to be running in a different thread

Running anything in threads, except some simple and well-controlled code such as the single syscalls nginx issues with aio threads, is very likely to cause serious problems, because nginx does not limit itself to thread-safe functions. Further, calling almost any nginx function from a thread will certainly break things. In particular, if the 3rd party module tries to add or remove nginx timers from a thread, a corrupted timer tree like this one is the expected result.
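For anyone hitting a similar corruption: the usual safe pattern with nginx's thread pool API is to do only the blocking work in the pool thread and defer any timer manipulation to the completion handler, which nginx runs back in the worker's event loop. A minimal sketch, using hypothetical module names (my_ctx_t, my_task_handler, my_task_done, my_post_task), not code from this ticket:

```c
#include <ngx_config.h>
#include <ngx_core.h>
#include <ngx_event.h>
#include <ngx_thread_pool.h>

typedef struct {
    ngx_event_t  timer;   /* timer event; handler and log are assumed to be
                             initialised elsewhere in the module */
    /* ... module-specific state ... */
} my_ctx_t;               /* hypothetical module context */

/* Runs in a thread-pool thread: blocking work only. Calling
 * ngx_add_timer()/ngx_del_timer() here would mutate the global
 * ngx_event_timer_rbtree without any locking, racing against the event
 * loop and potentially producing the kind of cycle seen in the gdb dump. */
static void
my_task_handler(void *data, ngx_log_t *log)
{
    my_ctx_t  *ctx = data;

    (void) log;
    (void) ctx;           /* do the blocking work here */
}

/* Completion handler: nginx invokes this back in the event loop once the
 * task finishes, so this is the only safe place to touch nginx timers. */
static void
my_task_done(ngx_event_t *ev)
{
    my_ctx_t  *ctx = ev->data;

    ngx_add_timer(&ctx->timer, 1000);
}

/* Posting the task from event-loop code (e.g. a request handler). */
static ngx_int_t
my_post_task(ngx_pool_t *pool, ngx_thread_pool_t *tp)
{
    ngx_thread_task_t  *task;
    my_ctx_t           *ctx;

    task = ngx_thread_task_alloc(pool, sizeof(my_ctx_t));
    if (task == NULL) {
        return NGX_ERROR;
    }

    ctx = task->ctx;                       /* allocated after the task */

    task->handler = my_task_handler;       /* runs in the pool thread  */
    task->event.handler = my_task_done;    /* runs in the event loop   */
    task->event.data = ctx;

    return ngx_thread_task_post(tp, task);
}
```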
Are you able to reproduce the problem without any 3rd party modules?