Opened 5 years ago

Closed 5 years ago

Last modified 5 years ago

#1816 closed defect (invalid)

Worker process loops infinitely in ngx_rbtree_min

Reported by: Novex@…
Owned by:
Priority: minor
Milestone:
Component: other
Version: 1.17.x
Keywords:
Cc:
uname -a: Linux csp-gateway-66bb475765-jxlpq 4.9.0-9-amd64 #1 SMP Debian 4.9.168-1 (2019-04-12) x86_64 x86_64 x86_64 GNU/Linux
nginx -V: nginx version: nginx/1.17.1
built by gcc 4.8.5 20150623 (Red Hat 4.8.5-36) (GCC)
built with OpenSSL 1.0.2k-fips 26 Jan 2017
TLS SNI support enabled
configure arguments: --prefix=/opt/nginx --user=nobody --group=nobody --error-log-path=/dev/stderr --http-log-path=/dev/stdout --pid-path=/var/run/nginx/nginx.pid --with-http_ssl_module --with-http_realip_module --with-http_addition_module --with-http_sub_module --with-http_dav_module --with-http_flv_module --with-http_mp4_module --with-http_gunzip_module --with-http_gzip_static_module --with-http_random_index_module --with-http_secure_link_module --with-http_stub_status_module --with-http_auth_request_module --with-threads --with-stream --with-stream_ssl_module --with-http_slice_module --with-mail --with-mail_ssl_module --with-file-aio --with-http_v2_module --with-ipv6 --without-http_fastcgi_module --without-http_memcached_module --without-http_scgi_module --without-http_uwsgi_module --without-http_ssi_module --add-module=/root/csp-gateway

Description

I found an nginx worker on one of our production sites pegged at 100% CPU. I dumped the core and ended up with this backtrace of the looping thread:

#0  ngx_rbtree_min (sentinel=0x70c840 <ngx_event_timer_sentinel>, node=0x1c20778) at src/core/ngx_rbtree.h:77
#1  ngx_event_find_timer () at src/event/ngx_event_timer.c:45
#2  0x00000000004328ff in ngx_process_events_and_timers (cycle=cycle@entry=0x1be0d20) at src/event/ngx_event.c:204
#3  0x0000000000439dc4 in ngx_worker_process_cycle (cycle=0x1be0d20, data=<optimized out>) at src/os/unix/ngx_process_cycle.c:750
#4  0x0000000000438502 in ngx_spawn_process (cycle=cycle@entry=0x1be0d20, proc=proc@entry=0x439d53 <ngx_worker_process_cycle>, data=data@entry=0x0, name=name@entry=0x4c4855 "worker process", respawn=respawn@entry=-3)
    at src/os/unix/ngx_process.c:199
#5  0x000000000043908b in ngx_start_worker_processes (cycle=cycle@entry=0x1be0d20, n=1, type=type@entry=-3) at src/os/unix/ngx_process_cycle.c:359
#6  0x000000000043a4c1 in ngx_master_process_cycle (cycle=cycle@entry=0x1be0d20) at src/os/unix/ngx_process_cycle.c:131
#7  0x0000000000414d95 in main (argc=<optimized out>, argv=<optimized out>) at src/core/nginx.c:382

It looks like the sentinel it's walking towards never appears in the chain of left children (the chain loops back on itself), so ngx_rbtree_min() never terminates.
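For reference, the loop at the top of the backtrace (ngx_rbtree_min() in src/core/ngx_rbtree.h) is roughly the following; its only exit condition is reaching the sentinel through left pointers:

static ngx_inline ngx_rbtree_node_t *
ngx_rbtree_min(ngx_rbtree_node_t *node, ngx_rbtree_node_t *sentinel)
{
    while (node->left != sentinel) {
        node = node->left;
    }

    return node;
}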

(gdb) p sentinel
$1 = (ngx_rbtree_node_t *) 0x70c840 <ngx_event_timer_sentinel>
(gdb) p node
$2 = (ngx_rbtree_node_t *) 0x1c20778
(gdb) p node->left
$3 = (ngx_rbtree_node_t *) 0x1c21318
(gdb) p node->left->left
$4 = (ngx_rbtree_node_t *) 0x1c20dd8
(gdb) p node->left->left->left
$5 = (ngx_rbtree_node_t *) 0x1c20778
(gdb) p node->left->left->left->left
$6 = (ngx_rbtree_node_t *) 0x1c21318
(gdb) p *sentinel
$7 = {key = 0, left = 0x0, right = 0x0, parent = 0x1c20b38, color = 0 '\000', data = 0 '\000'}
(gdb) p *node
$8 = {key = 100311128, left = 0x1c21318, right = 0x1c20e98, parent = 0x1c20dd8, color = 0 '\000', data = 0 '\000'}
(gdb) p *node->left
$9 = {key = 100313700, left = 0x1c20dd8, right = 0x1c20d78, parent = 0x1c20778, color = 1 '\001', data = 0 '\000'}
(gdb) p *node->left->left
$10 = {key = 100276631, left = 0x1c20778, right = 0x1c21498, parent = 0x1c21258, color = 0 '\000', data = 0 '\000'}
(gdb) p *node->left->left->left
$11 = {key = 100311128, left = 0x1c21318, right = 0x1c20e98, parent = 0x1c20dd8, color = 0 '\000', data = 0 '\000'}
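The left pointers above form a cycle (0x1c20778 -> 0x1c21318 -> 0x1c20dd8 -> back to 0x1c20778), so the loop can never reach the sentinel. If it's useful, this kind of corruption can be confirmed with a small tortoise-and-hare check over the left chain (a hypothetical debugging helper, not part of nginx):

/* Hypothetical debugging helper, not part of nginx: detect a cycle in the
 * chain of left children that would make ngx_rbtree_min() spin forever. */
static int
left_chain_has_cycle(ngx_rbtree_node_t *node, ngx_rbtree_node_t *sentinel)
{
    ngx_rbtree_node_t  *slow, *fast;

    slow = node;
    fast = node;

    while (fast != sentinel && fast->left != sentinel) {
        slow = slow->left;          /* one step  */
        fast = fast->left->left;    /* two steps */

        if (slow == fast) {
            return 1;               /* left chain loops back on itself */
        }
    }

    return 0;                       /* left chain terminates at the sentinel */
}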

Please let me know if there's any more information you need. So far I haven't been able to reproduce this behaviour in our test environments.

Change History (4)

comment:1 by Maxim Dounin, 5 years ago

Are you able to reproduce the problem without any 3rd party modules?

comment:2 by Novex@…, 5 years ago

No, I haven't seen this before. It's a bit tricky because we rely on the third-party module for our application deployment, so we can't run without it in the environment where this is happening.

The third-party module code seemed to be running in a different thread, which wasn't itself looping infinitely. Could it possibly have affected this part of nginx?

If you have any theories on how it could have corrupted the timer tree, I can pass them on to the vendor - they're quite responsive.

comment:3 by Maxim Dounin, 5 years ago

Resolution: invalid
Status: new → closed

> The third-party module code seemed to be running in a different thread

Running anything in threads, except some simple and well-controlled code such as the single syscalls nginx makes with aio threads, is very likely to cause serious problems, as nginx does not try to limit itself to thread-safe functions. Further, calling almost any nginx function from a thread will certainly break things. In particular, if the third-party module tries to add or remove nginx timers from a thread, this result is to be expected.
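For the vendor's reference, a rough sketch of the pattern that stays on the safe side, assuming the module can use nginx's own thread pool API (the my_* names below are placeholders): the blocking work runs in a pool thread, while timers are only added or removed in the completion handler, which nginx invokes back on the event-loop thread.

/* Sketch only; the my_* names are placeholders.  The idea is to keep all
 * timer manipulation on the event-loop thread via the thread pool API. */

#include <ngx_config.h>
#include <ngx_core.h>
#include <ngx_event.h>
#include <ngx_thread_pool.h>

typedef struct {
    ngx_event_t  *ev;               /* event whose timer we want to (re)arm */
} my_task_ctx_t;

/* Runs in a pool thread: do only the blocking work here; never touch
 * nginx timers or other shared nginx state. */
static void
my_task_handler(void *data, ngx_log_t *log)
{
    my_task_ctx_t  *ctx = data;

    (void) ctx;
    (void) log;

    /* ... blocking or CPU-heavy work ... */
}

/* Completion handler: nginx calls this back on the event-loop thread,
 * so adding or removing timers here is safe. */
static void
my_task_done(ngx_event_t *ev)
{
    my_task_ctx_t  *ctx = ev->data;

    ngx_add_timer(ctx->ev, 5000);
}

static ngx_int_t
my_post_work(ngx_thread_pool_t *tp, ngx_pool_t *pool, ngx_event_t *ev)
{
    ngx_thread_task_t  *task;
    my_task_ctx_t      *ctx;

    task = ngx_thread_task_alloc(pool, sizeof(my_task_ctx_t));
    if (task == NULL) {
        return NGX_ERROR;
    }

    ctx = task->ctx;
    ctx->ev = ev;

    task->handler = my_task_handler;        /* runs in a pool thread   */
    task->event.handler = my_task_done;     /* runs on the event loop  */
    task->event.data = ctx;

    return ngx_thread_task_post(tp, task);
}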

comment:4 by Novex@…, 5 years ago

Thank you for the information - appreciate your time.
