Opened 13 months ago

Closed 8 months ago

#1638 closed defect (worksforme)

Nginx Systemd is failing to start on reboot

Reported by: saeohioalpha@…
Owned by:
Priority: major
Milestone: 1.15.4
Component: other
Version: 1.15.x
Keywords: systemd service restart reboot
Cc:
uname -a: Linux web00 4.17.17-x86_64-linode116 #1 SMP PREEMPT Mon Aug 20 16:07:40 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
nginx -V:
nginx version: nginx/1.15.3
built by gcc 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.10)
built with OpenSSL 1.1.1 11 Sep 2018
TLS SNI support enabled
configure arguments: --with-openssl=/tmp/openssl-1.1.1/ --add-module=/tmp/incubator-pagespeed-ngx-1.13.35.2-beta --add-module=/tmp/headers-more-nginx-module --add-module=/tmp/ngx_brotli --prefix=/opt/nginx --with-http_ssl_module --with-http_slice_module --with-http_gzip_static_module --with-cc-opt=-Wno-error --with-http_sub_module --with-http_stub_status_module --with-http_flv_module --with-http_xslt_module --with-http_image_filter_module --with-mail --with-mail_ssl_module --with-http_mp4_module --with-http_v2_module

Description

I recently updated my Linode VM (Ubuntu 16.04.5, 4.17.17-x86_64-linode116) to nginx/1.15.3 with openssl/1.1.1, and the nginx systemd service stopped starting on reboot. The systemd config was taken from the example on the nginx.com website.

It was timing out on the prestart. I removed the prestart, and then it started timing out on the start.

I tried adjusting the targets to make it start later in the boot process, but it still timed out. Then I added /bin/sleep 15 to the prestart. It still timed out.

Sleep 30 also timed out, but allowed me to check the status and it looks like it times out after about 2 minutes of trying to start.

https://user-images.githubusercontent.com/969559/45700816-842ff480-bb3b-11e8-8eee-c0b5e52aeed1.png

When I run it manually from the command line, it starts in about 30s, which seems long but is well within the 2-minute timeout.

https://user-images.githubusercontent.com/969559/45700938-dcff8d00-bb3b-11e8-8250-e48177a221ef.png

The previously installed versions were nginx/1.13.12 and openssl/1.1.0h. With that combination, nginx restarted correctly on reboot.

Change History (7)

comment:1 Changed 13 months ago by saeohioalpha@…

There was also an issue here https://github.com/geerlingguy/ansible-role-nginx/issues/151?_pjax=%23js-repo-pjax-container

where they had a similar problem, but were looking at Ansible as the culprit. I am not using Ansible, so that's out for me.

comment:2 Changed 13 months ago by saeohioalpha@…

Also, this has happened on both of the Linode 16.04.5 VMs that I have updated to 1.15.3. I am going to hold off on updating any more until we figure out what is going on.

comment:3 Changed 13 months ago by mdounin

From the details you've provided it looks like nginx hangs on start for some reason. Given that you've upgraded the OpenSSL library and are using multiple 3rd party modules, there are several possible causes.

First of all, I would recommend trying to run nginx by hand, e.g., sudo /opt/nginx/sbin/nginx -t. This might give you some additional information on what is going on. Running it under gdb to see where it hangs might also be helpful.
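For anyone reproducing this, a minimal sketch of timing the by-hand run against a deadline. Python is used only for illustration. The helper name `time_command` is hypothetical, and the 120-second default is an assumption taken from the "about 2 minutes" in the report, not a value read from systemd.

```python
import subprocess
import time


def time_command(argv, timeout=120.0):
    """Run argv and return (exit_code, elapsed_seconds).

    exit_code is None if the command did not finish within `timeout`
    seconds (assumed here to mirror the roughly 2-minute limit reported).
    """
    start = time.monotonic()
    try:
        proc = subprocess.run(
            argv,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
        code = proc.returncode
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising this exception.
        code = None
    return code, time.monotonic() - start
```

Calling `time_command(["/opt/nginx/sbin/nginx", "-t"])` as root, once right after boot and once later, would show whether the config test itself accounts for the delay.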

Please also try compiling nginx without any 3rd party modules to see if it helps.

Also, could you please clarify why you are using --with-cc-opt=-Wno-error configure argument? It shouldn't be needed.

comment:4 Changed 13 months ago by joey2250@…

Sorry, I did not get a notification that I had a response here.

  1. When I said I ran the systemd start by hand, I did not mention that I had also run the prestart command, which is nginx -t. It completes without issue.
  2. I have not used gdb before. I will have to see whether I can run nginx under gdb from the systemd file during system start. As I mentioned, the problem only happens during system start. Any time after I have logged in, nginx starts up and works.
  3. I will try compiling with only the stock modules and see if that helps.
  4. I inherited this build script, which has been in use in some form since 0.9.x. During the initial compilations, make failed on warnings as well as errors. After some research we found that passing this option let the build continue despite warnings. It may not be needed any more; I haven't tested.

comment:5 Changed 13 months ago by joey2250@…

I've set up a clean system and installed 1.15.5. With that and the sample HTML provided, it now starts on boot. I'm not sure whether something was fixed or whether it was something particular to my system and modules. Now I will start the long process of building up in stages to see where it breaks! :(

comment:6 Changed 13 months ago by mdounin

OK, so the problem appears to be severe during boot (> 2m startup times), but less severe (30s startup times) once the system is fully booted, right? My best guess would be that your configuration depends on hostnames being resolved during nginx startup, and there is something wrong with name resolution on your host. This could be any of a wide range of problems, from simple things like missing network dependencies in the systemd service file to flaky name resolution that fails from time to time, causing long resolution delays and therefore startup times that vary with conditions.
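As a rough way to test the name-resolution guess, the sketch below times how long each hostname takes to resolve through the system resolver. The function name `time_resolution` and the example hostnames are hypothetical; the real list would be whatever hostnames appear in the nginx configuration.

```python
import socket
import time


def time_resolution(hostnames, resolve=socket.getaddrinfo):
    """Return {hostname: (resolved_ok, elapsed_seconds)} for each name.

    `resolve` defaults to the system resolver; it is a parameter only so
    that a stub can be substituted when no network is available.
    """
    results = {}
    for name in hostnames:
        start = time.monotonic()
        try:
            resolve(name, None)
            ok = True
        except OSError:
            # socket.gaierror (resolution failure) is a subclass of OSError.
            ok = False
        results[name] = (ok, time.monotonic() - start)
    return results
```

For example, `time_resolution(["backend.example.com"])` (a placeholder name) run shortly after boot and again later would show whether lookups are slow or failing only early on.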

First of all, I would recommend making sure that all name servers configured on the system are working properly. It might also be a good idea to find out what nginx is doing during startup (including a start after the system is booted, as a 30s startup time is too long for an average nginx configuration). Running "nginx -t" under something like strace -tt should provide enough detail to see where the time is spent.
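Once a trace has been captured, the slow spots can be found by looking for the largest jumps between consecutive timestamps. The following is a small sketch of that, assuming each line starts with the HH:MM:SS.microseconds prefix that strace -tt produces, optionally preceded by a PID. The function name `largest_gaps` is hypothetical.

```python
import re

# Optional PID prefix ("1234 " or "[pid 1234] "), then an HH:MM:SS.micros
# timestamp as written by strace -tt, then the rest of the line.
_LINE = re.compile(
    r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?(\d{2}):(\d{2}):(\d{2})\.(\d+)\s+(.*)$"
)


def largest_gaps(lines, top=5):
    """Return up to `top` (gap_seconds, call_text) pairs, largest gap first.

    Each gap is the time between a line and the one that follows it, and is
    attributed to the earlier line, i.e. the call made before the pause.
    """
    stamped = []
    for line in lines:
        m = _LINE.match(line.rstrip("\n"))
        if not m:
            continue  # skip lines without a timestamp prefix
        h, mi, s, frac, rest = m.groups()
        t = int(h) * 3600 + int(mi) * 60 + int(s) + float("0." + frac)
        stamped.append((t, rest))

    gaps = []
    for (t0, call), (t1, _next) in zip(stamped, stamped[1:]):
        delta = t1 - t0
        if delta < 0:
            delta += 86400  # the trace crossed midnight
        gaps.append((delta, call))

    gaps.sort(key=lambda g: g[0], reverse=True)
    return gaps[:top]
```

Given a trace saved to a file, `largest_gaps(open(path))` lists the calls followed by the longest pauses. If the guess about name resolution is right, one would expect these to be DNS-related calls, but that is an expectation, not something the thread confirms.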

comment:7 Changed 8 months ago by mdounin

  • Resolution set to worksforme
  • Status changed from new to closed

Feedback timeout.
