Opened 5 years ago

Closed 5 years ago

#666 closed defect (invalid)

Nginx retries failed HTTP requests not only against the servers defined in the associated upstream, but also against the upstream's name

Reported by: ryan.chan1201@… Owned by:
Priority: blocker Milestone:
Component: nginx-core Version: 1.7.x
Keywords: upstream Cc:
uname -a:
nginx -V:
nginx version: nginx/1.7.7
built by gcc 4.6.3 (GCC)
TLS SNI support enabled
configure arguments: --with-http_ssl_module --with-http_stub_status_module --with-ipv6 --with-pcre=./pcre-8.34 --without-http_fastcgi_module --without-http_scgi_module --with-http_auth_request_module

Description

Please read the attached nginx.conf file.

This is what I want to achieve:

  • Nginx as a reverse proxy and load balancer
  • Distribute traffic between 127.0.0.1:4001 and 127.0.0.1:4002 using round robin
  • In the attached nginx config, I have defined an upstream group called "RESTfulFromLive" with servers 127.0.0.1:4001 and 127.0.0.1:4002.

Expected behavior:

  • Scenario A: Both 127.0.0.1:4001 and 127.0.0.1:4002 are online
    • Requests are distributed round robin to 127.0.0.1:4001 and 127.0.0.1:4002 correctly
    • Each request returns 200.
  • Scenario B: Both 127.0.0.1:4001 and 127.0.0.1:4002 are offline
    • Requests are distributed round robin to 127.0.0.1:4001 and 127.0.0.1:4002 correctly
    • Each request returns 502.

Actual behavior:

  • Scenario A:
    • Works as expected.
  • Scenario B:
    • There are two types of proxied requests in the access log:
      • 10.50.100.238 - [19/Nov/2014:18:48:39 +0800] "GET /Content/a.txt HTTP/1.1" 502 537 "-" "-" 0.001 127.0.0.1:4002, 127.0.0.1:4001 - 502, 502 "-" 0.000, 0.001
      • 10.50.100.238 - [19/Nov/2014:18:48:39 +0800] "GET /Content/a.txt HTTP/1.1" 502 537 "-" "-" 0.000 RESTfulFromLive - 502 "-" 0.000
    • The first type is expected: nginx retried the failed request against the other server defined in the upstream.
    • The second type is NOT expected: RESTfulFromLive is an upstream name, not a server, so the request should simply fail.

I have tried many different nginx configurations. In general, this bug is hit when nginx is configured to retry failed proxied requests (e.g. when proxy_next_upstream is configured) and all upstream servers return an error code. That is why I consider this a blocker issue.

#worker_processes  auto;
pid  /var/run/nginx.pid;

events {
    worker_connections  1024;
}

http {
    # include  mime.types;
    default_type  application/octet-stream;

    sendfile  on;

    keepalive_timeout  65;

    upstream RESTfulFromLive {
        server 127.0.0.1:4001;
        server 127.0.0.1:4002;
    }

    log_format  timed_combined '$remote_addr $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_header" $request_time $upstream_addr $upstream_cache_status $upstream_status "$upstream_http_header" $upstream_response_time';

    access_log  /mnt/sda7/nmi/log/nginx-access.log timed_combined;
    error_log   /mnt/sda7/nmi/log/nginx-error.log warn;

    server {
        listen  80;
        server_name  localhost;

        location / {
            proxy_pass http://RESTfulFromLive;
            # proxy_next_upstream error timeout invalid_header http_403 http_500 http_502 http_503 http_504;
        }

        # redirect server error pages to the static page /50x.html
        #
        error_page  500 502 503 504  /50x.html;
        location = /50x.html {
            root  html;
        }
    }
}

Ryan Chan

Change History (8)

comment:1 follow-up: Changed 5 years ago by mdounin

  • Resolution set to invalid
  • Status changed from new to closed

The upstream name is logged when all configured servers are down.
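The log fields quoted in this ticket are easier to read once the composition rule is spelled out. The nginx documentation for $upstream_addr says that the addresses of all contacted servers are joined by commas, and that if a server cannot be selected, the variable keeps the name of the server group. The sketch below models this for $upstream_addr, $upstream_status and $upstream_response_time. The Attempt class and both functions are hypothetical. The 502 status and 0.000 time for the group-name entry are taken from the log lines in this ticket, not from nginx source.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Attempt:
    """One entry in the per-request upstream list (hypothetical model)."""
    addr: str     # server address, or the group name if no server could be selected
    status: int   # status recorded for this entry
    time: float   # seconds spent on this entry


def log_fields(attempts: List[Attempt]) -> Tuple[str, str, str]:
    """Join per-entry values with ', ' the way the three upstream variables are logged."""
    addrs = ", ".join(a.addr for a in attempts)
    statuses = ", ".join(str(a.status) for a in attempts)
    times = ", ".join(f"{a.time:.3f}" for a in attempts)
    return addrs, statuses, times


def no_live_upstreams(upstream_name: str) -> Attempt:
    """Entry recorded when no server can be selected: the group name with a 502."""
    return Attempt(addr=upstream_name, status=502, time=0.0)
```

For example, one 504 from 127.0.0.1:4001 followed by a failed selection yields "127.0.0.1:4001, RESTfulFromLive", "504, 502" and "0.001, 0.000". This matches the shape of the second log line in comment:3. The group name in the output does not mean a request was sent to "RESTfulFromLive". It means no server was available for the next attempt.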

comment:2 in reply to: ↑ 1 Changed 5 years ago by ryan.chan1201@…

Replying to Maxim Dounin:

The upstream name is logged when all configured servers are down.

No. It is not only logged; an actual request is sent. This is the key point.

Or are you suggesting that when round robin reaches the last server defined in the upstream, the upstream name gets logged?

What happens if I set the maximum number of retries to 3 and the upstream has 7 servers? According to what you said, will nginx behave as follows:

  • First request will be translated to: server1, server2, server3, 502, 502, 502
  • Second request will be translated to: server4, server5, server6, 502, 502, 502
  • Third request will be translated to: server7, server1, server2, RESTfulFromLive, 502, 502, 502
  • Fourth request will be translated to: server3, server4, server5, 502, 502, 502
  • ...

Ryan Chan

comment:3 Changed 5 years ago by ryan.chan1201@…

Here is my test nginx.conf:

#worker_processes  auto;
pid  /var/run/nginx.pid;

events {
    worker_connections  1024;
}

http {
    # include  mime.types;
    default_type  application/octet-stream;

    sendfile  on;

    keepalive_timeout  65;

    upstream RESTfulFromLive {
        server 127.0.0.1:4001;
        server 127.0.0.1:4002;
        server 127.0.0.1:4003;
    }

    log_format  timed_combined '$remote_addr $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_header" $request_time $upstream_addr $upstream_cache_status $upstream_status "$upstream_http_header" $upstream_response_time';

    access_log  /mnt/sda7/nmi/log/nginx-access.log timed_combined;
    error_log   /mnt/sda7/nmi/log/nginx-error.log warn;

    server {
        listen  80;
        server_name  localhost;

        location / {
            proxy_pass http://RESTfulFromLive;
            proxy_next_upstream error timeout invalid_header http_403 http_500 http_502 http_503 http_504;
        }

        # redirect server error pages to the static page /50x.html
        #
        error_page  500 502 503 504  /50x.html;
        location = /50x.html {
            root  html;
        }
    }
}

Servers 4001-4003 are running and explicitly return HTTP error 504.
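A throwaway backend is enough to reproduce the setup where every server returns 504, without a real application. The sketch below uses Python's standard http.server module. The handler and function names (Always504Handler, start_backend) and the response body are hypothetical; they are not the backends used in the original test.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class Always504Handler(BaseHTTPRequestHandler):
    """Answers every GET with 504 Gateway Timeout (hypothetical test backend)."""

    def do_GET(self):
        body = b"504 from test backend\n"
        self.send_response(504)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep the console quiet; nginx's access log is what matters here


def start_backend(port: int) -> ThreadingHTTPServer:
    """Start a 504-only backend on 127.0.0.1:port in a daemon thread and return it."""
    server = ThreadingHTTPServer(("127.0.0.1", port), Always504Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Call start_backend(p) for p in 4001, 4002 and 4003, then keep the process alive (for example with threading.Event().wait()). This gives the upstream block three servers that always answer 504.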

I see the following in the nginx access log. Is this expected?

10.50.100.238 - [19/Nov/2014:21:24:40 +0800] "GET /Content/a.txt HTTP/1.1" 504 49 "-" "-" 0.003 127.0.0.1:4003, 127.0.0.1:4002, 127.0.0.1:4001 - 504, 504, 504 "-" 0.001, 0.001, 0.001
10.50.100.238 - [19/Nov/2014:21:24:40 +0800] "GET /Content/a.txt HTTP/1.1" 502 537 "-" "-" 0.001 127.0.0.1:4001, RESTfulFromLive - 504, 502 "-" 0.001, 0.000
10.50.100.238 - [19/Nov/2014:21:24:40 +0800] "GET /Content/a.txt HTTP/1.1" 504 49 "-" "-" 0.003 127.0.0.1:4003, 127.0.0.1:4001, 127.0.0.1:4002 - 504, 504, 504 "-" 0.001, 0.001, 0.001
10.50.100.238 - [19/Nov/2014:21:24:40 +0800] "GET /Content/a.txt HTTP/1.1" 502 537 "-" "-" 0.001 127.0.0.1:4002, RESTfulFromLive - 504, 502 "-" 0.001, 0.000

Ryan Chan

comment:4 Changed 5 years ago by ryan.chan1201@…

  • Resolution invalid deleted
  • Status changed from closed to reopened

comment:5 Changed 5 years ago by mdounin

  • Resolution set to invalid
  • Status changed from reopened to closed

Again: the upstream name is logged when all configured servers are down. That is, nginx tries to find a server for a request but fails, because all servers are considered down as per the max_fails/fail_timeout parameters of the server directive. At the same time, the "no live upstreams" message is logged to the error log and a 502 response is returned to the client.
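The condition described above can be modelled in a few lines. The sketch assumes the documented defaults of the server directive: max_fails=1, fail_timeout=10s, and max_fails=0 disabling the accounting. It uses a plain first-fit scan instead of nginx's weighted round robin. The Server class and select_server function are hypothetical, not nginx internals.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Server:
    """Hypothetical model of one `server` line in an upstream block."""
    addr: str
    max_fails: int = 1           # documented default of the server directive
    fail_timeout: float = 10.0   # seconds; documented default
    fails: int = 0               # unsuccessful attempts counted so far
    last_fail: float = 0.0       # time of the most recent failure (model assumption)

    def is_down(self, now: float) -> bool:
        # Unavailable once max_fails failures are counted, until fail_timeout
        # has passed since the last one; max_fails=0 disables the check.
        return (self.max_fails > 0
                and self.fails >= self.max_fails
                and now - self.last_fail <= self.fail_timeout)

    def record_failure(self, now: float) -> None:
        self.fails += 1
        self.last_fail = now


def select_server(servers: List[Server], tried: Set[str], now: float) -> Optional[Server]:
    """First server not yet tried for this request and not considered down.

    None corresponds to the "no live upstreams" case: the group name goes
    into $upstream_addr and the client receives 502.
    """
    for server in servers:
        if server.addr not in tried and not server.is_down(now):
            return server
    return None
```

Suppose all three servers from the test configuration record one failure at t=0. Then select_server(servers, set(), 1.0) returns None. At t=11.0 the first server is selectable again, because fail_timeout has passed.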

comment:6 Changed 5 years ago by ryan.chan1201@…

So, in my case, am I correct to say that since all the servers in my upstream failed (whatever the status code), nginx returns 502?

But it seems odd. Say I have 5 servers and all 5 return 504, and the retry limit is 3. My first request returns 504 (all 3 servers tried returned 504), but the second request returns 502 (the remaining 2 servers returned 504, there was no other server to try, and so 502 was returned). How would you explain this?
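The 5-server question can be checked against a toy model built on the explanation in comment:5. Every server answers 504, and each 504 counts as an unsuccessful attempt (max_fails=1). All requests arrive within fail_timeout, so a failed server stays down. `tries` stands in for the per-request attempt limit (for example the proxy_next_upstream_tries directive). Two further points are modelling assumptions, not confirmed nginx behaviour. First, servers are picked in plain rotation. Second, failure counters are cleared after a "no live upstreams" result, so later requests can try the servers again. The simulate function and the serverN names are hypothetical.

```python
from typing import List, Tuple


def simulate(n_servers: int, tries: int, n_requests: int,
             upstream_name: str = "RESTfulFromLive") -> List[Tuple[str, int]]:
    """Toy model: every server answers 504 and a failed server stays down.

    Returns one (upstream_addr, client_status) pair per request.
    """
    addrs = [f"server{i + 1}" for i in range(n_servers)]
    down = set()   # servers whose failure count has reached max_fails=1
    rr = 0         # rotation pointer shared across requests
    results = []
    for _ in range(n_requests):
        attempts, tried, left = [], set(), tries
        status = None
        while left > 0:
            # next server in rotation that is neither tried nor down
            pick = None
            for k in range(n_servers):
                cand = addrs[(rr + k) % n_servers]
                if cand not in tried and cand not in down:
                    pick = cand
                    rr = (rr + k + 1) % n_servers
                    break
            if pick is None:
                # no live upstreams: group name is logged, client gets 502
                attempts.append(upstream_name)
                status = 502
                down.clear()  # assumption: counters cleared for quick recovery
                break
            attempts.append(pick)  # this server answers 504
            tried.add(pick)
            down.add(pick)         # counts as an unsuccessful attempt
            left -= 1
        if status is None:
            status = 504  # tries exhausted: the last upstream 504 is passed on (assumption)
        results.append((", ".join(attempts), status))
    return results
```

With 5 servers and 3 tries, simulate(5, 3, 4) alternates between ("server1, server2, server3", 504) and ("server4, server5, RESTfulFromLive", 502). This matches the pattern described: the second request runs out of live servers before it runs out of tries. With 7 servers and 3 tries, the model's third request is "server7, RESTfulFromLive" with 502. It does not continue to server1 and server2, because those servers are already marked down. The model is simplified and will not match nginx's selection order attempt for attempt.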

Ryan Chan

comment:7 Changed 5 years ago by ryan.chan1201@…

  • Resolution invalid deleted
  • Status changed from closed to reopened

comment:8 Changed 5 years ago by mdounin

  • Resolution set to invalid
  • Status changed from reopened to closed

If you need help understanding how nginx works, consider asking on the mailing list. Trac is the wrong place for questions; it is for tracking bugs.
