Opened 3 weeks ago

Closed 3 weeks ago

Last modified 3 weeks ago

#2052 closed defect (invalid)

Slice module closes connection on upstream error codes 4xx and 5xx other than 416

Reported by: halaei@… Owned by:
Priority: major Milestone:
Component: nginx-core Version: 1.19.x
Keywords: slice Cc:
uname -a: Linux vivo 4.15.0-115-generic #116-Ubuntu SMP Wed Aug 26 14:04:49 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
nginx -V: nginx version: nginx/1.19.2
built by gcc 7.4.0 (Ubuntu 7.4.0-1ubuntu1~18.04.1)
built with OpenSSL 1.1.1 11 Sep 2018 (running with OpenSSL 1.1.1g 21 Apr 2020)
TLS SNI support enabled
configure arguments: --prefix=/etc/nginx --sbin-path=/usr/sbin/nginx --modules-path=/usr/lib/nginx/modules --conf-path=/etc/nginx/nginx.conf --error-log-path=/var/log/nginx/error.log --http-log-path=/var/log/nginx/access.log --pid-path=/var/run/nginx.pid --lock-path=/var/run/nginx.lock --http-client-body-temp-path=/var/cache/nginx/client_temp --http-proxy-temp-path=/var/cache/nginx/proxy_temp --http-fastcgi-temp-path=/var/cache/nginx/fastcgi_temp --http-uwsgi-temp-path=/var/cache/nginx/uwsgi_temp --http-scgi-temp-path=/var/cache/nginx/scgi_temp --user=nginx --group=nginx --with-compat --with-file-aio --with-threads --with-http_addition_module --with-http_auth_request_module --with-http_dav_module --with-http_flv_module --with-http_gunzip_module --with-http_gzip_static_module --with-http_mp4_module --with-http_random_index_module --with-http_realip_module --with-http_secure_link_module --with-http_slice_module --with-http_ssl_module --with-http_stub_status_module --with-http_sub_module --with-http_v2_module --with-mail --with-mail_ssl_module --with-stream --with-stream_realip_module --with-stream_ssl_module --with-stream_ssl_preread_module --with-cc-opt='-g -O2 -fdebug-prefix-map=/data/builder/debuild/nginx-1.19.2/debian/debuild-base/nginx-1.19.2=. -fstack-protector-strong -Wformat -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fPIC' --with-ld-opt='-Wl,-Bsymbolic-functions -Wl,-z,relro -Wl,-z,now -Wl,--as-needed -pie'

Description

Following the instructions in https://docs.nginx.com/nginx/admin-guide/content-cache/content-caching/#slice, I enabled slice caching with a config similar to the following:

location / {
    slice             1m;
    proxy_cache       cache;
    proxy_cache_key   $uri$is_args$args$slice_range;
    proxy_set_header  Range $slice_range;
    proxy_cache_valid 200 206 1h;
    proxy_pass        http://localhost:8000;
}

I set up a mock upstream that sends responses based on the provided headers and query params:

  1. "l" query param: the length of the file.
  2. "x-err" header: if present, the server responds with an error code and a JSON body instead of returning a 200/206 status code with the file in the body.
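A mock upstream like the one described above can be sketched roughly as follows (this is my own minimal illustration, not the reporter's actual server; the names make_response and MockHandler are hypothetical):

```python
import json
import re
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs

def make_response(length, range_header=None, x_err=None):
    """Build (status, headers, body) for the mock upstream.

    length:       value of the "l" query param (file size in bytes)
    range_header: value of the Range request header, if any
    x_err:        value of the "x-err" request header, if any
    """
    if x_err is not None:
        # Error mode: respond with the requested status and a JSON body.
        body = json.dumps({"error": int(x_err)}).encode()
        return int(x_err), {"Content-Type": "application/json"}, body
    data = b"x" * length  # dummy file contents
    if range_header:
        m = re.match(r"bytes=(\d+)-(\d*)", range_header)
        if m:
            start = int(m.group(1))
            end = int(m.group(2)) if m.group(2) else length - 1
            end = min(end, length - 1)
            headers = {
                "Content-Type": "application/octet-stream",
                "Accept-Ranges": "bytes",
                "Content-Range": f"bytes {start}-{end}/{length}",
            }
            return 206, headers, data[start:end + 1]
    return 200, {"Content-Type": "application/octet-stream",
                 "Accept-Ranges": "bytes"}, data

class MockHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        qs = parse_qs(urlparse(self.path).query)
        length = int(qs.get("l", ["0"])[0])
        status, headers, body = make_response(
            length, self.headers.get("Range"), self.headers.get("x-err"))
        self.send_response(status)
        for name, value in headers.items():
            self.send_header(name, value)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

def run(port=8000):
    """Serve the mock upstream on 127.0.0.1 (blocks forever)."""
    HTTPServer(("127.0.0.1", port), MockHandler).serve_forever()
```

With this sketch, a ranged request for a slice yields a 206 with a matching Content-Range, while setting x-err short-circuits into an error response regardless of the Range header.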

First, I send a request to get the first slice of a 2 MB file:

curl -X GET 'http://mycache.test/file?l=2000000' -H 'Range: bytes=0-1000000'

This causes nginx to successfully request the first 1 MB of the file, cache it, and send the response back to the client.

Now, if I send a request to get the whole file but set the 'x-err' header to 403, the upstream responds with a 403 for the second slice of the file, but nginx can't handle this situation and closes the connection without sending the error back to the client or clearing the cached files:

curl -X GET 'http://mycache.test/file?l=2000000' -H 'x-err: 403'

My expectation would be for the server either to send the available range, or to send a 403 while clearing the cached slices, depending on the situation (e.g. a partial 206 response if the available slices cover the beginning of the requested range, and a 403 otherwise).

Currently, I have no choice but to disable slice in my configuration, because this scenario is highly likely to happen in my production environment, which is an AWS CloudFront/S3 upstream serving signed URLs to private files.

Change History (6)

comment:1 by Maxim Dounin, 3 weeks ago

Resolution: invalid
Status: new → closed

nginx can't handle this situation and closes the connection

Closing the connection is exactly how nginx handles this situation. The slice module is designed to request a file by limited size slices, and the response headers are returned along with the first slice. As long as the response headers are already sent to the client, it is no longer possible to return an error to the client. Instead, nginx closes the connection to make sure the client won't assume the (unfinished) response is complete, and logs the error to the error log.

As for clearing the cached files, it is up to you to do appropriate cleaning (or, rather, to configure caching so that responses which shouldn't be used are not cached or expire); this is not something the slice module does or is expected to do.

Note well that the slice module converts an atomic operation (a request) to a non-atomic series of operations (multiple subrequests). It is only expected to be used with stable setups, when files are not changed between the requests (and "between" includes caching time, if caching is used).

Currently, I have no choice but to disable slice in my configuration

As long as upstream responses in your configuration are not stable, this is the only choice. You shouldn't enable the slice module in such configurations in the first place.

comment:2 by halaei@…, 3 weeks ago

Thanks for the quick reply, but I disagree. I see that CloudFront acts very differently (and correctly, in my opinion) in a similar scenario, so I don't know why it shouldn't be acceptable for Nginx to do likewise.

I ran a similar test with CloudFront, which, like my Nginx setup, is a reverse proxy with caching and a slice size of 1 MB. CloudFront's behaviour is more acceptable to me than Nginx's, because CloudFront actually returns a 403 if its upstream responds with 403 to partial requests. Here is my test with CloudFront and S3 as its upstream:

  1. Save a public multi-megabyte file in an S3 bucket.
  2. curl https://cloudfrontdomain/file -H 'Range: bytes=0-99' -v. The response will be a 100-byte file with status 206 and an x-cache: Miss header.
  3. Delete the file from S3, so that when you ask for the file again directly from S3 you get a 403 error.
  4. curl https://cloudfrontdomain/file -v will result in a 403 error code with some XML body, so CloudFront doesn't drop the connection without sending any response, as Nginx does.
  5. To check that CloudFront indeed uses slices of size 1 MB, send curl https://cloudfrontdomain/file -H 'Range: bytes=0-1000000' -v and verify that you receive a 206 response with x-cache: Hit from CloudFront; but if you go beyond the first MB, e.g. by setting Range to bytes=1000000-2000000, you will get a 403.

Also I have a problem with your answer, so I am afraid there is a misunderstanding here:

As long as the response headers are already sent to the client, it is no longer possible to return an error to the client.

I didn't ask to set the status code after the response headers are sent. The request for the second chunk of the file has nothing to do with the request for the first chunk, and it has received no response headers at all. It may even be from a different client!

The best response for a cache is the response that the upstream would generate, a 403 in this case.

I really hope you can reconsider this issue and reopen it.
Thanks again.

comment:3 by halaei@…, 3 weeks ago

Resolution: invalid
Status: closed → reopened

comment:4 by halaei@…, 3 weeks ago

Maybe this is what is wrong with closing the connection without sending an error response: the client has no way to find out whether the request is invalid and it shouldn't retry, or whether there was a network error and it should retry.

comment:5 by Maxim Dounin, in reply to comment:2, 3 weeks ago

Resolution: invalid
Status: reopened → closed
  1. curl https://cloudfrontdomain/file -v will result in a 403 error code with some XML body, so CloudFront doesn't drop the connection without sending any response, as Nginx does.

This might simply indicate that it uses slice-based requests only for range requests, but not for full-resource requests. Try instead a request which covers two slices, and make sure that:

  1. The first slice is returned from the cache (without a request to upstream) before an upstream request for the second slice is made (so no upstream request happens if, for example, the client drops the connection early).
  2. The request for the second slice returns a 403 from the upstream, so CloudFront has to do something with the fact that it has already sent the response headers to the client but has no second slice to return.

Either way, the fact that CloudFront behaves better in some setups like yours doesn't indicate that there is a bug in nginx. As explained above, nginx doesn't try to remove any cached files, and it's up to you to ensure caches are used correctly.

Note well that using a very short cache validity time along with proxy_cache_revalidate might work for you, and will result in similar behaviour at the cost of a revalidation request for each slice returned.
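A sketch of what that might look like, based on the configuration from the ticket description (the 10s validity is an arbitrary illustrative value, not a recommendation):

```nginx
location / {
    slice             1m;
    proxy_cache       cache;
    proxy_cache_key   $uri$is_args$args$slice_range;
    proxy_set_header  Range $slice_range;

    # Keep slices valid only briefly; after 10s each cached slice is
    # revalidated with a conditional request before being served.
    proxy_cache_valid      200 206 10s;
    proxy_cache_revalidate on;

    proxy_pass http://localhost:8000;
}
```

Once a slice expires, nginx revalidates it upstream with If-Modified-Since/If-None-Match, so a now-forbidden file would stop being served from cache within the validity window.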

Also I have a problem with your answer, so I am afraid there is a misunderstanding here:

As long as the response headers are already sent to the client, it is no longer possible to return an error to the client.

I didn't ask to set the status code after the response headers are sent. The request for the second chunk of the file has nothing to do with the request for the first chunk, and it has received no response headers at all. It may even be from a different client!

The second request is expected to receive a response: nginx returns the response headers and the first slice from the cache. Once the error response for the second slice is received from the upstream server, nginx detects the error, writes an appropriate error message to the error log, and closes the connection to indicate the error to the client.

For example:

$ curl -svo /dev/null http://127.0.0.1:8080/t/1000m -H 'Range: bytes=0-1000000'
*   Trying 127.0.0.1:8080...
* Connected to 127.0.0.1 (127.0.0.1) port 8080 (#0)
> GET /t/1000m HTTP/1.1
> Host: 127.0.0.1:8080
> User-Agent: curl/7.72.0
> Accept: */*
> Range: bytes=0-1000000
> 
* Mark bundle as not supporting multiuse
< HTTP/1.1 206 Partial Content
< Server: nginx/1.19.3
< Date: Sun, 27 Sep 2020 20:14:37 GMT
< Content-Type: text/plain
< Content-Length: 1000001
< Connection: keep-alive
< Last-Modified: Mon, 22 Jun 2020 08:35:14 GMT
< ETag: "5ef06d42-3e800000"
< X-Cache: HIT
< Content-Range: bytes 0-1000000/1048576000
< 
{ [16036 bytes data]
* Connection #0 to host 127.0.0.1 left intact
$ curl -svo /dev/null http://127.0.0.1:8080/t/1000m -H 'X-Err: 1'
*   Trying 127.0.0.1:8080...
* Connected to 127.0.0.1 (127.0.0.1) port 8080 (#0)
> GET /t/1000m HTTP/1.1
> Host: 127.0.0.1:8080
> User-Agent: curl/7.72.0
> Accept: */*
> X-Err: 1
> 
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Server: nginx/1.19.3
< Date: Sun, 27 Sep 2020 20:14:40 GMT
< Content-Type: text/plain
< Content-Length: 1048576000
< Connection: keep-alive
< Last-Modified: Mon, 22 Jun 2020 08:35:14 GMT
< ETag: "5ef06d42-3e800000"
< X-Cache: HIT
< Accept-Ranges: bytes
< 
{ [16067 bytes data]
* transfer closed with 1047527424 bytes remaining to read
* Closing connection 0

Note "transfer closed with 1047527424 bytes remaining to read" error from curl. At the same time, nginx error log will contain something like:

2020/09/27 23:14:40 [error] 88539#100117: *19 unexpected status code 403 in slice response while reading response header from upstream, client: 127.0.0.1, server: , request: "GET /t/1000m HTTP/1.1", subrequest: "/t/1000m", upstream: "http://127.0.0.1:8081/t/1000m", host: "127.0.0.1:8080"

comment:6 by halaei@…, 3 weeks ago

I tested with Postman and the Guzzle client. Neither showed the server's response, so I thought that maybe Nginx closed the connection before sending any response. I guess I was wrong.

Here is the response from Postman:

Could not get any response
There was an error connecting to https://....
Why this might have happened:
The server couldn't send a response:
Ensure that the backend is working properly
Self-signed SSL certificates are being blocked:
Fix this by turning off 'SSL certificate verification' in Settings > General
Proxy configured incorrectly
Ensure that proxy is configured correctly in Settings > Proxy
Request timeout:
Change request timeout in Settings > General

And this is the error from Guzzle:

GuzzleHttp\Exception\ConnectException  : cURL error 52: Empty reply from server (see https://curl.haxx.se/libcurl/c/libcurl-errors.html)

Probably, before sending the status code and headers to the client, CloudFront waits for the missing slices in the cache to be received from upstream (or at least waits for the upstream status codes), and that is why it can send the 403 back to the client.
