Opened 6 years ago

Closed 4 years ago

#1695 closed defect (fixed)

Data is corrupted in static large objects (SLO swift API) when using nginx as a GET cache.

Reported by: greenx@… Owned by:
Priority: minor Milestone:
Component: other Version: 1.14.x
Keywords: slice Cc:
uname -a: Linux host 3.10.0-862.9.1.el7.x86_64 #1 SMP Mon Jul 16 16:29:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
nginx -V: nginx version: nginx/1.14.0
built by gcc 6.2.1 20160916 (Red Hat 6.2.1-3) (GCC)
Compiled by yourself with ngx_http_slice module

Description

We found that data is corrupted in static large objects (SLO swift API) when using nginx as a GET cache.
Nginx used ngx_http_slice plugin.
Nginx download file as a slices using the "Content-Range" header.
At first it seemed to be missing a few bytes at the end of the full object.
Their number was one less than the number of slices for which the file was downloaded.
But the size of the final file was correct!
The analysis showed that in each slice all the headers are correct, but at the end of the data one extra byte!

It is clear that initially this is a ceph bug and we would not have noticed it if nginx compared the header and the number of received bytes.

Apparently nginx adds slices to the buffer without checking the match of the header and the data actually received,and the last slice either does not fit completely into the buffer or is cut off.

Bug in ceph tracer https://tracker.ceph.com/issues/37765

Attachments (5)

range-bug-with-SLO.png (199.5 KB ) - added by greenx@… 6 years ago.
ceph-nginx.pcap (112.6 KB ) - added by greenx@… 6 years ago.
trafic from ceph to nginx
nginx-client.pcap (5.7 KB ) - added by greenx@… 6 years ago.
trafic from nginx to client
file (3.9 KB ) - added by greenx@… 6 years ago.
bugceph.tar.gz (8.4 KB ) - added by greenx@… 6 years ago.
content cache directory

Download all attachments as: .zip

Change History (20)

by greenx@…, 6 years ago

Attachment: range-bug-with-SLO.png added

comment:1 by Maxim Dounin, 6 years ago

Keywords: slice added
Priority: criticalminor

So, the backend returns a 206 response with:

  1. Content-Range header which corresponds to the range requested by nginx.
  2. Content-Length header which corresponds to the range.
  3. An additional byte at the end, not counted in Content-Length.

And this causes malformed response when using the slice module. Correct?

Could you please check nginx error log for warnings like:

2018/12/26 19:11:35 [warn] 47862#100072: *1 upstream sent more data than specified in "Content-Length" header while reading upstream, client: 127.0.0.1, server: , request: "GET / HTTP/1.0", upstream: "http://127.0.0.1:8090/"

If the analysis is correct, there should be such warnings logged by nginx.

comment:2 by greenx@…, 6 years ago

The statement is true, but the error log is empty.

Version 0, edited 6 years ago by greenx@… (next)

comment:3 by Maxim Dounin, 6 years ago

The message should be there unless you are using higher logging level. Could you please check which level is used in the error_log directives in your configuration, and check that you see the message if the logging level is set at least to warn?

comment:4 by greenx@…, 6 years ago

I changed the logging level from error to warn.
Now I see this warning:....*1 upstream sent more data than specified in "Content-Length" header while reading upstream.....
But now, сannot reproduce the bug.
Nginx return 4 bytes, NOT 5. :(
Later my colleague will come, maybe he changed something without me. Clarify and accomplish your goal.

by greenx@…, 6 years ago

Attachment: ceph-nginx.pcap added

trafic from ceph to nginx

by greenx@…, 6 years ago

Attachment: nginx-client.pcap added

trafic from nginx to client

by greenx@…, 6 years ago

Attachment: file added

comment:5 by greenx@…, 6 years ago

We reproduced it. I attach three files: two pcap files and source file.
We build next schema: ceph(centos) -> nginx(centos) -> curl(win)
In nginx set slice 64 bytes.
Test file:

***
001
...
999
***

Only 4004 bytes.
It is seen that ceph gives the wrong packages.
And also, that nginx gives incorrectly glued file.
The last bytes are still cut off by the client (cURL).

Last edited 6 years ago by greenx@… (previous) (diff)

by greenx@…, 6 years ago

Attachment: bugceph.tar.gz added

content cache directory

comment:6 by greenx@…, 6 years ago

I added an archive with a cache directory with all parts of the file.

comment:7 by Maxim Dounin, 6 years ago

Thanks for checking. Corruption of the response produced by the slice module is expected when the backend returns malformed responses, as the slice module relies on correct parts being returned and simply merges them together. The warning confirms that responses from the backend are indeed malformed, thanks for checking this.

The current nginx behaviour is to simply pass backend responses as is, even malformed (with a warning logged and keepalive disabled with the backend). This was done to make sure a client will be able to see the response returned, even if it is malformed, and act accordingly. And this interacts badly with the way how the slice module works, as it simply merges all responses with extra bytes in them together.

We may consider switching to a more restrictive behaviour. In particular, dropping extra bytes (or even generating an error) might be a better approach. The following patch will do this for proxy (though probably it needs to be extended to other modules as well):

# HG changeset patch
# User Maxim Dounin <mdounin@mdounin.ru>
# Date 1545916641 -10800
#      Thu Dec 27 16:17:21 2018 +0300
# Node ID d160baa105264d50bf5e233558b6ca1caff17ca2
# Parent  6d15e452fa2eaf19408e24a0d0fcc3a31344a289
Proxy: drop extra data sent by upstream.

Previous behaviour was to pass everything to the client, but this
seems to be suboptimal (ticket #1695).

diff --git a/src/http/modules/ngx_http_proxy_module.c b/src/http/modules/ngx_http_proxy_module.c
--- a/src/http/modules/ngx_http_proxy_module.c
+++ b/src/http/modules/ngx_http_proxy_module.c
@@ -2042,22 +2042,31 @@ ngx_http_proxy_copy_filter(ngx_event_pip
         return NGX_OK;
     }
 
-    p->length -= b->last - b->pos;
-
-    if (p->length == 0) {
-        r = p->input_ctx;
-        p->upstream_done = 1;
-        r->upstream->keepalive = !r->upstream->headers_in.connection_close;
-
-    } else if (p->length < 0) {
-        r = p->input_ctx;
-        p->upstream_done = 1;
-
-        ngx_log_error(NGX_LOG_WARN, r->connection->log, 0,
+    if (p->length > b->last - b->pos) {
+        p->length -= b->last - b->pos;
+        return NGX_OK;
+    }
+
+    if (b->last - b->pos > p->length) {
+
+        ngx_log_error(NGX_LOG_WARN, p->log, 0,
                       "upstream sent more data than specified in "
                       "\"Content-Length\" header");
+
+        b->last = b->pos + p->length;
+
+        p->length = 0;
+        p->upstream_done = 1;
+
+        return NGX_OK;
     }
 
+    r = p->input_ctx;
+
+    p->length = 0;
+    p->upstream_done = 1;
+    r->upstream->keepalive = !r->upstream->headers_in.connection_close;
+
     return NGX_OK;
 }
 

comment:8 by greenx@…, 6 years ago

The patch helped - it works as expected. Thank!

comment:9 by greenx@…, 6 years ago

Something else needs to be done, what changes have hit the mainstream?

comment:10 by Maxim Dounin, 6 years ago

The particular patch changes upstream response handling to strict response length checking instead of previous behaviour to trust the backend. This is fairy big change in general, and needs to be expanded to other protocols before it can be committed. I have some work-in-progress patches for this, but these patches aren't finished yet.

Note well that the proper solution to address the particular problem your are seeing with your backend is to fix the backend. While the change suggested can improve interoperability with such buggy backends, it is the backend which is buggy and returns malformed responses.

comment:11 by Maxim Dounin <mdounin@…>, 4 years ago

In 7675:9afa45068b8f/nginx:

Proxy: drop extra data sent by upstream.

Previous behaviour was to pass everything to the client, but this
seems to be suboptimal and causes issues (ticket #1695). Fix is to
drop extra data instead, as it naturally happens in most clients.

comment:12 by Maxim Dounin <mdounin@…>, 4 years ago

In 7678:bffcc5af1d72/nginx:

Upstream: drop extra data sent by upstream.

Previous behaviour was to pass everything to the client, but this
seems to be suboptimal and causes issues (ticket #1695). Fix is to
drop extra data instead, as it naturally happens in most clients.

This change covers generic buffered and unbuffered filters as used
in the scgi and uwsgi modules. Appropriate input filter init
handlers are provided by the scgi and uwsgi modules to set corresponding
lengths.

Note that for responses to HEAD requests there is an exception:
we do allow any response length. This is because responses to HEAD
requests might be actual full responses, and it is up to nginx
to remove the response body. If caching is enabled, only full
responses matching the Content-Length header will be cached
(see b779728b180c).

comment:13 by Maxim Dounin <mdounin@…>, 4 years ago

In 7679:05e42236e95b/nginx:

FastCGI: protection from responses with wrong length.

Previous behaviour was to pass everything to the client, but this
seems to be suboptimal and causes issues (ticket #1695). Fix is to
drop extra data instead, as it naturally happens in most clients.

Additionally, we now also issue a warning if the response is too
short, and make sure the fact it is truncated is propagated to the
client. The u->error flag is introduced to make it possible to
propagate the error to the client in case of unbuffered proxying.

For responses to HEAD requests there is an exception: we do allow
both responses without body and responses with body matching the
Content-Length header.

comment:14 by Maxim Dounin <mdounin@…>, 4 years ago

In 7680:39501ce97e29/nginx:

gRPC: generate error when response size is wrong.

As long as the "Content-Length" header is given, we now make sure
it exactly matches the size of the response. If it doesn't,
the response is considered malformed and must not be forwarded
(https://tools.ietf.org/html/rfc7540#section-8.1.2.6). While it
is not really possible to "not forward" the response which is already
being forwarded, we generate an error instead, which is the closest
equivalent.

Previous behaviour was to pass everything to the client, but this
seems to be suboptimal and causes issues (ticket #1695). Also this
directly contradicts HTTP/2 specification requirements.

Note that the new behaviour for the gRPC proxy is more strict than that
applied in other variants of proxying. This is intentional, as HTTP/2
specification requires us to do so, while in other types of proxying
malformed responses from backends are well known and historically
tolerated.

comment:15 by Maxim Dounin, 4 years ago

Resolution: fixed
Status: newclosed

Should be fixed for all protocols now.

Note: See TracTickets for help on using tickets.