Opened 6 years ago

Last modified 6 years ago

#1695 new defect

Data is corrupted in static large objects (SLO swift API) when using nginx as a GET cache.

Reported by: greenx@… Owned by:
Priority: minor Milestone:
Component: other Version: 1.14.x
Keywords: slice Cc:
uname -a: Linux host 3.10.0-862.9.1.el7.x86_64 #1 SMP Mon Jul 16 16:29:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
nginx -V: nginx version: nginx/1.14.0
built by gcc 6.2.1 20160916 (Red Hat 6.2.1-3) (GCC)
Compiled by yourself with ngx_http_slice module

Description

We found that data is corrupted in static large objects (SLO swift API) when using nginx as a GET cache.
Nginx used ngx_http_slice plugin.
Nginx download file as a slices using the "Content-Range" header.
At first it seemed to be missing a few bytes at the end of the full object.
Their number was one less than the number of slices for which the file was downloaded.
But the size of the final file was correct!
The analysis showed that in each slice all the headers are correct, but at the end of the data one extra byte!

It is clear that initially this is a ceph bug and we would not have noticed it if nginx compared the header and the number of received bytes.

Apparently nginx adds slices to the buffer without checking the match of the header and the data actually received,and the last slice either does not fit completely into the buffer or is cut off.

Bug in ceph tracer https://tracker.ceph.com/issues/37765

Attachments (5)

range-bug-with-SLO.png (199.5 KB ) - added by greenx@… 6 years ago.
ceph-nginx.pcap (112.6 KB ) - added by greenx@… 6 years ago.
trafic from ceph to nginx
nginx-client.pcap (5.7 KB ) - added by greenx@… 6 years ago.
trafic from nginx to client
file (3.9 KB ) - added by greenx@… 6 years ago.
bugceph.tar.gz (8.4 KB ) - added by greenx@… 6 years ago.
content cache directory

Download all attachments as: .zip

Change History (15)

by greenx@…, 6 years ago

Attachment: range-bug-with-SLO.png added

comment:1 by Maxim Dounin, 6 years ago

Keywords: slice added
Priority: criticalminor

So, the backend returns a 206 response with:

  1. Content-Range header which corresponds to the range requested by nginx.
  2. Content-Length header which corresponds to the range.
  3. An additional byte at the end, not counted in Content-Length.

And this causes malformed response when using the slice module. Correct?

Could you please check nginx error log for warnings like:

2018/12/26 19:11:35 [warn] 47862#100072: *1 upstream sent more data than specified in "Content-Length" header while reading upstream, client: 127.0.0.1, server: , request: "GET / HTTP/1.0", upstream: "http://127.0.0.1:8090/"

If the analysis is correct, there should be such warnings logged by nginx.

comment:2 by greenx@…, 6 years ago

The statement is true, but the error log is empty.

Version 0, edited 6 years ago by greenx@… (next)

comment:3 by Maxim Dounin, 6 years ago

The message should be there unless you are using higher logging level. Could you please check which level is used in the error_log directives in your configuration, and check that you see the message if the logging level is set at least to warn?

comment:4 by greenx@…, 6 years ago

I changed the logging level from error to warn.
Now I see this warning:....*1 upstream sent more data than specified in "Content-Length" header while reading upstream.....
But now, сannot reproduce the bug.
Nginx return 4 bytes, NOT 5. :(
Later my colleague will come, maybe he changed something without me. Clarify and accomplish your goal.

by greenx@…, 6 years ago

Attachment: ceph-nginx.pcap added

trafic from ceph to nginx

by greenx@…, 6 years ago

Attachment: nginx-client.pcap added

trafic from nginx to client

by greenx@…, 6 years ago

Attachment: file added

comment:5 by greenx@…, 6 years ago

We reproduced it. I attach three files: two pcap files and source file.
We build next schema: ceph(centos) -> nginx(centos) -> curl(win)
In nginx set slice 64 bytes.
Test file:

***
001
...
999
***

Only 4004 bytes.
It is seen that ceph gives the wrong packages.
And also, that nginx gives incorrectly glued file.
The last bytes are still cut off by the client (cURL).

Last edited 6 years ago by greenx@… (previous) (diff)

by greenx@…, 6 years ago

Attachment: bugceph.tar.gz added

content cache directory

comment:6 by greenx@…, 6 years ago

I added an archive with a cache directory with all parts of the file.

comment:7 by Maxim Dounin, 6 years ago

Thanks for checking. Corruption of the response produced by the slice module is expected when the backend returns malformed responses, as the slice module relies on correct parts being returned and simply merges them together. The warning confirms that responses from the backend are indeed malformed, thanks for checking this.

The current nginx behaviour is to simply pass backend responses as is, even malformed (with a warning logged and keepalive disabled with the backend). This was done to make sure a client will be able to see the response returned, even if it is malformed, and act accordingly. And this interacts badly with the way how the slice module works, as it simply merges all responses with extra bytes in them together.

We may consider switching to a more restrictive behaviour. In particular, dropping extra bytes (or even generating an error) might be a better approach. The following patch will do this for proxy (though probably it needs to be extended to other modules as well):

# HG changeset patch
# User Maxim Dounin <mdounin@mdounin.ru>
# Date 1545916641 -10800
#      Thu Dec 27 16:17:21 2018 +0300
# Node ID d160baa105264d50bf5e233558b6ca1caff17ca2
# Parent  6d15e452fa2eaf19408e24a0d0fcc3a31344a289
Proxy: drop extra data sent by upstream.

Previous behaviour was to pass everything to the client, but this
seems to be suboptimal (ticket #1695).

diff --git a/src/http/modules/ngx_http_proxy_module.c b/src/http/modules/ngx_http_proxy_module.c
--- a/src/http/modules/ngx_http_proxy_module.c
+++ b/src/http/modules/ngx_http_proxy_module.c
@@ -2042,22 +2042,31 @@ ngx_http_proxy_copy_filter(ngx_event_pip
         return NGX_OK;
     }
 
-    p->length -= b->last - b->pos;
-
-    if (p->length == 0) {
-        r = p->input_ctx;
-        p->upstream_done = 1;
-        r->upstream->keepalive = !r->upstream->headers_in.connection_close;
-
-    } else if (p->length < 0) {
-        r = p->input_ctx;
-        p->upstream_done = 1;
-
-        ngx_log_error(NGX_LOG_WARN, r->connection->log, 0,
+    if (p->length > b->last - b->pos) {
+        p->length -= b->last - b->pos;
+        return NGX_OK;
+    }
+
+    if (b->last - b->pos > p->length) {
+
+        ngx_log_error(NGX_LOG_WARN, p->log, 0,
                       "upstream sent more data than specified in "
                       "\"Content-Length\" header");
+
+        b->last = b->pos + p->length;
+
+        p->length = 0;
+        p->upstream_done = 1;
+
+        return NGX_OK;
     }
 
+    r = p->input_ctx;
+
+    p->length = 0;
+    p->upstream_done = 1;
+    r->upstream->keepalive = !r->upstream->headers_in.connection_close;
+
     return NGX_OK;
 }
 

comment:8 by greenx@…, 6 years ago

The patch helped - it works as expected. Thank!

comment:9 by greenx@…, 6 years ago

Something else needs to be done, what changes have hit the mainstream?

comment:10 by Maxim Dounin, 6 years ago

The particular patch changes upstream response handling to strict response length checking instead of previous behaviour to trust the backend. This is fairy big change in general, and needs to be expanded to other protocols before it can be committed. I have some work-in-progress patches for this, but these patches aren't finished yet.

Note well that the proper solution to address the particular problem your are seeing with your backend is to fix the backend. While the change suggested can improve interoperability with such buggy backends, it is the backend which is buggy and returns malformed responses.

Note: See TracTickets for help on using tickets.