Opened 3 weeks ago

Last modified 10 days ago

#1695 new defect

Data is corrupted in static large objects (SLO swift API) when using nginx as a GET cache.

Reported by: greenx@… Owned by:
Priority: minor Milestone:
Component: other Version: 1.14.x
Keywords: slice Cc:
uname -a: Linux host 3.10.0-862.9.1.el7.x86_64 #1 SMP Mon Jul 16 16:29:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
nginx -V: nginx version: nginx/1.14.0 built by gcc 6.2.1 20160916 (Red Hat 6.2.1-3) (GCC) Compiled by yourself with ngx_http_slice module

Description

We found that data is corrupted in static large objects (SLO swift API) when using nginx as a GET cache.
Nginx used ngx_http_slice plugin.
Nginx download file as a slices using the "Content-Range" header.
At first it seemed to be missing a few bytes at the end of the full object.
Their number was one less than the number of slices for which the file was downloaded.
But the size of the final file was correct!
The analysis showed that in each slice all the headers are correct, but at the end of the data one extra byte!

It is clear that initially this is a ceph bug and we would not have noticed it if nginx compared the header and the number of received bytes.

Apparently nginx adds slices to the buffer without checking the match of the header and the data actually received,and the last slice either does not fit completely into the buffer or is cut off.

Bug in ceph tracer https://tracker.ceph.com/issues/37765

Attachments (5)

range-bug-with-SLO.png (199.5 KB) - added by greenx@… 3 weeks ago.
ceph-nginx.pcap (112.6 KB) - added by greenx@… 3 weeks ago.
trafic from ceph to nginx
nginx-client.pcap (5.7 KB) - added by greenx@… 3 weeks ago.
trafic from nginx to client
file (3.9 KB) - added by greenx@… 3 weeks ago.
bugceph.tar.gz (8.4 KB) - added by greenx@… 3 weeks ago.
content cache directory

Download all attachments as: .zip

Change History (13)

Changed 3 weeks ago by greenx@…

comment:1 Changed 3 weeks ago by mdounin

  • Keywords slice added
  • Priority changed from critical to minor

So, the backend returns a 206 response with:

  1. Content-Range header which corresponds to the range requested by nginx.
  2. Content-Length header which corresponds to the range.
  3. An additional byte at the end, not counted in Content-Length.

And this causes malformed response when using the slice module. Correct?

Could you please check nginx error log for warnings like:

2018/12/26 19:11:35 [warn] 47862#100072: *1 upstream sent more data than specified in "Content-Length" header while reading upstream, client: 127.0.0.1, server: , request: "GET / HTTP/1.0", upstream: "http://127.0.0.1:8090/"

If the analysis is correct, there should be such warnings logged by nginx.

comment:2 Changed 3 weeks ago by greenx@…

The statement is true, but the error log is empty.
Maybe it is not clear from my description, and I missed that we use the module: https://nginx.org/ru/docs/http/ngx_http_slice_module.html

Last edited 3 weeks ago by greenx@… (previous) (diff)

comment:3 Changed 3 weeks ago by mdounin

The message should be there unless you are using higher logging level. Could you please check which level is used in the error_log directives in your configuration, and check that you see the message if the logging level is set at least to warn?

comment:4 Changed 3 weeks ago by greenx@…

I changed the logging level from error to warn.
Now I see this warning:....*1 upstream sent more data than specified in "Content-Length" header while reading upstream.....
But now, сannot reproduce the bug.
Nginx return 4 bytes, NOT 5. :(
Later my colleague will come, maybe he changed something without me. Clarify and accomplish your goal.

Changed 3 weeks ago by greenx@…

trafic from ceph to nginx

Changed 3 weeks ago by greenx@…

trafic from nginx to client

Changed 3 weeks ago by greenx@…

comment:5 Changed 3 weeks ago by greenx@…

We reproduced it. I attach three files: two pcap files and source file.
We build next schema: ceph(centos) -> nginx(centos) -> curl(win)
In nginx set slice 64 bytes.
Test file:

***
001
...
999
***

Only 4004 bytes.
It is seen that ceph gives the wrong packages.
And also, that nginx gives incorrectly glued file.
The last bytes are still cut off by the client (cURL).

Last edited 3 weeks ago by greenx@… (previous) (diff)

Changed 3 weeks ago by greenx@…

content cache directory

comment:6 Changed 3 weeks ago by greenx@…

I added an archive with a cache directory with all parts of the file.

comment:7 Changed 3 weeks ago by mdounin

Thanks for checking. Corruption of the response produced by the slice module is expected when the backend returns malformed responses, as the slice module relies on correct parts being returned and simply merges them together. The warning confirms that responses from the backend are indeed malformed, thanks for checking this.

The current nginx behaviour is to simply pass backend responses as is, even malformed (with a warning logged and keepalive disabled with the backend). This was done to make sure a client will be able to see the response returned, even if it is malformed, and act accordingly. And this interacts badly with the way how the slice module works, as it simply merges all responses with extra bytes in them together.

We may consider switching to a more restrictive behaviour. In particular, dropping extra bytes (or even generating an error) might be a better approach. The following patch will do this for proxy (though probably it needs to be extended to other modules as well):

# HG changeset patch
# User Maxim Dounin <mdounin@mdounin.ru>
# Date 1545916641 -10800
#      Thu Dec 27 16:17:21 2018 +0300
# Node ID d160baa105264d50bf5e233558b6ca1caff17ca2
# Parent  6d15e452fa2eaf19408e24a0d0fcc3a31344a289
Proxy: drop extra data sent by upstream.

Previous behaviour was to pass everything to the client, but this
seems to be suboptimal (ticket #1695).

diff --git a/src/http/modules/ngx_http_proxy_module.c b/src/http/modules/ngx_http_proxy_module.c
--- a/src/http/modules/ngx_http_proxy_module.c
+++ b/src/http/modules/ngx_http_proxy_module.c
@@ -2042,22 +2042,31 @@ ngx_http_proxy_copy_filter(ngx_event_pip
         return NGX_OK;
     }
 
-    p->length -= b->last - b->pos;
-
-    if (p->length == 0) {
-        r = p->input_ctx;
-        p->upstream_done = 1;
-        r->upstream->keepalive = !r->upstream->headers_in.connection_close;
-
-    } else if (p->length < 0) {
-        r = p->input_ctx;
-        p->upstream_done = 1;
-
-        ngx_log_error(NGX_LOG_WARN, r->connection->log, 0,
+    if (p->length > b->last - b->pos) {
+        p->length -= b->last - b->pos;
+        return NGX_OK;
+    }
+
+    if (b->last - b->pos > p->length) {
+
+        ngx_log_error(NGX_LOG_WARN, p->log, 0,
                       "upstream sent more data than specified in "
                       "\"Content-Length\" header");
+
+        b->last = b->pos + p->length;
+
+        p->length = 0;
+        p->upstream_done = 1;
+
+        return NGX_OK;
     }
 
+    r = p->input_ctx;
+
+    p->length = 0;
+    p->upstream_done = 1;
+    r->upstream->keepalive = !r->upstream->headers_in.connection_close;
+
     return NGX_OK;
 }
 

comment:8 Changed 10 days ago by greenx@…

The patch helped - it works as expected. Thank!

Note: See TracTickets for help on using tickets.