Opened 6 years ago
Last modified 6 years ago
#1695 new defect
Data is corrupted in static large objects (SLO swift API) when using nginx as a GET cache.
Reported by: | Owned by: | ||
---|---|---|---|
Priority: | minor | Milestone: | |
Component: | other | Version: | 1.14.x |
Keywords: | slice | Cc: | |
uname -a: | Linux host 3.10.0-862.9.1.el7.x86_64 #1 SMP Mon Jul 16 16:29:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | ||
nginx -V: |
nginx version: nginx/1.14.0
built by gcc 6.2.1 20160916 (Red Hat 6.2.1-3) (GCC) Compiled by yourself with ngx_http_slice module |
Description
We found that data is corrupted in static large objects (SLO swift API) when using nginx as a GET cache.
Nginx used ngx_http_slice plugin.
Nginx download file as a slices using the "Content-Range" header.
At first it seemed to be missing a few bytes at the end of the full object.
Their number was one less than the number of slices for which the file was downloaded.
But the size of the final file was correct!
The analysis showed that in each slice all the headers are correct, but at the end of the data one extra byte!
It is clear that initially this is a ceph bug and we would not have noticed it if nginx compared the header and the number of received bytes.
Apparently nginx adds slices to the buffer without checking the match of the header and the data actually received,and the last slice either does not fit completely into the buffer or is cut off.
Bug in ceph tracer https://tracker.ceph.com/issues/37765
Attachments (5)
Change History (15)
by , 6 years ago
Attachment: | range-bug-with-SLO.png added |
---|
comment:1 by , 6 years ago
Keywords: | slice added |
---|---|
Priority: | critical → minor |
comment:2 by , 6 years ago
The statement is true, but the error log is empty.
comment:3 by , 6 years ago
The message should be there unless you are using higher logging level. Could you please check which level is used in the error_log directives in your configuration, and check that you see the message if the logging level is set at least to warn
?
comment:4 by , 6 years ago
I changed the logging level from error to warn.
Now I see this warning:....*1 upstream sent more data than specified in "Content-Length" header while reading upstream.....
But now, сannot reproduce the bug.
Nginx return 4 bytes, NOT 5. :(
Later my colleague will come, maybe he changed something without me. Clarify and accomplish your goal.
by , 6 years ago
comment:5 by , 6 years ago
We reproduced it. I attach three files: two pcap files and source file.
We build next schema: ceph(centos) -> nginx(centos) -> curl(win)
In nginx set slice 64 bytes.
Test file:
*** 001 ... 999 ***
Only 4004 bytes.
It is seen that ceph gives the wrong packages.
And also, that nginx gives incorrectly glued file.
The last bytes are still cut off by the client (cURL).
comment:7 by , 6 years ago
Thanks for checking. Corruption of the response produced by the slice module is expected when the backend returns malformed responses, as the slice module relies on correct parts being returned and simply merges them together. The warning confirms that responses from the backend are indeed malformed, thanks for checking this.
The current nginx behaviour is to simply pass backend responses as is, even malformed (with a warning logged and keepalive disabled with the backend). This was done to make sure a client will be able to see the response returned, even if it is malformed, and act accordingly. And this interacts badly with the way how the slice module works, as it simply merges all responses with extra bytes in them together.
We may consider switching to a more restrictive behaviour. In particular, dropping extra bytes (or even generating an error) might be a better approach. The following patch will do this for proxy (though probably it needs to be extended to other modules as well):
# HG changeset patch # User Maxim Dounin <mdounin@mdounin.ru> # Date 1545916641 -10800 # Thu Dec 27 16:17:21 2018 +0300 # Node ID d160baa105264d50bf5e233558b6ca1caff17ca2 # Parent 6d15e452fa2eaf19408e24a0d0fcc3a31344a289 Proxy: drop extra data sent by upstream. Previous behaviour was to pass everything to the client, but this seems to be suboptimal (ticket #1695). diff --git a/src/http/modules/ngx_http_proxy_module.c b/src/http/modules/ngx_http_proxy_module.c --- a/src/http/modules/ngx_http_proxy_module.c +++ b/src/http/modules/ngx_http_proxy_module.c @@ -2042,22 +2042,31 @@ ngx_http_proxy_copy_filter(ngx_event_pip return NGX_OK; } - p->length -= b->last - b->pos; - - if (p->length == 0) { - r = p->input_ctx; - p->upstream_done = 1; - r->upstream->keepalive = !r->upstream->headers_in.connection_close; - - } else if (p->length < 0) { - r = p->input_ctx; - p->upstream_done = 1; - - ngx_log_error(NGX_LOG_WARN, r->connection->log, 0, + if (p->length > b->last - b->pos) { + p->length -= b->last - b->pos; + return NGX_OK; + } + + if (b->last - b->pos > p->length) { + + ngx_log_error(NGX_LOG_WARN, p->log, 0, "upstream sent more data than specified in " "\"Content-Length\" header"); + + b->last = b->pos + p->length; + + p->length = 0; + p->upstream_done = 1; + + return NGX_OK; } + r = p->input_ctx; + + p->length = 0; + p->upstream_done = 1; + r->upstream->keepalive = !r->upstream->headers_in.connection_close; + return NGX_OK; }
comment:10 by , 6 years ago
The particular patch changes upstream response handling to strict response length checking instead of previous behaviour to trust the backend. This is fairy big change in general, and needs to be expanded to other protocols before it can be committed. I have some work-in-progress patches for this, but these patches aren't finished yet.
Note well that the proper solution to address the particular problem your are seeing with your backend is to fix the backend. While the change suggested can improve interoperability with such buggy backends, it is the backend which is buggy and returns malformed responses.
So, the backend returns a 206 response with:
Content-Range
header which corresponds to the range requested by nginx.Content-Length
header which corresponds to the range.Content-Length
.And this causes malformed response when using the slice module. Correct?
Could you please check nginx error log for warnings like:
If the analysis is correct, there should be such warnings logged by nginx.