Opened 12 years ago

Closed 7 years ago

Last modified 7 years ago

#215 closed defect (invalid)

SSL: decryption failed or bad record mac with upstream servers

Reported by: internetstaff.myopenid.com
Owned by: somebody
Priority: major
Milestone:
Component: nginx-core
Version: 1.2.x
Keywords:
Cc:
uname -a: Linux myserver 3.2.28-45.62.amzn1.x86_64 #1 SMP Wed Aug 22 03:09:00 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
nginx -V: nginx version: nginx/1.2.3
built by gcc 4.4.4 20100726 (Red Hat 4.4.4-13) (GCC)
TLS SNI support enabled
configure arguments: --prefix=/etc/nginx/ --sbin-path=/usr/sbin/nginx --conf-path=/etc/nginx/nginx.conf --error-log-path=/var/log/nginx/error.log --http-log-path=/var/log/nginx/access.log --pid-path=/var/run/nginx.pid --lock-path=/var/run/nginx.lock --http-client-body-temp-path=/var/cache/nginx/client_temp --http-proxy-temp-path=/var/cache/nginx/proxy_temp --http-fastcgi-temp-path=/var/cache/nginx/fastcgi_temp --http-uwsgi-temp-path=/var/cache/nginx/uwsgi_temp --http-scgi-temp-path=/var/cache/nginx/scgi_temp --user=nginx --group=nginx --with-http_ssl_module --with-http_realip_module --with-http_addition_module --with-http_sub_module --with-http_dav_module --with-http_flv_module --with-http_mp4_module --with-http_gzip_static_module --with-http_random_index_module --with-http_secure_link_module --with-http_stub_status_module --with-mail --with-mail_ssl_module --with-file-aio --with-ipv6 --with-cc-opt='-O2 -g'

Description

2012/09/07 20:23:52 [error] 3417#0: *1 SSL_read() failed (SSL: error:1408F119:SSL routines:SSL3_GET_RECORD:decryption failed or bad record mac) while reading upstream, client: 1.2.3.4, server: _, request: "GET /512K.bin HTTP/1.1", upstream: "https://192.168.1.1:443/512K.bin", host: "test.com"

I can repeat this at will by requesting two 512k files simultaneously. It fails every time. The upstream servers are IIS 7.5.

I've dug into the network, SSL on both sides, looked at captures, upgraded and downgraded openssl and Nginx, etc.

In the end, I seem to have worked around it with:

proxy_buffers 8 32k;

The number of buffers doesn't seem to matter, but with any buffer size below 32k the issue recurs.

Is this something getting 'lost' and corrupting the SSL transfer if the buffer isn't large enough?
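For anyone trying to reproduce this, the "two simultaneous requests" scenario can be scripted. The following is only a rough sketch, not what the reporter actually ran: the function names are made up for illustration, and the URL in the usage comment is an assumed front-end address built from the host and path in the log line above.

```python
import concurrent.futures
import urllib.request


def fetch_size(url, timeout=30):
    """Download url completely and return the number of body bytes read."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return len(resp.read())


def fetch_concurrently(urls, fetch=fetch_size):
    """Start one download per URL at the same time.

    Returns a list in the same order as urls; each entry is either the
    value returned by fetch or the exception that the download raised.
    """
    results = []
    workers = max(1, len(urls))  # ThreadPoolExecutor requires at least 1
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(fetch, u) for u in urls]
        for fut in futures:
            try:
                results.append(fut.result())
            except Exception as exc:  # keep the failure so it can be inspected
                results.append(exc)
    return results


# Hypothetical usage (assumed proxy URL, same file requested twice):
#   fetch_concurrently(["http://test.com/512K.bin"] * 2)
```

If the proxy cuts a transfer short, the corresponding entry would presumably be an exception or a byte count smaller than the file size, which can then be matched against the nginx error log.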

Change History (15)

comment:1 by Juan Hoyos, 11 years ago

I can reproduce this issue with three simultaneous requests to the same (~200k) resource. The response is gzipped and truncated.

2013/03/22 18:03:23 [error] 11395#0: *195839 SSL_read() failed (SSL: error:1408F119:SSL routines:SSL3_GET_RECORD:decryption failed or bad record mac) while reading upstream, client: 123.213.123.123, server: my.host.com, request: "GET /api/resource/1234 HTTP/1.1", upstream: "https://127.0.0.1:4443/resource1234?", host: "my.host.com", referrer: "https://my.host.com/resource/1234"

This is a single 1.2.4 installation on CentOS, configured with a front server proxying to an SSL upstream server.

Fixed with the workaround the OP mentioned.

Last edited 11 years ago by Juan Hoyos

comment:2 by Ziga Mahkovec, 11 years ago

We're seeing the same issue. We have nginx talking to another upstream nginx over https (both 1.2.7).

Changing proxy_buffer size seems to help, but occasionally we still see a large response (4MB+) causing this issue.

We've eliminated a few things while debugging this:

  • keepalive upstream (we tried with and without)
  • http 1.0 and http 1.1
  • with and without range request support
  • with and without SSL session reuse
  • different number of workers/upstream servers

comment:3 by Michel Samia, 10 years ago

Please paste the minimal config that reproduces this issue. I had a similar issue with proxy_buffering off and sendfile on: when the same file was downloaded twice in parallel, one connection was closed prematurely (only a small part of the file was transferred). When I disabled sendfile, it started to work correctly.

comment:4 by Agent Coulson, 10 years ago

I am seeing the same problem; here is my config. My upstream server has been obfuscated, but I am able to reproduce the issue consistently.

### Begin nginx.conf ###

worker_processes 1;

error_log logs/error.log debug;

pid logs/nginx.pid;

events {
    worker_connections 1024;
}

http {
    include mime.types;
    default_type application/octet-stream;

    access_log logs/access.log;

    keepalive_timeout 60;

    upstream http {
        server upstream.srv:443;
        keepalive 512;
    }

    server {
        listen 1182 default_server;

        server_name -;

        ssl_protocols SSLv3 TLSv1;
        ssl_ciphers RC4:HIGH:!aNULL:!MD5;
        ssl_prefer_server_ciphers on;

        location / {
            proxy_pass https://http;

            proxy_redirect off;
            proxy_read_timeout 10s;
            proxy_connect_timeout 6s;

            proxy_buffering off;
            proxy_buffer_size 64k;
            proxy_buffers 6 16k;
            proxy_busy_buffers_size 80k;

            proxy_pass_header Server;
            proxy_pass_header Date;
            proxy_pass_header X-Pad;

            proxy_set_header Connection "Keep-Alive";
            proxy_set_header Host "upstream.srv";
        }
    }
}

### End nginx.conf ###

comment:5 by Maxim Dounin, 10 years ago

sensitive: 0

Just for the record, see this thread for additional information. This seems to be a bug in OpenSSL 1.0.0+ related to SSL_MODE_RELEASE_BUFFERS; it needs additional investigation.

comment:6 by Aleksey Samsonov, 10 years ago

I found a bug in OpenSSL 1.0.0+. This patch solved the problem in my tests.
I reproduced this issue on nginx changeset 64d4837c9541 (OpenSSL commit f3a3903).

Last edited 10 years ago by Aleksey Samsonov

comment:8 by Maxim Dounin, 10 years ago

Just for the record:

This seems to be already fixed by many/most major OS vendors (at least OpenBSD, FreeBSD, Debian and Ubuntu were reported; most notably, Red Hat seems to be missing), and will be fixed in the next OpenSSL releases on all affected branches.

comment:9 by Maxim Dounin, 10 years ago

Resolution: invalid
Status: new → closed

OpenSSL 1.0.1h, which includes the fix, is out; closing this.

comment:10 by shifty35@…, 7 years ago

Resolution: invalid
Status: closed → reopened

I'm experiencing this same issue again using the latest nginx (1.11.5) and the latest OpenSSL (1.1.0b) while reverse proxying websockets.

I'm assuming this is a regression in OpenSSL, as using OpenSSL 1.0.1x doesn't fail in the same way?

It is also very difficult to track down, as there are zero error logs about the closed connections. The nginx error log must be set to the debug level to see the SSL_read() failure.

comment:11 by Maxim Dounin, 7 years ago

Resolution: invalid
Status: reopened → closed

As you can see directly from the ticket description, SSL_read() errors are logged at the error level. And yes, as long as previous versions of the OpenSSL library don't fail, this is a regression in OpenSSL and should be reported to OpenSSL; reopening this ticket doesn't make sense.

comment:12 by bblack.wikimedia.org@…, 7 years ago

Probably deserves a new ticket at least, but in any case: I've observed the same error as reported recently above (nginx 1.11.x + OpenSSL-1.1.0b, revproxy use-case). No websockets here, normal HTTP/2 client traffic into an HTTP/1.1 upstream backend. It takes a fair amount of bytes and/or HTTP/2 streams to trigger reliably (my testcase has been a page that loads ~500 images over an HTTP/2 connection, totaling several megabytes).

It's not reported at the "error" level. For me it's reported at the "info" level (so debug-mode isn't necessary to see it, but still...):

2016/10/27 18:00:39 [info] 44966#44966: *132434 SSL_read() failed (SSL: error:1408F119:SSL routines:ssl3_get_record:decryption failed or bad record mac) while processing HTTP/2 connection, client: ...

I've done a ton of testing with tweaking various nginx parameters related to buffering and buffer sizes, but I'm always able to reproduce the issue to varying degrees on live servers. ssl_buffer_size seems to have a notable impact on the bug behavior (it gets much harder to repro at smaller sizes). We've had reports of the issue from a wide variety of disparate client browsers.

For the moment I'm assuming an OpenSSL-1.1.x regression, but it may be one that depends on exactly how nginx is using the API for buffer management and such, and may be fixable on the nginx side? We'll try to get some deeper/real debugging done on a reproduction soon.

comment:13 by Maxim Dounin, 7 years ago

It's not reported at the "error" level. For me it's reported at the "info" level...

In your case errors are reported at the "info" level as these are errors happening on a connection to a client. Such errors can be easily triggered by an incorrect client behaviour and hence are reported at the "info" level. The initial report was about errors happening on SSL connections to upstream servers, hence the difference.
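To make the distinction concrete when scanning logs, here is a small sketch that pulls the log level and the "while ..." context out of an SSL_read() failure line. The regular expressions are only inferred from the two log lines quoted in this ticket, not from any nginx specification, and the function name and the upstream/client labelling are illustrative assumptions.

```python
import re

# Line layout inferred from the log lines quoted in this ticket:
#   YYYY/MM/DD HH:MM:SS [level] pid#tid: *conn message
LINE_RE = re.compile(
    r"^(?P<time>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) "
    r"\[(?P<level>\w+)\] "
    r"\d+#\d+: "
    r"(?:\*\d+ )?"
    r"(?P<message>.*)$"
)

# The text between ") while " and the next comma, e.g. "reading upstream".
CONTEXT_RE = re.compile(r"\) while ([^,]+)")


def classify_ssl_read_error(line):
    """Return level, context and side for an SSL_read() failure line.

    Returns None when the line is not an SSL_read() failure.
    """
    m = LINE_RE.match(line)
    if m is None or "SSL_read() failed" not in m.group("message"):
        return None
    ctx = CONTEXT_RE.search(m.group("message"))
    if ctx is None:
        context, side = "", "unknown"
    else:
        context = ctx.group(1)
        # Assumption: a context that mentions "upstream" means the error
        # happened on the connection to the upstream server; anything
        # else is treated as the client side.
        side = "upstream" if "upstream" in context else "client"
    return {"level": m.group("level"), "context": context, "side": side}
```

Applied to the two lines in this ticket, it should report the 2012 entry as an upstream-side error at the "error" level and the 2016 entry as a client-side one at the "info" level.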

comment:14 by bblack.wikimedia.org@…, 7 years ago

Update: taking some cues from older OpenSSL bug reports, I tried commenting out nginx's calls to SSL_CTX_set_mode(ssl->ctx, SSL_MODE_RELEASE_BUFFERS) and SSL_CTX_set_read_ahead(ssl->ctx, 1), and this stopped my bug repro. I think that more-firmly puts this in OpenSSL bug territory and starts giving some ideas where to look...

comment:15 by bblack.wikimedia.org@…, 7 years ago

FYI in case anyone else searches up this ticket, filed upstream @OpenSSL: https://github.com/openssl/openssl/issues/1799
