Opened 5 months ago

Closed 5 months ago

Last modified 5 months ago

#2364 closed defect (invalid)

SSL Sendfile Client Rate Limit Can Cause Incorrect Report of File Truncated

Reported by: noamc.qwilt.com@… Owned by:
Priority: minor Milestone:
Component: nginx-core Version: 1.22.x
Keywords: sendfil ssl Cc: noamc.qwilt.com@…
uname -a: Linux it.testing.com 4.18.0-80.7.1.el8_0.x86_64 #1 SMP Sat Aug 3 15:14:00 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
nginx -V: nginx version: nginx/1.22.0
built by gcc 4.8.5 20150623 (Red Hat 4.8.5-44) (GCC)
built with OpenSSL 3.0.1 14 Dec 2021
TLS SNI support enabled
configure arguments: --prefix=/nginx_sa/target --without-http_auth_basic_module --without-http_autoindex_module --without-http_empty_gif_module --without-http_fastcgi_module --without-http_memcached_module --without-http_referer_module --without-http_scgi_module --without-http_ssi_module --without-http_upstream_ip_hash_module --without-http_userid_module --without-http_uwsgi_module --with-http_stub_status_module --with-http_ssl_module --with-file-aio --with-http_flv_module --with-http_v2_module --with-pcre-jit --with-threads --with-cc-opt='-I /openssl-dev/include/ -D NGX_SSL_SENDFILE -D NGX_HAVE_VARIADIC_MACROS -DTCP_FASTOPEN=23 -Wno-deprecated-declarations' --with-ld-opt='-L /openssl-target/lib64' --with-debug

Description

OS: Centos 7
Using a simple nginx.conf (below), and the following curl command to represent a client having a limited network capacity, we're requesting a 10MB file:

curl https://10.9.8.18:443/video/test.mp4 -H "Host: lab.testing.com" -k -s -o /dev/null -v --limit-rate 4000k

The transaction would fail because at times SSL Sendfile might return zero. Example log:
2022/06/30 12:35:06 [debug] 16400#16400: *35 http write filter limit 1048576
2022/06/30 12:35:06 [debug] 16400#16400: *35 SSL to sendfile: @3982490 1051494
2022/06/30 12:35:06 [debug] 16400#16400: *35 SSL_sendfile: 475136
2022/06/30 12:35:06 [debug] 16400#16400: *35 SSL to sendfile: @4457626 576358
2022/06/30 12:35:06 [debug] 16400#16400: *35 SSL_sendfile: 32768
2022/06/30 12:35:06 [debug] 16400#16400: *35 SSL to sendfile: @4490394 543590
2022/06/30 12:35:06 [debug] 16400#16400: *35 SSL_sendfile: 0
2022/06/30 12:35:06 [alert] 16400#16400: *35 SSL_sendfile() reported that "/nginx_sa/cache/ac/3d/34b5a1830ef0eb9f5006181d27093dac" was truncated at 4490394, client: 10.9.25.52, server: lab.testing.com, request: "GET /video/test.mp4 HTTP/1.1", host: "lab.testing.com"

In truth, nothing really happens to the file. This is a result of the rate limit on the client.
We are able to work around it with a retry mechanism, allowing such transaction to complete successfully, usually within 1 retry.

We're doing something similar to:

if (!c->read->pending_eof) {
   c->write->delayed = 1;
   ngx_add_timer(c->write, 50);
   return NGX_DONE;
}

We believe that the scenario that triggers it is the sendbuf becoming almost full. Causing the ktls layer to not being able to write anything to it.

Nginx.conf:

daemon on;
master_process on;
worker_processes  16;

error_log  logs/error.log warn;

thread_pool disk-00 threads=4 max_queue=1024;

events {
     worker_connections  1024;
     use epoll;
}

http {
    aio threads=disk-00;
    sendfile on;
    sendfile_max_chunk 1024k;
    ssl_conf_command Options KTLS;
    tcp_nopush on;
      
    limit_rate 0k;

    ssl_certificate /nginx_sa/device.crt;
    ssl_certificate_key /nginx_sa/device.key;
    ssl_protocols TLSv1 TLSv1.1 TLSv1.2 TLSv1.3;
    ssl_ciphers AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:EECDH+AESGCM:EDH+AESGCM:DHE-RSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-SHA256:ECDHE-RSA-AES128-SHA:DHE-RSA-AES128-SHA256:DHE-RSA-AES128-SHA:ECDHE-RSA-DES-CBC3-SHA:EDH-RSA-DES-CBC3-SHA:AES128-SHA256:AES128-SHA:DES-CBC3-SHA:HIGH:!aNULL:!MD5;
    ssl_prefer_server_ciphers on;
    ssl_session_tickets off;
    ssl_session_cache shared:sslcache:20m;
    ssl_session_timeout 120m;
    
    proxy_max_temp_file_size 0;
    proxy_buffering on;
    proxy_ignore_client_abort off;
    proxy_http_version 1.1;
    proxy_ssl_server_name on;
 
    output_buffers 1 1024k;

    resolver 127.0.0.1:53;
    resolver_timeout 15s;

    proxy_cache_path nginx_sa/cache levels=2:2 keys_zone=volatile:125m max_size=471859200 inactive=480m use_temp_path=off;    

    server {
        error_log logs/server_error.log debug;
        server_name "lab.testing.com";
        listen  10.9.8.18:80;
        listen  10.9.8.18:443 ssl;       

        location / {
            #error_log logs/error.log notice;
            proxy_cache volatile; 
            proxy_pass http://upstream.testing.com/;
        }
    }   
}

Change History (4)

comment:1 by Maxim Dounin, 5 months ago

This looks like an issue in the kernel: sendfile() should return -1 / EAGAIN if it blocks without sending anything. Given that you are using kernel 4.18, it is expected to have issues in the kernel TLS implementation, at least 5.2 is recommended (see 65946a191197). Are you able to reproduce the issue with newer kernels?

comment:2 by noamc.qwilt.com@…, 5 months ago

Hi Maxin, thanks for the prompt reply.

I tried reproducing it on 5.10, and indeed I wasn't able to.
uname:

Linux it.testing.com 5.10.101 #2 SMP Thu Feb 24 12:01:49 IST 2022 x86_64 x86_64 x86_64 GNU/Linux

I'll keep an eye out in case it reproduces on newer kernels and report back. The workaround in this ticket is still good reference for anyone forced to use an older kernel. Thanks!

comment:3 by Maxim Dounin, 5 months ago

Resolution: invalid
Status: newclosed

Thanks for confirming, closing this.

Note that the workaround suggested might not be a good idea, as it might result in an infinite loop in case of a real file truncation. A better solution would be to avoid using kTLS with buggy kernels.

comment:4 by noamc.qwilt.com@…, 5 months ago

Oh, we did protect ourselves from that when used. We had a counter for max such retry attempts :)
Good points regardless, thank you.

Note: See TracTickets for help on using tickets.