Opened 2 years ago

Closed 2 years ago

#2320 closed defect (duplicate)

Inconsistent URL decoding with X-Accel-Redirect depending on whether the original request URL was url-encoded

Reported by: ThiefMaster Owned by:
Priority: major Milestone:
Component: nginx-core Version: 1.19.x
Keywords: Cc: ThiefMaster
uname -a: Linux eluvian 5.15.19-gentoo #1 SMP Sun Feb 6 19:44:25 CET 2022 x86_64 AMD Ryzen 5 3600X 6-Core Processor AuthenticAMD GNU/Linux
nginx -V: nginx version: nginx/1.20.1
built with OpenSSL 1.1.1l 24 Aug 2021
TLS SNI support enabled
configure arguments: --prefix=/usr --conf-path=/etc/nginx/nginx.conf --error-log-path=/var/log/nginx/error_log --pid-path=/run/nginx.pid --lock-path=/run/lock/nginx.lock --with-cc-opt=-I/usr/include --with-ld-opt=-L/usr/lib64 --http-log-path=/var/log/nginx/access_log --http-client-body-temp-path=/var/lib/nginx/tmp/client --http-proxy-temp-path=/var/lib/nginx/tmp/proxy --http-fastcgi-temp-path=/var/lib/nginx/tmp/fastcgi --http-scgi-temp-path=/var/lib/nginx/tmp/scgi --http-uwsgi-temp-path=/var/lib/nginx/tmp/uwsgi --with-compat --with-debug --with-http_v2_module --with-pcre --without-http_browser_module --without-http_empty_gif_module --without-http_geo_module --without-http_grpc_module --without-http_limit_req_module --without-http_limit_conn_module --without-http_memcached_module --without-http_mirror_module --without-http_scgi_module --without-http_ssi_module --without-http_split_clients_module --without-http_upstream_hash_module --without-http_upstream_ip_hash_module --without-http_upstream_keepalive_module --without-http_upstream_least_conn_module --without-http_upstream_zone_module --without-http_userid_module --with-http_addition_module --with-http_stub_status_module --with-http_sub_module --with-http_realip_module --add-module=external_module/ngx-fancyindex-0.4.4 --with-http_ssl_module --without-stream_access_module --without-stream_geo_module --without-stream_limit_conn_module --without-stream_map_module --without-stream_return_module --without-stream_split_clients_module --without-stream_upstream_hash_module --without-stream_upstream_least_conn_module --without-stream_upstream_zone_module --without-mail_imap_module --without-mail_pop3_module --without-mail_smtp_module --user=nginx --group=nginx

Description

I'm using nginx and x-accel-redirect in what's probably a slightly creative way to proxy file downloads from S3 in order to not show the very ugly and temporary S3 URL to my users. This works fine, but I noticed that in a few very rare cases the requests fail (usually due to an invalid S3 signature).

In any case, I managed to reproduce it without S3 and it seem to happen because depending on whether the URL the user accesses contains url-encoded charaters or not, the URL encoding behavior in the internal request changes.

This is a problem of course, because it means I cannot reliable know whether I need to URL-encode the path included in my x-accel-redirect header or not unless I use a horrible workaround of checking whether the current request url contains a % character.

I believe this behavior is so strange that it can only be a bug, but if there are any workarounds to get consistent behavior (so I can either always url-encode the path in my x-accel-redirect header or never do so) please let me know.

I'm attaching a minimal test case to reproduce this. The nginx version used there was the one that's stable on my Gentoo system, but I'm also including a Dockerfile which can be used to test against the nginx:latest image on Docker hub in this Git repo: https://github.com/ThiefMaster/nginx-bug-repro

If you want to run the test case locally:
Put the two files in some folder, create a Python virtualenv, activate it and install flask in there (pip install flask).

Then run the flask app with flask run -p 12345 -h 0.0.0.0 and nginx with nginx -c $(realpath nginx.conf) -e /dev/stdout.

nginx.conf:

daemon off;
worker_processes 2;

pid /tmp/nginx/nginx.pid;
error_log /dev/stdout info;


events {
        worker_connections 1024;
        use epoll;
}

http {
	client_body_temp_path /tmp/nginx/client;
	proxy_temp_path /tmp/nginx/proxy;
	fastcgi_temp_path /tmp/nginx/fcgi;
	uwsgi_temp_path /tmp/nginx/uwsgi;

        log_format short '$remote_addr [$time_local] "$request" $status $bytes_sent';

	server {
                listen 8000 default_server;
                listen [::]:8000 default_server;
                access_log /dev/stdout short;

		location ~ ^/_internal/(https?)/([^/]+)/(.+)$ {
			internal;
			set $download_protocol $1;
			set $download_host $2;
			set $download_path $3;
			set $download_url $download_protocol://$download_host/$download_path;
			resolver 127.0.0.1;
			proxy_pass $download_url$is_args$args;
		}

		location / {
			proxy_pass http://127.0.0.1:12345;
		}
		root /var/empty;
        }
}

app.py:

from urllib.parse import quote
from flask import Flask, url_for, request


app = Flask(__name__)


@app.route('/')
def index():
    return f'''
        WITHOUT explicitly encoding x-accel-redirect url:<br>
        <a href="{url_for('test_without')}">Test [main url without encoding, target without encoding]</a><br>
        <a href="{url_for('test_without', enc_target=1)}">Test [main url without encoding, target with encoding]</a> 💣<br>
        <br>
        <a href="{url_for('test_with')}">Test [main url with encoding, target without encoding]</a><br>
        <a href="{url_for('test_with', enc_target=1)}">Test [main url with encoding, target with encoding]</a><br>
        <br><br>
        WITH explicitly encoding x-accel-redirect url:<br>
        <a href="{url_for('test_without', enc_xar=1)}">Test [main url without encoding, target without encoding]</a><br>
        <a href="{url_for('test_without', enc_xar=1, enc_target=1)}">Test [main url without encoding, target with encoding]</a><br>
        <br>
        <a href="{url_for('test_with', enc_xar=1)}">Test [main url with encoding, target without encoding]</a><br>
        <a href="{url_for('test_with', enc_xar=1, enc_target=1)}">Test [main url with encoding, target with encoding]</a> ⚠
    '''


def _test():
    endpoint = 'target_with' if request.args.get('enc_target') == '1' else 'target_without'
    path = url_for(endpoint)
    internal_url = f'/_internal/http/{request.host}{path}'
    print(f'Redirecting internally to {internal_url}')
    response = app.make_response('')
    if request.args.get('enc_xar') == '1':
        internal_url = quote(internal_url)
    response.headers['X-Accel-Redirect'] = internal_url
    return response


@app.route('/test-without/')
def test_without():
    return _test()


@app.route('/test with/')
def test_with():
    return _test()


def _target():
    return f"target called url={request.base_url}"


@app.route('/target-without')
def target_without():
    return _target()


@app.route('/target with')
def target_with():
    return _target()


@app.errorhandler(404)
def notfound(exc):
    return f'NOT FOUND: {request.base_url}'

Change History (1)

comment:1 by Maxim Dounin, 2 years ago

Resolution: duplicate
Status: newclosed

This looks like a duplicate of #348: excessive URI encoding is done by set ... $1; if there were escaped characters in the original request URI. As a workaround, consider using named captures to avoid this extra escaping.

Note: See TracTickets for help on using tickets.