Opened 5 years ago

Closed 4 years ago

Last modified 3 months ago

#1930 closed defect (invalid)

Space in URL %20 is decoded space causing an invalid URL

Reported by: cawoodm@… Owned by:
Priority: minor Milestone:
Component: documentation Version: 1.17.x
Keywords: Cc:
uname -a: Windows 10 64-bit
nginx -V: nginx version: nginx/1.17.8
built by cl 16.00.40219.01 for 80x86
built with OpenSSL 1.1.1d 10 Sep 2019
TLS SNI support enabled
configure arguments: --with-cc=cl --builddir=objs.msvc8 --with-debug --prefix= --conf-path=conf/nginx.conf --pid-path=logs/nginx.pid --http-log-path=logs/access.log --error-log-path=logs/error.log --sbin-path=nginx.exe --http-client-body-temp-path=temp/client_body_temp --http-proxy-temp-path=temp/proxy_temp --http-fastcgi-temp-path=temp/fastcgi_temp --http-scgi-temp-path=temp/scgi_temp --http-uwsgi-temp-path=temp/uwsgi_temp --with-cc-opt=-DFD_SETSIZE=1024 --with-pcre=objs.msvc8/lib/pcre-8.43 --with-zlib=objs.msvc8/lib/zlib-1.2.11 --with-http_v2_module --with-http_realip_module --with-http_addition_module --with-http_sub_module --with-http_dav_module --with-http_stub_status_module --with-http_flv_module --with-http_mp4_module --with-http_gunzip_module --with-http_gzip_static_module --with-http_auth_request_module --with-http_random_index_module --with-http_secure_link_module --with-http_slice_module --with-mail --with-stream --with-openssl=objs.msvc8/lib/openssl-1.1.1d --with-openssl-opt='no-asm no-tests -D_WIN32_WINNT=0x0501' --with-http_ssl_module --with-mail_ssl_module --with-stream_ssl_module

Description

We are seeing nginx 1.15 and the latest 1.17.8 wrongly decoding %20 to " " (space) in URLs before forwarding them upstream. Some upstreams do not accept this and cause a 400. Nginx should not IMHO be unencoding URLs!

Consider the following Nginx running on 8080 with fiddler running as the upstream server on port 8888 to view traffic:

location ~ "^/yo/(.*)" {
  proxy_pass http://127.0.0.1:8888/$1$is_args$args;
}

Next we hit NginX with:

GET http://localhost:8080/yo/foo%20bar

Fiddler (upstream) shows the incoming request as:

GET /foo bar

Correct would be

GET /foo%20bar

The following workaround is available:

location ~ "^/yo/(.*)" {
  set $allowspace1 $1;
  proxy_pass http://127.0.0.1:8888/$allowspace1$is_args$args;

Change History (11)

comment:1 by Maxim Dounin, 5 years ago

Resolution: invalid
Status: newclosed

In your configuration snippet:

location ~ "^/yo/(.*)" {
  proxy_pass http://127.0.0.1:8888/$1$is_args$args;
}

you are using $1$is_args$args, where $1 comes from unescaped URI as matched by the location. Given that proxy_pass expects properly escaped URI and uses it as is, this results in arbitrary invalid requests being generated.

If you are not prepared to deal with it and do not want to correctly escape all the variables used, consider using proxy_pass without variables instead. The same configuration can be correctly rewritten as:

location /yo/ {
    proxy_pass http://127.0.0.1:8888/;
}

Note the trailing / in proxy_pass. Alternatively, you can use proxy_pass without a URI and rewrite, though in this particular case using proxy_pass with a URI is much cleaner solution. See proxy_pass docs for additional examples.

(Note well that the "workaround" you've mentioned is rather a bug, see #348.)

comment:2 by marc.cawood.buhlergroup.com@…, 5 years ago

you are using $1$is_args$args, where $1 comes from unescaped URI as matched by the location.

Do you mean by "unescaped" that the URL is decoded from /foo%20bar to /foo bar? If so then you are confirming the incorrect behaviour. Why would anyone want the URL unencoded??

I want to pass /foo%20bar and not /foo bar downstream. Why is $1 unencoded and which variable can I use to get the URL *as is*?

comment:3 by marc.cawood.buhlergroup.com@…, 5 years ago

Note: in our scenario we are rewriting /foo%20bar to /subdir/foo%20bar downstream and not to root / and thus we are using proxy_pass with a URL and running into this documented but IMHO incorrect behaviour.

Documentation

If the proxy_pass directive is specified with a URI, then when a request is passed to the server, the part of a normalized request URI matching the location is replaced by a URI specified in the directive:

location /name/ {
    proxy_pass http://127.0.0.1/remote/;
}

If proxy_pass is specified without a URI, the request URI is passed to the server in the same form as sent by a client when the original request is processed, or the full normalized request URI is passed when processing the changed URI:

location /some/path/ {
    proxy_pass http://127.0.0.1;
}

comment:4 by marc.cawood.buhlergroup.com@…, 5 years ago

Resolution: invalid
Status: closedreopened

See above.

comment:5 by Maxim Dounin, 5 years ago

Resolution: invalid
Status: reopenedclosed

Do you mean by "unescaped" that the URL is decoded from /foo%20bar to /foo bar? If so then you are confirming the incorrect behaviour. Why would anyone want the URL unencoded??

Locations are matched on _unescaped_ URIs, so "location /foo" matches both "GET /foo" and "GET /%66oo". See the location directive description for details.

Why is $1 unencoded and which variable can I use to get the URL *as is*?

The $1 variable contains unescaped data as you've obtained it from the unescaped URI. If you want original escaped URI as got from the client, consider using $request_uri instead. On the other hand, as explained above, most likely you don't need it at all. There are much simpler and more natural solutions in nginx to do what you want to do.

Note: in our scenario we are rewriting /foo%20bar to /subdir/foo%20bar downstream and not to root / and thus we are using proxy_pass with a URL and running into this documented but IMHO incorrect behaviour.

You are not using "proxy_pass with a URL", you are instead using "proxy_pass with an URL explicitly specified using variables". This is a special mode designed to be used when you know in advance the URL you want to proxy the request to. As explained above, using this mode to _modify_ existing URI might be tricky. And it certainly does not make sense to use it to remove or add a subdirectory.

If you want to add a subdirectory, consider using proxy_pass with an URI component, _without_ variables, as suggested by the documentation:

location / {
    proxy_pass http://127.0.0.1/subdir/;
}

This way requests to /foo%20bar will be passed as requests to /subdir/foo%20bar on the proxied server.

comment:6 by cawoodm@…, 4 years ago

Resolution: invalid
Status: closedreopened

I think I see the complication which I forgot to explain:

I am trying to proxy /yo/lo/foo%20bar to any upstream server and path e.g. http://127.0.0.1:8888/here/foo%20bar where lo and foo%bar are variable.

Note lo is a variable which I want to IGNORE. It could be any value but foo%bar is a variable I want to pass on INCLUDING SPACES so it could be:

  • /yo/lo/foobar => /here/foobar
  • /yo/baz/foobar => /here/foobar
  • /yo/baz/foo-Bar => /here/foo-Bar
  • /yo/nginxiscool/foo bar => /here/foo%20bar * This breaks if I use $1

So the dilemma is: I have to use RegEx to ignore lo but if I use RegEx $1 causes broken URLs.

Again I ask why is $1 decoded to foo bar and how can we get it unencoded foo%20bar?

comment:7 by Maxim Dounin, 4 years ago

Resolution: invalid
Status: reopenedclosed

Again I ask why is $1 decoded to foo bar and how can we get it unencoded foo%20bar?

Again: locations are matched on unescaped URIs. Since $1 in your case comes from location matching, you get unescaped foo bar in the $1 variable. And you use it in the proxy_pass directive, which needs properly escaped URL.

As already said in comment:5, the original escaped URI is available in the $request_uri variable, and you can obtain a part of it instead of using location matching. This might not be trivial to do properly though, as you still need to unescape things to properly remove the /yo/.../ part.

As already mentioned in comment:1, for cases when you need to apply complex modifications to URIs, most simple solution would be to use rewrite, which is specifically designed to modify URIs. In your particular case something like this should work:

location /yo/ {
    rewrite ^/yo/[^/]+/(.*) /here/$1 break;
    rewrite ^ /here/ break;
    proxy_pass http://127.0.0.1:8888;
}

If you have further questions on how to configure nginx, consider using support options available.

comment:8 by cawoodm@…, 4 years ago

Resolution: invalid
Status: closedreopened

I have asked where anyone needs an unescaped $1 and you have explained where it comes from technically (because location directives work on unescaped URIs) not why it is unescaped. Who needs an unescaped variable which they cannot use in URLs?

Just because a user wants to work with unescaped URIs in the configuration does not mean that they want invalid values in a variable. I say "invalid" because an unescaped URL is an invalid URL - hardly anyone needs $1 unescaped and most people need it escaped since they are using it in a URL. Every Nginx example on the official Docs site and elsewhere uses $1 in a URL.

If you think it is right that $1 is unescaped then what can we use for an escaped version?

$_1?

Your solution provided rewrite ^/yo/[^/]+/(.*) /here/$1 break; does not work with spaces in URLs. Indeed $1 is always invalid in a URL if the URL contains any %__ escaping.

comment:9 by Maxim Dounin, 4 years ago

Resolution: invalid
Status: reopenedclosed

Who needs an unescaped variable which they cannot use in URLs?

This is more or less philosophical question. In an ideal world where no one needs an unescaped variable there is no need for escaping, as everyone uses escaped variables instead.

In the real world escaping is needed because one needs to use characters not allowed in URLs, for example, in file names. And nginx has to work with file names. Further, variables are often used for various access control tasks - and this needs unescaped form, much like for location matching.

If you think that the real world web server might be written without using the unescaped variables, consider writing one. It would be interesting to take a look you'll solve the problems which appear when you have to work with characters which need escaping.

If you think it is right that $1 is unescaped then what can we use for an escaped version?

As of now, there is no easy way to obtain an escaped version of a particular variable (you can either use 3rd party modules such as "set-misc" or scripting modules such as embedded perl or njs; there is a feature request to simply this, #52). The recommended approach is check the context: that is, use escaped ones when you need escaped strings, and unescaped ones when you need unescaped strings.

Your solution provided rewrite ^/yo/[^/]+/(.*) /here/$1 break; does not work with spaces in URLs. Indeed $1 is always invalid in a URL if the URL contains any %__ escaping.

It looks you've failed to properly copy the example configuration provided. The rewrite directive knows that $1 is from the regular expression in the rewrite itself, and takes care of proper escaping. As previously suggested, if you have further questions on how to configure nginx, consider using support options available.

comment:10 by JoshMcCullough@…, 2 years ago

I think there is a disconnect here. Maxim is talking about "why is my variable unescaped" when it comes from a regex location match. Well, it should be unescaped, because you wouldn't want to have your regex include escaped characters. You'd run into constant issues with that!

So it is correct that $1 is unsecaped. Good.

However, the part that is missing here and which wasn't discussed (unless I missed it) is this: why does proxy_pass not ensure that the generated request-line is properly formatted?

Given the example in this ticket, the request-line sent to the proxy would look like this:

GET /foo bar HTTP/1.1

This is in direct violation of RFC 2616, which states the following:

The Request-Line begins with a method token, followed by the Request-URI and the protocol version, and ending with CRLF. The elements are separated by SP characters. No CR or LF is allowed except in the final CRLF sequence.

Request-Line = Method SP Request-URI SP HTTP-Version CRLF

The delimiter used in the request-line is a space, so "in what world" is it correct that proxy_pass doesn't at least encode spaces in the URL parameter to ensure that the request-line is HTTP-compliant?

Version 0, edited 2 years ago by JoshMcCullough@… (next)

in reply to:  10 comment:11 by Maxim Dounin, 2 years ago

Replying to JoshMcCullough@…:

However, the part that is missing here and which wasn't discussed (unless I missed it) is this: why does proxy_pass not ensure that the generated request-line is properly formatted?

The proxy_pass expects properly escaped URI. As per the "garbage in, garbage out" concept, an incorrect URI could result in anything. So, it's certainly not a bug that an incorrect URI results in a request-line which is not RFC-compliant.

The real question is: wouldn't it be better from usability point of view to additionally scan resulting URI to make sure it doesn't contain some obvious protocol violations, such as spaces. In most cases nginx does not try to do something like this, especially at runtime, and rather does what the configuration says. See, for example, ticket #2258.

[...]

The delimiter used in the request-line is a space, so "in what world" is it correct that proxy_pass doesn't at least encode spaces in the URL parameter to ensure that the request-line is HTTP-compliant?

I would rather say that "at least encode spaces" is the worst thing that nginx can do here, since configuration mistakes as the one in the ticket will be unnoticed, resulting in incorrect handling of some other requests, such as ones with %.

Note: See TracTickets for help on using tickets.