#1930 closed defect (invalid)
Space in URL %20 is decoded space causing an invalid URL
Reported by: | Owned by: | ||
---|---|---|---|
Priority: | minor | Milestone: | |
Component: | documentation | Version: | 1.17.x |
Keywords: | Cc: | ||
uname -a: | Windows 10 64-bit | ||
nginx -V: |
nginx version: nginx/1.17.8
built by cl 16.00.40219.01 for 80x86 built with OpenSSL 1.1.1d 10 Sep 2019 TLS SNI support enabled configure arguments: --with-cc=cl --builddir=objs.msvc8 --with-debug --prefix= --conf-path=conf/nginx.conf --pid-path=logs/nginx.pid --http-log-path=logs/access.log --error-log-path=logs/error.log --sbin-path=nginx.exe --http-client-body-temp-path=temp/client_body_temp --http-proxy-temp-path=temp/proxy_temp --http-fastcgi-temp-path=temp/fastcgi_temp --http-scgi-temp-path=temp/scgi_temp --http-uwsgi-temp-path=temp/uwsgi_temp --with-cc-opt=-DFD_SETSIZE=1024 --with-pcre=objs.msvc8/lib/pcre-8.43 --with-zlib=objs.msvc8/lib/zlib-1.2.11 --with-http_v2_module --with-http_realip_module --with-http_addition_module --with-http_sub_module --with-http_dav_module --with-http_stub_status_module --with-http_flv_module --with-http_mp4_module --with-http_gunzip_module --with-http_gzip_static_module --with-http_auth_request_module --with-http_random_index_module --with-http_secure_link_module --with-http_slice_module --with-mail --with-stream --with-openssl=objs.msvc8/lib/openssl-1.1.1d --with-openssl-opt='no-asm no-tests -D_WIN32_WINNT=0x0501' --with-http_ssl_module --with-mail_ssl_module --with-stream_ssl_module |
Description
We are seeing nginx 1.15 and the latest 1.17.8 wrongly decoding %20 to " " (space) in URLs before forwarding them upstream. Some upstreams do not accept this and cause a 400. Nginx should not IMHO be unencoding URLs!
Consider the following Nginx running on 8080 with fiddler running as the upstream server on port 8888 to view traffic:
location ~ "^/yo/(.*)" { proxy_pass http://127.0.0.1:8888/$1$is_args$args; }
Next we hit NginX with:
GET http://localhost:8080/yo/foo%20bar
Fiddler (upstream) shows the incoming request as:
GET /foo bar
Correct would be
GET /foo%20bar
The following workaround is available:
location ~ "^/yo/(.*)" { set $allowspace1 $1; proxy_pass http://127.0.0.1:8888/$allowspace1$is_args$args;
Change History (11)
comment:1 by , 5 years ago
Resolution: | → invalid |
---|---|
Status: | new → closed |
comment:2 by , 5 years ago
you are using $1$is_args$args, where $1 comes from unescaped URI as matched by the location.
Do you mean by "unescaped" that the URL is decoded from /foo%20bar
to /foo bar
? If so then you are confirming the incorrect behaviour. Why would anyone want the URL unencoded??
I want to pass /foo%20bar
and not /foo bar
downstream. Why is $1
unencoded and which variable can I use to get the URL *as is*?
comment:3 by , 5 years ago
Note: in our scenario we are rewriting /foo%20bar
to /subdir/foo%20bar
downstream and not to root /
and thus we are using proxy_pass
with a URL and running into this documented but IMHO incorrect behaviour.
Documentation
If the proxy_pass directive is specified with a URI, then when a request is passed to the server, the part of a normalized request URI matching the location is replaced by a URI specified in the directive:
location /name/ { proxy_pass http://127.0.0.1/remote/; }
If proxy_pass is specified without a URI, the request URI is passed to the server in the same form as sent by a client when the original request is processed, or the full normalized request URI is passed when processing the changed URI:
location /some/path/ { proxy_pass http://127.0.0.1; }
comment:5 by , 5 years ago
Resolution: | → invalid |
---|---|
Status: | reopened → closed |
Do you mean by "unescaped" that the URL is decoded from /foo%20bar to /foo bar? If so then you are confirming the incorrect behaviour. Why would anyone want the URL unencoded??
Locations are matched on _unescaped_ URIs, so "location /foo" matches both "GET /foo" and "GET /%66oo". See the location directive description for details.
Why is $1 unencoded and which variable can I use to get the URL *as is*?
The $1 variable contains unescaped data as you've obtained it from the unescaped URI. If you want original escaped URI as got from the client, consider using $request_uri instead. On the other hand, as explained above, most likely you don't need it at all. There are much simpler and more natural solutions in nginx to do what you want to do.
Note: in our scenario we are rewriting /foo%20bar to /subdir/foo%20bar downstream and not to root / and thus we are using proxy_pass with a URL and running into this documented but IMHO incorrect behaviour.
You are not using "proxy_pass with a URL", you are instead using "proxy_pass with an URL explicitly specified using variables". This is a special mode designed to be used when you know in advance the URL you want to proxy the request to. As explained above, using this mode to _modify_ existing URI might be tricky. And it certainly does not make sense to use it to remove or add a subdirectory.
If you want to add a subdirectory, consider using proxy_pass with an URI component, _without_ variables, as suggested by the documentation:
location / { proxy_pass http://127.0.0.1/subdir/; }
This way requests to /foo%20bar
will be passed as requests to /subdir/foo%20bar
on the proxied server.
comment:6 by , 5 years ago
Resolution: | invalid |
---|---|
Status: | closed → reopened |
I think I see the complication which I forgot to explain:
I am trying to proxy /yo/lo/foo%20bar
to any upstream server and path e.g. http://127.0.0.1:8888/here/foo%20bar
where lo
and foo%bar
are variable.
Note lo
is a variable which I want to IGNORE. It could be any value but foo%bar
is a variable I want to pass on INCLUDING SPACES so it could be:
/yo/lo/foobar
=>/here/foobar
/yo/baz/foobar
=>/here/foobar
/yo/baz/foo-Bar
=>/here/foo-Bar
/yo/nginxiscool/foo bar
=>/here/foo%20bar
* This breaks if I use$1
So the dilemma is: I have to use RegEx to ignore lo
but if I use RegEx $1
causes broken URLs.
Again I ask why is $1 decoded to foo bar
and how can we get it unencoded foo%20bar
?
comment:7 by , 5 years ago
Resolution: | → invalid |
---|---|
Status: | reopened → closed |
Again I ask why is $1 decoded to foo bar and how can we get it unencoded foo%20bar?
Again: locations are matched on unescaped URIs. Since $1
in your case comes from location matching, you get unescaped foo bar
in the $1
variable. And you use it in the proxy_pass
directive, which needs properly escaped URL.
As already said in comment:5, the original escaped URI is available in the $request_uri
variable, and you can obtain a part of it instead of using location matching. This might not be trivial to do properly though, as you still need to unescape things to properly remove the /yo/.../
part.
As already mentioned in comment:1, for cases when you need to apply complex modifications to URIs, most simple solution would be to use rewrite, which is specifically designed to modify URIs. In your particular case something like this should work:
location /yo/ { rewrite ^/yo/[^/]+/(.*) /here/$1 break; rewrite ^ /here/ break; proxy_pass http://127.0.0.1:8888; }
If you have further questions on how to configure nginx, consider using support options available.
comment:8 by , 5 years ago
Resolution: | invalid |
---|---|
Status: | closed → reopened |
I have asked where anyone needs an unescaped $1
and you have explained where it comes from technically (because location directives work on unescaped URIs) not why it is unescaped. Who needs an unescaped variable which they cannot use in URLs?
Just because a user wants to work with unescaped URIs in the configuration does not mean that they want invalid values in a variable. I say "invalid" because an unescaped URL is an invalid URL - hardly anyone needs $1
unescaped and most people need it escaped since they are using it in a URL. Every Nginx example on the official Docs site and elsewhere uses $1
in a URL.
If you think it is right that $1
is unescaped then what can we use for an escaped version?
$_1
?
Your solution provided rewrite ^/yo/[^/]+/(.*) /here/$1 break;
does not work with spaces in URLs. Indeed $1
is always invalid in a URL if the URL contains any %__
escaping.
comment:9 by , 5 years ago
Resolution: | → invalid |
---|---|
Status: | reopened → closed |
Who needs an unescaped variable which they cannot use in URLs?
This is more or less philosophical question. In an ideal world where no one needs an unescaped variable there is no need for escaping, as everyone uses escaped variables instead.
In the real world escaping is needed because one needs to use characters not allowed in URLs, for example, in file names. And nginx has to work with file names. Further, variables are often used for various access control tasks - and this needs unescaped form, much like for location matching.
If you think that the real world web server might be written without using the unescaped variables, consider writing one. It would be interesting to take a look you'll solve the problems which appear when you have to work with characters which need escaping.
If you think it is right that
$1
is unescaped then what can we use for an escaped version?
As of now, there is no easy way to obtain an escaped version of a particular variable (you can either use 3rd party modules such as "set-misc" or scripting modules such as embedded perl or njs; there is a feature request to simply this, #52). The recommended approach is check the context: that is, use escaped ones when you need escaped strings, and unescaped ones when you need unescaped strings.
Your solution provided
rewrite ^/yo/[^/]+/(.*) /here/$1 break;
does not work with spaces in URLs. Indeed$1
is always invalid in a URL if the URL contains any%__
escaping.
It looks you've failed to properly copy the example configuration provided. The rewrite
directive knows that $1
is from the regular expression in the rewrite itself, and takes care of proper escaping. As previously suggested, if you have further questions on how to configure nginx, consider using support options available.
follow-up: 11 comment:10 by , 2 years ago
I think there is a disconnect here. "cawoodm" is talking about "why is my variable unescaped" when it comes from a regex location match. Well, it should be unescaped, because you wouldn't want to have your regex include escaped characters. You'd run into constant issues with that!
So it is correct that $1
is unsecaped. Good.
However, the part that is missing here and which wasn't discussed (unless I missed it) is this: why does proxy_pass not ensure that the generated request-line
is properly formatted?
Given the example in this ticket, the request-line
sent to the proxy would look like this:
GET /foo bar HTTP/1.1
This is in direct violation of RFC 2616, which states the following:
The Request-Line begins with a method token, followed by the Request-URI and the protocol version, and ending with CRLF. The elements are separated by SP characters. No CR or LF is allowed except in the final CRLF sequence.
Request-Line = Method SP Request-URI SP HTTP-Version CRLF
The delimiter used in the request-line
is a space, so "in what world" is it correct that proxy_pass
doesn't at least encode spaces in the URL
parameter to ensure that the request-line
is HTTP-compliant?
comment:11 by , 2 years ago
Replying to JoshMcCullough@…:
However, the part that is missing here and which wasn't discussed (unless I missed it) is this: why does proxy_pass not ensure that the generated
request-line
is properly formatted?
The proxy_pass
expects properly escaped URI. As per the "garbage in, garbage out" concept, an incorrect URI could result in anything. So, it's certainly not a bug that an incorrect URI results in a request-line
which is not RFC-compliant.
The real question is: wouldn't it be better from usability point of view to additionally scan resulting URI to make sure it doesn't contain some obvious protocol violations, such as spaces. In most cases nginx does not try to do something like this, especially at runtime, and rather does what the configuration says. See, for example, ticket #2258.
[...]
The delimiter used in the
request-line
is a space, so "in what world" is it correct thatproxy_pass
doesn't at least encode spaces in theURL
parameter to ensure that therequest-line
is HTTP-compliant?
I would rather say that "at least encode spaces" is the worst thing that nginx can do here, since configuration mistakes as the one in the ticket will be unnoticed, resulting in incorrect handling of some other requests, such as ones with %
.
In your configuration snippet:
you are using
$1$is_args$args
, where$1
comes from unescaped URI as matched by the location. Given thatproxy_pass
expects properly escaped URI and uses it as is, this results in arbitrary invalid requests being generated.If you are not prepared to deal with it and do not want to correctly escape all the variables used, consider using
proxy_pass
without variables instead. The same configuration can be correctly rewritten as:Note the trailing
/
inproxy_pass
. Alternatively, you can useproxy_pass
without a URI andrewrite
, though in this particular case usingproxy_pass
with a URI is much cleaner solution. See proxy_pass docs for additional examples.(Note well that the "workaround" you've mentioned is rather a bug, see #348.)