Opened 2 years ago
Closed 23 months ago
#1930 closed defect (invalid)
Space in URL %20 is decoded space causing an invalid URL
Reported by: | Owned by: | ||
---|---|---|---|
Priority: | minor | Milestone: | |
Component: | documentation | Version: | 1.17.x |
Keywords: | Cc: | ||
uname -a: | Windows 10 64-bit | ||
nginx -V: |
nginx version: nginx/1.17.8
built by cl 16.00.40219.01 for 80x86 built with OpenSSL 1.1.1d 10 Sep 2019 TLS SNI support enabled configure arguments: --with-cc=cl --builddir=objs.msvc8 --with-debug --prefix= --conf-path=conf/nginx.conf --pid-path=logs/nginx.pid --http-log-path=logs/access.log --error-log-path=logs/error.log --sbin-path=nginx.exe --http-client-body-temp-path=temp/client_body_temp --http-proxy-temp-path=temp/proxy_temp --http-fastcgi-temp-path=temp/fastcgi_temp --http-scgi-temp-path=temp/scgi_temp --http-uwsgi-temp-path=temp/uwsgi_temp --with-cc-opt=-DFD_SETSIZE=1024 --with-pcre=objs.msvc8/lib/pcre-8.43 --with-zlib=objs.msvc8/lib/zlib-1.2.11 --with-http_v2_module --with-http_realip_module --with-http_addition_module --with-http_sub_module --with-http_dav_module --with-http_stub_status_module --with-http_flv_module --with-http_mp4_module --with-http_gunzip_module --with-http_gzip_static_module --with-http_auth_request_module --with-http_random_index_module --with-http_secure_link_module --with-http_slice_module --with-mail --with-stream --with-openssl=objs.msvc8/lib/openssl-1.1.1d --with-openssl-opt='no-asm no-tests -D_WIN32_WINNT=0x0501' --with-http_ssl_module --with-mail_ssl_module --with-stream_ssl_module |
Description
We are seeing nginx 1.15 and the latest 1.17.8 wrongly decoding %20 to " " (space) in URLs before forwarding them upstream. Some upstreams do not accept this and return a 400. Nginx should not IMHO be unencoding URLs!
Consider the following Nginx running on 8080 with fiddler running as the upstream server on port 8888 to view traffic:
location ~ "^/yo/(.*)" { proxy_pass http://127.0.0.1:8888/$1$is_args$args; }
Next we hit nginx with:
GET http://localhost:8080/yo/foo%20bar
Fiddler (upstream) shows the incoming request as:
GET /foo bar
Correct would be
GET /foo%20bar
The following workaround is available:
location ~ "^/yo/(.*)" {
    set $allowspace1 $1;
    proxy_pass http://127.0.0.1:8888/$allowspace1$is_args$args;
}
Change History (9)
comment:1 by , 2 years ago
Resolution: | → invalid |
---|---|
Status: | new → closed |
comment:2 by , 2 years ago
you are using $1$is_args$args, where $1 comes from unescaped URI as matched by the location.
Do you mean by "unescaped" that the URL is decoded from /foo%20bar to /foo bar? If so then you are confirming the incorrect behaviour. Why would anyone want the URL unencoded??
I want to pass /foo%20bar and not /foo bar downstream. Why is $1 unencoded and which variable can I use to get the URL *as is*?
comment:3 by , 2 years ago
Note: in our scenario we are rewriting /foo%20bar to /subdir/foo%20bar downstream and not to root / and thus we are using proxy_pass with a URL and running into this documented but IMHO incorrect behaviour.
Documentation
If the proxy_pass directive is specified with a URI, then when a request is passed to the server, the part of a normalized request URI matching the location is replaced by a URI specified in the directive:
location /name/ { proxy_pass http://127.0.0.1/remote/; }
If proxy_pass is specified without a URI, the request URI is passed to the server in the same form as sent by a client when the original request is processed, or the full normalized request URI is passed when processing the changed URI:
location /some/path/ { proxy_pass http://127.0.0.1; }
comment:5 by , 2 years ago
Resolution: | → invalid |
---|---|
Status: | reopened → closed |
Do you mean by "unescaped" that the URL is decoded from /foo%20bar to /foo bar? If so then you are confirming the incorrect behaviour. Why would anyone want the URL unencoded??
Locations are matched on _unescaped_ URIs, so "location /foo" matches both "GET /foo" and "GET /%66oo". See the location directive description for details.
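As a rough illustration of the difference between the unescaped URI used for location matching and the original request URI, a minimal sketch could echo both forms back to the client. The server block and the response text are assumptions made up for this example and are not part of the ticket:

```nginx
# Hypothetical test server; port 8080 mirrors the reporter's setup,
# the response body format is invented for this sketch.
server {
    listen 8080;

    location /foo {
        # $uri is the normalized, unescaped URI used for location matching;
        # $request_uri is the original request URI as sent by the client.
        default_type text/plain;
        return 200 "uri=$uri request_uri=$request_uri\n";
    }
}
```

With this in place, a request for /%66oo should report uri=/foo and request_uri=/%66oo, and a request for /foo%20bar should report uri=/foo bar and request_uri=/foo%20bar.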
Why is $1 unencoded and which variable can I use to get the URL *as is*?
The $1 variable contains unescaped data because you obtained it from the unescaped URI. If you want the original escaped URI as received from the client, consider using $request_uri instead. On the other hand, as explained above, most likely you don't need it at all. There are much simpler and more natural solutions in nginx to do what you want to do.
Note: in our scenario we are rewriting /foo%20bar to /subdir/foo%20bar downstream and not to root / and thus we are using proxy_pass with a URL and running into this documented but IMHO incorrect behaviour.
You are not using "proxy_pass with a URL", you are instead using "proxy_pass with a URL explicitly specified using variables". This is a special mode designed to be used when you know in advance the URL you want to proxy the request to. As explained above, using this mode to _modify_ an existing URI might be tricky. And it certainly does not make sense to use it to remove or add a subdirectory.
If you want to add a subdirectory, consider using proxy_pass with a URI component, _without_ variables, as suggested by the documentation:
location / { proxy_pass http://127.0.0.1/subdir/; }
This way requests to /foo%20bar will be passed as requests to /subdir/foo%20bar on the proxied server.
comment:6 by , 23 months ago
Resolution: | invalid |
---|---|
Status: | closed → reopened |
I think I see the complication which I forgot to explain:
I am trying to proxy /yo/lo/foo%20bar to any upstream server and path, e.g. http://127.0.0.1:8888/here/foo%20bar, where lo and foo%20bar are variable.
Note lo is a variable which I want to IGNORE. It could be any value, but foo%20bar is a variable I want to pass on INCLUDING SPACES, so it could be:
/yo/lo/foobar => /here/foobar
/yo/baz/foobar => /here/foobar
/yo/baz/foo-Bar => /here/foo-Bar
/yo/nginxiscool/foo bar => /here/foo%20bar
* This breaks if I use $1
So the dilemma is: I have to use RegEx to ignore lo, but if I use RegEx, $1 causes broken URLs.
Again I ask why is $1 decoded to foo bar and how can we get it unencoded foo%20bar?
comment:7 by , 23 months ago
Resolution: | → invalid |
---|---|
Status: | reopened → closed |
Again I ask why is $1 decoded to foo bar and how can we get it unencoded foo%20bar?
Again: locations are matched on unescaped URIs. Since $1 in your case comes from location matching, you get unescaped foo bar in the $1 variable. And you use it in the proxy_pass directive, which needs properly escaped URL.
As already said in comment:5, the original escaped URI is available in the $request_uri variable, and you can obtain a part of it instead of using location matching. This might not be trivial to do properly though, as you still need to unescape things to properly remove the /yo/.../ part.
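One way to take a part of $request_uri, as described above, is a map with a regular expression. The following is only a sketch: the variable names $escaped_tail and $tail are invented for illustration, and it assumes clients never escape characters inside the /yo/.../ prefix itself (a request such as /%79o/lo/foo would not match), which is the complication mentioned above.

```nginx
# Goes in the http context. Captures everything after /yo/<segment>/
# from the original, still-escaped request URI, without the query string.
# $escaped_tail and $tail are hypothetical names chosen for this sketch.
map $request_uri $escaped_tail {
    default                          "";
    "~^/yo/[^/?]+/(?<tail>[^?]*)"    $tail;
}

server {
    listen 8080;

    location /yo/ {
        # proxy_pass with variables sends the resulting string as is,
        # so $escaped_tail keeps the client's %20 untouched.
        proxy_pass http://127.0.0.1:8888/here/$escaped_tail$is_args$args;
    }
}
```

Under these assumptions a request for /yo/lo/foo%20bar would be proxied as /here/foo%20bar. The rewrite-based configuration suggested in this comment avoids the prefix-escaping caveat and is likely the simpler choice.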
As already mentioned in comment:1, for cases when you need to apply complex modifications to URIs, the simplest solution would be to use rewrite, which is specifically designed to modify URIs. In your particular case something like this should work:
location /yo/ {
    rewrite ^/yo/[^/]+/(.*) /here/$1 break;
    rewrite ^ /here/ break;
    proxy_pass http://127.0.0.1:8888;
}
If you have further questions on how to configure nginx, consider using support options available.
comment:8 by , 23 months ago
Resolution: | invalid |
---|---|
Status: | closed → reopened |
I have asked where anyone needs an unescaped $1 and you have explained where it comes from technically (because location directives work on unescaped URIs) not why it is unescaped. Who needs an unescaped variable which they cannot use in URLs?
Just because a user wants to work with unescaped URIs in the configuration does not mean that they want invalid values in a variable. I say "invalid" because an unescaped URL is an invalid URL - hardly anyone needs $1 unescaped and most people need it escaped since they are using it in a URL. Every Nginx example on the official Docs site and elsewhere uses $1 in a URL.
If you think it is right that $1 is unescaped then what can we use for an escaped version? $_1?
Your solution provided rewrite ^/yo/[^/]+/(.*) /here/$1 break; does not work with spaces in URLs. Indeed $1 is always invalid in a URL if the URL contains any %__ escaping.
comment:9 by , 23 months ago
Resolution: | → invalid |
---|---|
Status: | reopened → closed |
Who needs an unescaped variable which they cannot use in URLs?
This is a more or less philosophical question. In an ideal world where no one needs an unescaped variable there is no need for escaping, as everyone uses escaped variables instead.
In the real world escaping is needed because one needs to use characters not allowed in URLs, for example, in file names. And nginx has to work with file names. Further, variables are often used for various access control tasks - and this needs unescaped form, much like for location matching.
If you think that a real world web server might be written without using unescaped variables, consider writing one. It would be interesting to take a look at how you'll solve the problems which appear when you have to work with characters which need escaping.
If you think it is right that $1 is unescaped then what can we use for an escaped version?
As of now, there is no easy way to obtain an escaped version of a particular variable (you can either use 3rd party modules such as "set-misc" or scripting modules such as embedded perl or njs; there is a feature request to simplify this, #52). The recommended approach is to check the context: that is, use escaped ones when you need escaped strings, and unescaped ones when you need unescaped strings.
Your solution provided rewrite ^/yo/[^/]+/(.*) /here/$1 break; does not work with spaces in URLs. Indeed $1 is always invalid in a URL if the URL contains any %__ escaping.
It looks like you've failed to properly copy the example configuration provided. The rewrite directive knows that $1 is from the regular expression in the rewrite itself, and takes care of proper escaping. As previously suggested, if you have further questions on how to configure nginx, consider using support options available.
In your configuration snippet:
you are using $1$is_args$args, where $1 comes from unescaped URI as matched by the location. Given that proxy_pass expects properly escaped URI and uses it as is, this results in arbitrary invalid requests being generated.
If you are not prepared to deal with it and do not want to correctly escape all the variables used, consider using proxy_pass without variables instead. The same configuration can be correctly rewritten as:
Note the trailing / in proxy_pass. Alternatively, you can use proxy_pass without a URI and rewrite, though in this particular case using proxy_pass with a URI is much cleaner solution. See proxy_pass docs for additional examples.
(Note well that the "workaround" you've mentioned is rather a bug, see #348.)
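The rewritten configuration that the passage above refers to (after "can be correctly rewritten as:") did not survive in this copy of the ticket. Based on the surrounding description (proxy_pass without variables, with a trailing /), it was presumably something along these lines; treat this as a reconstruction under that assumption, not the original text:

```nginx
location /yo/ {
    # No variables: nginx replaces the matched /yo/ prefix with / and
    # re-escapes the rest, so /yo/foo%20bar is proxied as /foo%20bar.
    # Query arguments, if any, are passed along automatically.
    proxy_pass http://127.0.0.1:8888/;
}
```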