Opened 2 years ago
Last modified 2 years ago
#2391 new enhancement
bad parsing of Content-Type (sub_filter_types)
Reported by: | Owned by: | ||
---|---|---|---|
Priority: | minor | Milestone: | |
Component: | nginx-module | Version: | 1.18.x |
Keywords: | mime sub_filter_types Content-Type | Cc: | |
uname -a: | Linux AUXIDMZ-21001 5.10.0-17-amd64 #1 SMP Debian 5.10.136-1 (2022-08-13) x86_64 GNU/Linux | ||
nginx -V: |
nginx version: nginx/1.18.0
built with OpenSSL 1.1.1n 15 Mar 2022 TLS SNI support enabled configure arguments: --with-cc-opt='-g -O2 -ffile-prefix-map=/build/nginx-QeqwpL/nginx-1.18.0=. -fstack-protector-strong -Wformat -Werror=format-security -fPIC -Wdate-time -D_FORTIFY_SOURCE=2' --with-ld-opt='-Wl,-z,relro -Wl,-z,now -fPIC' --prefix=/usr/share/nginx --conf-path=/etc/nginx/nginx.conf --http-log-path=/var/log/nginx/access.log --error-log-path=/var/log/nginx/error.log --lock-path=/var/lock/nginx.lock --pid-path=/run/nginx.pid --modules-path=/usr/lib/nginx/modules --http-client-body-temp-path=/var/lib/nginx/body --http-fastcgi-temp-path=/var/lib/nginx/fastcgi --http-proxy-temp-path=/var/lib/nginx/proxy --http-scgi-temp-path=/var/lib/nginx/scgi --http-uwsgi-temp-path=/var/lib/nginx/uwsgi --with-compat --with-debug --with-pcre-jit --with-http_ssl_module --with-http_stub_status_module --with-http_realip_module --with-http_auth_request_module --with-http_v2_module --with-http_dav_module --with-http_slice_module --with-threads --with-http_addition_module --with-http_gunzip_module --with-http_gzip_static_module --with-http_sub_module |
Description
Hello,
I think we found a bug in the way the Content-Type header is parsed when matching sub_filter_types.
The problem was that an href line wasn't rewritten.
After analysing a lot of the HTML traffic,
we found that the Content-Type line is as follows:
Content-Type: text/html; charset:UTF-8; charset=UTF-8
It seems that nginx parses such lines as (.*); *charset=(.*)
because we had to put "text/html; charset:UTF-8" as the type in sub_filter_types to make it work, e.g.:
sub_filter_types "text/html; charset:UTF-8" script/js text/html text/css text/xml ;
It's not an urgent issue, as we found a solution (an ugly one),
but maybe you would be interested to know about the issue and improve it.
Thanks
Parsing of content types is indeed very simple: it only knows about one possible parameter, charset, and assumes that it is the last parameter and that it is specified correctly, see ngx_http_upstream_copy_content_type(). Other parameters, if any, are currently expected to be listed explicitly in the types filtering directives, such as charset_types, ssi_types, sub_filter_types, xslt_types, addition_types, and gzip_types - this is basically what you've done in your workaround configuration.
While it might make sense to improve parsing (notably, to consistently support cases where charset isn't the last parameter) and/or types matching (see ticket #1119), note that in the particular example provided in this ticket proper parsing would result in an error, since charset:UTF-8 is not a valid content type parameter (see RFC 9110).
That is, in this particular case it might be a good idea to fix the backend and make sure it generates a valid Content-Type instead of the invalid one it currently returns. This will also ensure that no nginx-side workarounds are needed.
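To make the behaviour described above concrete, here is a minimal Python sketch of that kind of simple scan. It is an approximation written for illustration, not a translation of the nginx source; the function name split_content_type is hypothetical. It looks for the first ";" that is followed by "charset=", treats everything before that ";" as the type, and treats everything after "charset=" as the charset.

```python
from typing import Optional, Tuple


def split_content_type(value: str) -> Tuple[str, Optional[str]]:
    """Split a Content-Type value into (type, charset) with a naive scan.

    Illustrative approximation only: charset is the only parameter
    recognised, and it is assumed to be the last one.
    """
    pos = 0
    while True:
        semi = value.find(";", pos)
        if semi == -1:
            # No (further) parameter: the whole value is the type.
            return value, None

        # Skip spaces after the semicolon.
        start = semi + 1
        while start < len(value) and value[start] == " ":
            start += 1

        if value[start:start + 8].lower() == "charset=":
            charset = value[start + 8:]
            # Drop one optional surrounding double quote on each side.
            if charset.startswith('"'):
                charset = charset[1:]
            if charset.endswith('"'):
                charset = charset[:-1]
            return value[:semi], charset

        # Not a charset parameter: keep scanning from the next character.
        pos = semi + 1


print(split_content_type("text/html; charset:UTF-8; charset=UTF-8"))
```

With the header from this ticket, the sketch yields "text/html; charset:UTF-8" as the type, which is consistent with the string that had to be listed in sub_filter_types.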
Keeping this open as an enhancement to consider improving parsing of parameters, and as an alternative to #1119.
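As an illustration of what stricter parameter handling would reject, the following Python sketch checks a Content-Type value against a simplified form of the RFC 9110 media-type grammar (type "/" subtype, followed by ";"-separated name=value parameters, where the value is a token or a quoted string). The regular expressions and the function name is_valid_content_type are assumptions made for this example; the quoted-string pattern in particular is looser than the RFC, and none of this reflects how nginx does or would implement parsing.

```python
import re

# token = 1*tchar, per RFC 9110 (simplified transcription).
TOKEN = r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+"
# Simplified quoted-string: any characters except an unescaped double quote.
QUOTED = r'"(?:[^"\\]|\\.)*"'
# parameters = *( OWS ";" OWS [ parameter ] ), parameter = token "=" value
PARAM = rf"[ \t]*;[ \t]*(?:{TOKEN}=(?:{TOKEN}|{QUOTED}))?"
MEDIA_TYPE = re.compile(rf"{TOKEN}/{TOKEN}(?:{PARAM})*")


def is_valid_content_type(value: str) -> bool:
    """Return True if the whole value matches the simplified grammar."""
    return MEDIA_TYPE.fullmatch(value) is not None


for header in (
    "text/html; charset=UTF-8",
    "text/html; charset:UTF-8; charset=UTF-8",
):
    print(header, "->", is_valid_content_type(header))
```

The second value is rejected because "charset:UTF-8" has no "=" and ":" is not a token character, which is the reason a backend-side fix is suggested above.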