Opened 20 months ago

Last modified 20 months ago

#2391 new enhancement

bad parsing of Content-Type (sub_filter_types)

Reported by: ticket.mmisolution.be@… Owned by:
Priority: minor Milestone:
Component: nginx-module Version: 1.18.x
Keywords: mime sub_filter_types Content-Type Cc:
uname -a: Linux AUXIDMZ-21001 5.10.0-17-amd64 #1 SMP Debian 5.10.136-1 (2022-08-13) x86_64 GNU/Linux
nginx -V: nginx version: nginx/1.18.0
built with OpenSSL 1.1.1n 15 Mar 2022
TLS SNI support enabled
configure arguments: --with-cc-opt='-g -O2 -ffile-prefix-map=/build/nginx-QeqwpL/nginx-1.18.0=. -fstack-protector-strong -Wformat -Werror=format-security -fPIC -Wdate-time -D_FORTIFY_SOURCE=2' --with-ld-opt='-Wl,-z,relro -Wl,-z,now -fPIC' --prefix=/usr/share/nginx --conf-path=/etc/nginx/nginx.conf --http-log-path=/var/log/nginx/access.log --error-log-path=/var/log/nginx/error.log --lock-path=/var/lock/nginx.lock --pid-path=/run/nginx.pid --modules-path=/usr/lib/nginx/modules --http-client-body-temp-path=/var/lib/nginx/body --http-fastcgi-temp-path=/var/lib/nginx/fastcgi --http-proxy-temp-path=/var/lib/nginx/proxy --http-scgi-temp-path=/var/lib/nginx/scgi --http-uwsgi-temp-path=/var/lib/nginx/uwsgi --with-compat --with-debug --with-pcre-jit --with-http_ssl_module --with-http_stub_status_module --with-http_realip_module --with-http_auth_request_module --with-http_v2_module --with-http_dav_module --with-http_slice_module --with-threads --with-http_addition_module --with-http_gunzip_module --with-http_gzip_static_module --with-http_sub_module

Description

Hello,

I think we found a bug in the way the Content-Type is parsed in order to match sub_filter_types.

The problem was that an href line wasn't being rewritten.
After analysing a lot of the HTML stream,
we found that the Content-Type line is as follows:

Content-Type: text/html; charset:UTF-8; charset=UTF-8

It seems that nginx parses such lines as (.*); *charset=(.*)
because we had to put "text/html; charset:UTF-8" as the type in sub_filter_types to make it work, e.g.:

sub_filter_types "text/html; charset:UTF-8" script/js text/html text/css text/xml ;

It's not an urgent issue, as we found a workaround (an ugly one),
but maybe you would be interested to know about the issue and improve it.

Thanks

Change History (1)

comment:1 by Maxim Dounin, 20 months ago

Keywords: mime added
Type: defect → enhancement

Parsing of content types is indeed very simple: it only knows about one possible parameter, charset, and assumes that it is the last parameter and that it is specified correctly, see ngx_http_upstream_copy_content_type(). Other parameters, if any, are currently expected to be listed explicitly in the types filtering directives, such as charset_types, ssi_types, sub_filter_types, xslt_types, addition_types, and gzip_types - this is basically what you've done in your workaround configuration.
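
To make the behaviour described above concrete, here is a rough Python model of it. This is only a sketch based on the description in this comment and on the reporter's observation, not a translation of the nginx C code; the function name split_content_type is made up for illustration, and edge cases (such as repeated semicolons) may differ from what nginx actually does.

```python
from typing import Optional, Tuple


def split_content_type(value: str) -> Tuple[str, Optional[str]]:
    """Hypothetical model: split a Content-Type value into (type, charset).

    Everything before the ';' that introduces "charset=" is treated as the
    type used for *_types matching; everything after "charset=" is treated
    as the charset, i.e. charset is assumed to be the last parameter.
    """
    pos = 0
    while True:
        semi = value.find(";", pos)
        if semi == -1:
            # No charset parameter found: the whole value is the type.
            return value, None

        # Skip spaces after the semicolon.
        p = semi + 1
        while p < len(value) and value[p] == " ":
            p += 1

        if value[p:p + 8].lower() == "charset=":
            charset = value[p + 8:]
            # Drop optional surrounding double quotes.
            if charset.startswith('"'):
                charset = charset[1:]
            if charset.endswith('"'):
                charset = charset[:-1]
            return value[:semi], charset

        # Some other parameter: it stays part of the type, keep scanning.
        pos = p


print(split_content_type("text/html; charset:UTF-8; charset=UTF-8"))
print(split_content_type("text/html; charset=UTF-8"))
```

Under this model the header from the ticket yields the type "text/html; charset:UTF-8", which is why that exact string had to be listed in sub_filter_types, while a well-formed header yields plain "text/html".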

While it might make sense to improve parsing (notably, to consistently support cases where charset isn't the last parameter) and/or types matching (see ticket #1119), note that in the particular example as provided in this ticket proper parsing would result in an error, since charset:UTF-8 is not a valid content type parameter (see RFC 9110).
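
As an illustration of why charset:UTF-8 would be rejected by stricter parsing, the following Python sketch checks each parameter against the name=value form from RFC 9110 (a token, "=", then a token or quoted string). It is a simplified check written for this ticket, not something nginx does; the helper name invalid_parameters is hypothetical, and splitting on ";" deliberately ignores the case of a semicolon inside a quoted string.

```python
import re

# RFC 9110 token characters (tchar), one or more.
TOKEN = r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+"
# Simplified quoted-string: double quotes with backslash escapes.
QUOTED = r'"(?:[^"\\]|\\.)*"'
# parameter = parameter-name "=" parameter-value
PARAM_RE = re.compile(rf"({TOKEN})=({TOKEN}|{QUOTED})")


def invalid_parameters(value: str) -> list:
    """Return the Content-Type parameters that are not of the form name=value."""
    _media_type, *params = value.split(";")
    bad = []
    for param in params:
        param = param.strip()
        # Empty parameters (e.g. a trailing ';') are permitted by the grammar.
        if param and not PARAM_RE.fullmatch(param):
            bad.append(param)
    return bad


print(invalid_parameters("text/html; charset:UTF-8; charset=UTF-8"))
print(invalid_parameters("text/html; charset=UTF-8"))
```

The first call reports charset:UTF-8 as invalid because ":" is not a token character and there is no "=" separating a name from a value; the second call reports nothing.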

That is, in this particular case it might be a good idea to fix the backend and make sure it generates valid Content-Type instead of the invalid one it currently returns. This will also ensure that no nginx-side workarounds will be needed.

Keeping this open as an enhancement to consider improving parsing of parameters, and as an alternative to #1119.
