Opened 15 months ago

Last modified 15 months ago

#2482 new enhancement

* is not evaluated to ::

Reported by: wanneut@… Owned by:
Priority: major Milestone:
Component: nginx-core Version: 1.19.x
Keywords: Cc:
uname -a: Linux many
nginx -V: many

Description (last modified by wanneut@…)

The asterisk (*) should be a wildcard for both IPv6 [::] and IPv4 0.0.0.0, not IPv4 only.
This is especially important since you stopped adhering to the global configuration and instead default to ipv6only=on.
So users configure the system to do automatic dual stack and write a dual-stack address, yet nginx binds to IPv4 only.
This broke the dual-stack support of more or less all the software that bundles nginx, like GitLab and a lot of smaller web servers.
People, especially on Linux, expect dual stack to work out of the box without additional configuration. And since there is a fallback to IPv4, many do not even notice that they have a problem.

Change History (8)

comment:1 by wanneut@…, 15 months ago

Description: modified

comment:2 by Maxim Dounin, 15 months ago

In no particular order:

  • This is practically doable (and mostly trivial) since nginx 1.15.10, as we already support resolving a hostname in the listen directive to multiple addresses.
  • The bare port specification should be handled identically. That is, listen 80; should be the same as listen *:80;. If the meaning of the * is changed to mean both IPv4 and IPv6 wildcard addresses, listen 80; should mean listen on both IPv4 and IPv6 addresses as well.
  • This is a major change, which will break many existing configurations. If at all, it should be done with care.
  • The * character is traditionally used to mean IPv4 wildcard, and not IPv6. For an example, see bind docs: "An ip_address of * (asterisk) is interpreted as the IPv4 wildcard address; connections are accepted on any of the system’s IPv4 addresses."
  • I don't really like the idea of * being magically dependent on the compilation options and system configuration. While it might be not critical nowadays, when IPv6 is usually available and compiled in, it used to cause a lot of confusion previously. And that's one of the reasons why we turn on ipv6only by default since nginx 1.3.4, see bdcdbdf35b52.
  • If both IPv4 and IPv6 listens are needed, it is currently trivial to configure it explicitly, either with distinct listen directives, or by using ipv6only=off (not recommended, especially on Linux), or by using a name which resolves to both wildcard addresses.

Keeping this open for now, though overall I'm sceptical about the proposed change.

comment:3 by wanneut@…, 15 months ago

The * character is traditionally used to mean IPv4 wildcard, and not IPv6.

I don't know what you mean by traditionally. In 1982 this was true, since there was no IPv6. But it always had the meaning of a wildcard. Not as a synonym for IPv4.
So for almost all the software I know it is a synonym for both.
OpenSSH: * stands for ::1 and 127.0.0.1 defaults to ::
apache2: * stands for :: has no default (I think at least)
unbound: * stands for 0.0.0.0 defaults to :: and ::1
...

I do not think that there is any similarly big and actively maintained non-Python software like nginx that defaults to IPv4 only even if you listen on * or ::. Everybody else makes it possible to configure dualstack in one line instead of having to configure IPv4 and IPv6 separately.

If both IPv4 and IPv6 listens are needed, it is currently trivial to configure it explicitly,

Yes. With the ipv6only decision you more or less decided to send 90% of your users back to the 90s and kill dualstack support. (Because by far most people use defaults.)

While it might be not critical nowadays, when IPv6 is usually available and compiled in, it used to cause a lot of confusion previously.

Since Windows Vista it is possible to disable IPv4 but not IPv6. Since you now try to mimic the Windows behavior on every OS, you should make sure that the defaults work on single-stack IPv6, not on single-stack IPv4.

not recommended, especially on Linux

Why not? I know literally no other software that behaves like nginx in that case. It makes things much easier. Adding an IP address to an interface (or getting it through RA) is easy. Manually enabling IPv6 for dozens of programs is, for most users, just not worth the effort. And for users it is much easier to configure things in one place for all programs. (Since they do not care about other machines.) It's only an advantage for developers when the software ignores settings to mimic the behavior of other operating systems.
The feature was explicitly added so that users who do not care about network configuration get the best connectivity. You purposely killed that.

Last edited 15 months ago by wanneut@…

comment:4 by Maxim Dounin, 15 months ago

I don't know what you mean by traditionally. In 1982 this was true, since there was no IPv6. But it always had the meaning of a wildcard. Not as a synonym for IPv4.

Well, not really. It used to be a PF_INET (internet protocol) wildcard when the "internet protocol" was equivalent to IPv4. And other protocols are configured differently. Since IPv4 and IPv6 are different protocols, it's an open question whether it should include IPv6 or not.

So for almost all the software I know it is a synonym for both.
OpenSSH: * stands for ::1 and 127.0.0.1 defaults to ::

In OpenSSH, ListenAddress has no * alias.

apache2: * stands for :: has no default (I think at least)

In Apache2, * is not documented, though supported as an equivalent for port-only specification (which uses getaddrinfo(AI_PASSIVE) on modern systems, and results depend on the system configuration and Apache2 compilation options).

unbound: * stands for 0.0.0.0 defaults to :: and ::1

In unbound, * is not documented either. Example configurations explicitly list interface: 0.0.0.0 and interface: ::0 as a way to listen on all interfaces.

not recommended, especially on Linux

Why not?

Because Linux tries to prevent address conflicts between listening sockets, and rejects listens on specific IP addresses if a wildcard listen exists. For explicitly specified wildcard addresses nginx avoids this by matching non-wildcard addresses to wildcard ones and using getsockname() call to properly route connections, see description of the bind parameter of the "listen" directive. With listen [::]:80 ipv6only=off; attempts to listen on explicit IPv4 addresses, such as in listen 127.0.0.1:80;, will fail on Linux. Therefore a better approach would be to explicitly configure both IPv4 and IPv6 wildcard addresses instead.
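
The getsockname() idea mentioned above can be sketched in a few lines of Python. This is only an illustration under simple assumptions (IPv4 loopback available, a throwaway port picked by the OS), not nginx's actual implementation; the function name is made up for the example. A single wildcard listener accepts the connection, and getsockname() on the accepted socket reveals which local address the client really connected to, which is the information a server would need to pick the matching configuration.

```python
import socket


def local_address_of_accepted_connection():
    """Accept one loopback connection on an IPv4 wildcard listener.

    Returns (address the listener is bound to,
             local address of the accepted connection).
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind(("0.0.0.0", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]

    # Connect to the wildcard listener through one specific address.
    client = socket.create_connection(("127.0.0.1", port))
    conn, _peer = listener.accept()
    try:
        # The listener only knows the wildcard; the accepted socket
        # knows the concrete destination address.
        return listener.getsockname()[0], conn.getsockname()[0]
    finally:
        conn.close()
        client.close()
        listener.close()


if __name__ == "__main__":
    print(local_address_of_accepted_connection())
```

With only one wildcard socket open, no second bind() on a specific address is needed, so the Linux conflict check never comes into play.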

Overall, thank you for your opinion. While it might be a good idea to simplify configuring dual stack servers, there are multiple factors to consider.

comment:5 by wanneut@…, 15 months ago

So 3 pieces of software do something (for different reasons) without documenting it, and one does something different and documents it.
Could it be that the first thing is just the obvious thing to do, so that only software that doesn't do it that way documents it?

While it might be a good idea to simplify configuring dual stack servers, there are multiple factors to consider.

The thing is: It should not be simplified to configure dualstack. There can be a discussion if nginx should default to IPv6, dualstack or the old behavior (which was dualstack). But changing the default from dualstack to the deprecated IPv4, even if the user configured it the other way on the system side, is absolutely shitty.

and results depend on the system configuration and Apache2 compilation options

Yes. And adhering to configurations should be the default. All the time. You can add options overriding system configurations. But having software that resolves * to 0.0.0.0 while the system resolves * to :: is as shitty as software resolving nginx.org to 34.199.147.133 because the developer thinks Apache on AWS would be a better solution than nginx.

and rejects listens on specific IP addresses if a wildcard listen exists

And I think this is a good thing. Usually this is a configuration error. You can use reuseaddr if you are sure that you do not like this behavior.
Again the default is what most people need. You can do different things by adding configuration.

Last edited 15 months ago by wanneut@…

comment:6 by Maxim Dounin, 15 months ago

So 3 pieces of software do something (for different reasons) without documenting it, and one does something different and documents it.

So 3 out of 3 software products you've listed as examples (of software products which supposedly interpret * differently from nginx) do not officially support *. And only one of them does so unofficially. The only example software product mentioned so far which officially supports * is bind, and it does so exactly as nginx does.

Overall, the net effect of the examples you've provided is the fact that any examples you provide cannot be trusted. You may want to reconsider your approach to the discussion.

The thing is: It should not be simplified to configure dualstack. There can be a discussion if nginx should default to IPv6, dualstack or the old behavior (which was dualstack). But changing the default from dualstack to the deprecated IPv4, even if the user configured it the other way on the system side, is absolutely shitty.

Well, nginx never defaulted to dualstack in the first place.

It looks like you are trying to mix multiple topics here, notably:

  • the * alias behaviour,
  • IPV6_V6ONLY socket option usage on IPv6 sockets,
  • and default listening sockets being used if no listening sockets are specified.

Default listening sockets are not really relevant here. In almost any meaningful nginx configuration one is expected to configure listening sockets anyway, as one will almost certainly have to support more than just plain HTTP over TCP on a port which depends on the startup user. If default listening sockets are to be discussed, the main question would be how to remove them with minimal impact.

And the IPV6_V6ONLY socket option is proven to be bad from usability point of view in anything but dead simple configurations. That's why nginx disables it on IPv6 sockets to make sure it is never used unless explicitly specified.

The remaining question is the * interpretation, which is IPv4 wildcard in nginx. For a typical dualstack site this means one has to use two listen directives in a typical server{} block (assuming distinct HTTP and HTTPS server blocks). By interpreting it as both IPv4 and IPv6 wildcards as you suggest we can reduce this to just one listen directive.

So the question is: is it worth the change, or do the negative factors prevail?

Yes. And adhering to configurations should be the default. All the time. You can add options overriding system configurations. But having software that resolves * to 0.0.0.0 while the system resolves * to :: is as shitty as software resolving nginx.org to 34.199.147.133 because the developer thinks Apache on AWS would be a better solution than nginx.

The problem is: this means that * is interpreted differently in different cases, and a configuration can be silently broken if a system configuration or compilation options change. Using explicit configuration instead ensures that nginx is able to detect issues and complain.

For example, if you have proxying configured to an IPv6 address, but accidentally broke IPv6 support during compilation, using explicit addresses ensures that nginx will be able to complain during configuration testing that it cannot listen on IPv6 addresses without IPv6 support compiled in. In contrast, with * being used, assuming it is interpreted as "all protocols wildcard", it will silently accept the configuration and will simply do the wrong thing.

and rejects listens on specific IP addresses if a wildcard listen exists

And I think this is a good thing. Usually this is a configuration error. You can use reuseaddr if you are sure that you do not like this behavior.

Well, regardless of whether you think it is good or not, it contradicts the general concept of how BSD sockets work and breaks valid configurations. And it is certainly not expected to do so, especially within a single server. Even if this behaviour can be overruled by an unrelated socket option, which permits much more than just using more specific listening sockets.

Either way, as explained, nginx implements a workaround for this Linux behaviour, but this workaround will not work with ipv6only=off sockets, leading to issues. Therefore, using ipv6only=off is not recommended. Configuring dualstack servers with distinct IPv4 and IPv6 sockets generally works better and with fewer surprises.
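
What "distinct IPv4 and IPv6 sockets" means at the socket level can be sketched as follows. This is a minimal Python illustration, not nginx code: the function name is hypothetical, and unlike nginx (which reports an error when a configured address cannot be opened) the sketch returns None for the IPv6 listener so that it stays runnable on hosts without IPv6. The IPv6 socket gets IPV6_V6ONLY set explicitly, so it does not claim the IPv4 side and both wildcard sockets can share one port.

```python
import socket


def open_dual_stack_listeners(port=0):
    """Return (v4_listener, v6_listener_or_None) bound to the same port."""
    v4 = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    v4.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    v4.bind(("0.0.0.0", port))   # counterpart of "listen *:80;"
    v4.listen(16)
    port = v4.getsockname()[1]   # resolves port 0 to the port actually used

    try:
        v6 = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
    except OSError:
        return v4, None          # no IPv6 support on this host
    try:
        # Keep the IPv6 socket IPv6-only, regardless of the system default.
        v6.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_V6ONLY, 1)
        v6.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        v6.bind(("::", port))    # counterpart of "listen [::]:80;"
        v6.listen(16)
    except OSError:
        v6.close()
        return v4, None
    return v4, v6


if __name__ == "__main__":
    v4, v6 = open_dual_stack_listeners()
    print(v4.getsockname(), v6.getsockname() if v6 else "IPv6 unavailable")
```

Because the IPv4 wildcard is a separate socket here, additional listens on specific IPv4 addresses can still be matched against it, which is not possible when a single ipv6only=off socket covers both protocols.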

comment:7 by wanneut@…, 15 months ago

The problem is: this means that * is interpreted differently in different cases, and a configuration can be silently broken if a system configuration or compilation options change.

The same is true for localhost, which in Debian alone changed from 127.0.0.1 to ::1 to both and back to 127.0.0.1. And for any other name. You are also not changing TCP from CUBIC to Tahoe or requiring the destination MAC address to be hard-configured to get consistent behavior across all platforms. At some point you have to let the OS do its job and rely on existing configuration like gateways, name resolution and TCP options.

So 3 out of 3 software products you've listed as examples (of software products which supposedly interpret * differently from nginx) do not officially support *.

No, they do. They just leave the resolution to the operating system, like you do for all other names except *.

it contradicts the general concept of how BSD sockets

FreeBSD is doing it the same way.

Well, regardless of whether you think it is good or not,

Well, regardless of what YOU think, when Linux (and BSD) implemented this feature (and provided configuration to change it) they had reasons. They are not plain stupid. (So did Windows when they didn't.) But breaking it even if the user explicitly turned it on is just awful. I hope you do not start shipping suid binaries as a workaround for Linux not allowing you to bind to port 80 without root rights.

It looks like you are trying to mix multiple topics here

Yes, because the combination of these 3 things results in very stupid defaults.
It does not matter whether you interpret * as all interfaces, as 0.0.0.0+::, or just as a name (which is resolved to ::).
In all cases it results in a proper dualstack setup. So this is what people are expecting from it.
But by setting IPV6_V6ONLY just resolving it breaks IPv4 only setups (which I assume you will never like to do) and by setting it as default, you killed all the installations that use defaults.

and breaks valid configurations.

The main thing is: Your behavior breaks valid configurations. And not those where the user configured something that does not comply with the operating system's rules, but all the ones that just use defaults.

In short, defaulting to ipv6only=on and resolving * to :: would be a proper solution. Like Linux does.
Resolving * via the C library and letting the OS decide whether ipv6only is on or not would be an even better one.
Killing the * and just defaulting to 0.0.0.0 and :: while doing ipv6only=off is also valid. (More or less the Windows way.)
But defaulting to * and letting it resolve to 0.0.0.0 while the operating system translates it to :: is just plain wrong.

in reply to:  7 comment:8 by Maxim Dounin, 15 months ago

The problem is: this means that * is interpreted differently in different cases, and a configuration can be silently broken if a system configuration or compilation options change.

The same is true for localhost, which in Debian alone changed from 127.0.0.1 to ::1 to both and back to 127.0.0.1. And for any other name.

And that's basically why I cannot recommend using localhost in configurations either. But * is not a name, but a hardcoded alias to a wildcard IP address in the programs which support it, such as nginx.

So 3 out of 3 software products you've listed as examples (of software products which supposedly interpret * differently from nginx) do not officially support *.

No, they do. They just leave the resolution to the operating system, like you do for all other names except *.

The * as resolved by the system is expected to return no addresses.

The only exception I know is glibc when used with getaddrinfo(), which specifically translates * to NULL, so it happens to work when the service is set, see here. With AI_PASSIVE specified it is essentially equivalent to what Apache does, but glibc-specific. In particular, on Linux with glibc sshd happens to map * into 127.0.0.1 and ::1 specifically because it uses getaddrinfo() with service specified, but doesn't set AI_PASSIVE.

On other operating systems and even on Linux with, for example, Musl libc, trying to use * will simply fail. Implementing a glibc-only behaviour is certainly not something to consider in a portable program.
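
The difference between the two getaddrinfo() modes described above can be checked with a short Python sketch. It passes a NULL node name (None in Python) directly instead of *, so it does not rely on the glibc-specific translation; the function name and the port 80 are just choices made for the example. With AI_PASSIVE the result consists of wildcard addresses, without it of loopback addresses, and which address families show up depends on the system.

```python
import socket


def addresses_for_null_host(passive):
    """Resolve a NULL node name for TCP port 80.

    With AI_PASSIVE the resolver returns wildcard addresses suitable for
    bind(); without it, loopback addresses suitable for connect().
    """
    flags = socket.AI_PASSIVE if passive else 0
    infos = socket.getaddrinfo(
        None, 80,
        family=socket.AF_UNSPEC,
        type=socket.SOCK_STREAM,
        flags=flags,
    )
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the textual address.
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    print("AI_PASSIVE:   ", addresses_for_null_host(True))
    print("no AI_PASSIVE:", addresses_for_null_host(False))
```

This matches the sshd observation: a program that resolves a NULL host with a service but without AI_PASSIVE ends up on the loopback addresses rather than on the wildcards.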

it contradicts the general concept of how BSD sockets

FreeBSD is doing it the same way.

Ah, sorry, I've misread your suggestion to use reuseaddr as a suggestion to use reuseport. Using SO_REUSEADDR simply won't help on Linux (and nginx does it anyway).

On FreeBSD, you can create listening sockets with different addresses by using the SO_REUSEADDR option, which allows creating listening sockets on the same port as long as addresses are different. And this socket option is used by all viable servers, including nginx, as it is anyway needed to start when there are TIME_WAIT sockets left from the previous runs.

On Linux, however, SO_REUSEADDR is not enough to create listening sockets with different "conflicting" addresses, such as 0.0.0.0 and 127.0.0.1. For example, the following works on FreeBSD:

$ perl -e 'use IO::Socket::INET; my $s = IO::Socket::INET->new(LocalAddr => "0.0.0.0:8080", Listen => 1, ReuseAddr => 1) or die; sleep 100;' &
$ perl -e 'use IO::Socket::INET; my $s = IO::Socket::INET->new(LocalAddr => "127.0.0.1:8080", Listen => 1, ReuseAddr => 1) or die; sleep 100;' &
$ netstat -Lan | grep 8080
tcp4  0/0/1                            127.0.0.1.8080         
tcp4  0/0/1                            *.8080                 
$ 

But not on Linux:

$ perl -e 'use IO::Socket::INET; my $s = IO::Socket::INET->new(LocalAddr => "0.0.0.0:8080", Listen => 1, ReuseAddr => 1) or die; sleep 100;' &
[1] 1831
$ perl -e 'use IO::Socket::INET; my $s = IO::Socket::INET->new(LocalAddr => "127.0.0.1:8080", Listen => 1, ReuseAddr => 1) or die; sleep 100;' &
[2] 1832
$ IO::Socket::INET: Address already in use	...propagated at -e line 1.

[2]+  Exit 98                 perl -e 'use IO::Socket::INET; my $s = IO::Socket::INET->new(LocalAddr => "127.0.0.1:8080", Listen => 1, ReuseAddr => 1) or die; sleep 100;'
$ netstat -an | grep 8080
tcp        0      0 0.0.0.0:8080            0.0.0.0:*               LISTEN     
$ 

On Linux, such sockets can only be opened with SO_REUSEPORT, which is a socket option for load balancing between multiple sockets, available in Linux 3.9 and newer. And it allows completely duplicate listens, which is much more dangerous behaviour.
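
The same experiment as the perl one-liners above can be run in a single process. The following Python sketch is an assumed equivalent, not taken from the ticket: it uses an OS-assigned port instead of 8080, and the function name is made up. It opens a listening IPv4 wildcard socket with SO_REUSEADDR, then tries to bind 127.0.0.1 on the same port, also with SO_REUSEADDR, and reports whether that succeeded.

```python
import socket


def can_listen_on_specific_next_to_wildcard():
    """Return True if 127.0.0.1:PORT can be opened while 0.0.0.0:PORT
    is already listening, with SO_REUSEADDR set on both sockets."""
    wildcard = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    wildcard.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    wildcard.bind(("0.0.0.0", 0))      # port 0: let the OS pick a free port
    wildcard.listen(1)
    port = wildcard.getsockname()[1]

    specific = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    specific.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    try:
        specific.bind(("127.0.0.1", port))
        specific.listen(1)
        return True
    except OSError:
        # Typically "Address already in use".
        return False
    finally:
        specific.close()
        wildcard.close()


if __name__ == "__main__":
    print(can_listen_on_specific_next_to_wildcard())
```

Going by the transcripts above, this is expected to report success on FreeBSD and failure on Linux.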

Well, regardless of whether you think it is good or not,

Well, regardless of what YOU think, when Linux (and BSD) implemented this feature (and provided configuration to change it) they had reasons. They are not plain stupid. (So did Windows when they didn't.) But breaking it even if the user explicitly turned it on is just awful. I hope you do not start shipping suid binaries as a workaround for Linux not allowing you to bind to port 80 without root rights.

It looks like you've lost the context. The particular phrase was about Linux preventing listens on both 0.0.0.0 and 127.0.0.1 (even with the SO_REUSEADDR socket option set).

It looks like you are trying to mix multiple topics here

Yes, because the combination of these 3 things results in very stupid defaults.
It does not matter whether you interpret * as all interfaces, as 0.0.0.0+::, or just as a name (which is resolved to ::).
In all cases it results in a proper dualstack setup. So this is what people are expecting from it.
But by setting IPV6_V6ONLY just resolving it breaks IPv4 only setups (which I assume you will never like to do) and by setting it as default, you killed all the installations that use defaults.

and breaks valid configurations.

The main thing is: Your behavior breaks valid configurations. And not those where the user configured something that does not comply with the operating system's rules, but all the ones that just use defaults.

In short, defaulting to ipv6only=on and resolving * to :: would be a proper solution. Like Linux does.
Resolving * via the C library and letting the OS decide whether ipv6only is on or not would be an even better one.
Killing the * and just defaulting to 0.0.0.0 and :: while doing ipv6only=off is also valid. (More or less the Windows way.)
But defaulting to * and letting it resolve to 0.0.0.0 while the operating system translates it to :: is just plain wrong.

To add some facts:

  • Resolving * with OS will return no addresses except on Linux with glibc.
  • For more fun, Linux with glibc in getaddrinfo(AI_PASSIVE) (assuming AF_UNSPEC, of course) resolves * to :: and 0.0.0.0, and one won't be able to open such listening sockets simultaneously on Linux without disabling IPV6_V6ONLY.

Also, as previously suggested, please forget about default listening sockets. They are not expected to be used in real configurations. Simply assume there are no default listening sockets, and listening sockets must be explicitly configured.

Overall, from your explanation I still fail to see what's wrong with using

listen *:80;
listen [::]:80;

in server blocks when dual-stack configuration is needed. It is expected to work on all hosts where you can use listen on ::, so it is essentially equivalent to your suggestions.

Such configuration will fail if IPv6 support is completely unavailable, that is, not compiled in at all, though all your suggestions will equally fail in this case.
