Opened 2 years ago

Closed 2 years ago

Last modified 2 years ago

#2283 closed defect (invalid)

RFI on HTTP/2 connection memory usage and keepalive_requests knob

Reported by: bhassink@…
Owned by:
Priority: minor
Milestone:
Component: documentation
Version: 1.19.x
Keywords:
Cc:
uname -a: n/a
nginx -V: n/a

Description

A while back I integrated an in-house HTTP/2 proxy with an NGINX-powered site, and was surprised by connection drops caused by the configured http2_max_requests (now keepalive_requests) limit.

What is it about NGINX internals that "leaks" memory for every request until the connection is closed, and will it be fixed?

One of the design goals behind HTTP/2 is enabling long-lived connections, but this NGINX limitation defeats that.

Change History (5)

comment:1 by Maxim Dounin, 2 years ago

Resolution: invalid
Status: new → closed

In nginx, memory allocations are bound to a particular memory pool, and all of them are freed at once when the pool is destroyed (see the development guide for additional details). Each connection has its own memory pool, so connections need to be closed periodically to free these allocations.

In the current code, all connection-specific memory allocations are expected to be limited, and no memory growth should be observed with many requests in a single connection (the last corresponding fixes were in 1.21.1, see CHANGES). This is not, however, guaranteed. To prevent potential memory growth and, more importantly, potential abuse of such growth, it is important to close connections periodically.
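As a rough illustration, here is a toy sketch of this pool pattern; it assumes nothing about nginx's actual implementation (the real API, ngx_create_pool()/ngx_palloc()/ngx_destroy_pool(), lives in src/core/ngx_palloc.c and is considerably more elaborate). The point is that allocations made from a pool cannot be freed individually; everything is reclaimed at once when the pool, and hence the connection, goes away.

    /* Toy stand-in for nginx's pool allocator, for illustration only.
     * Allocations are chained to the pool and are never freed one by
     * one; they are all released when the pool is destroyed, which for
     * a connection pool happens when the connection is closed. */
    #include <stdlib.h>

    typedef struct chunk {
        struct chunk *next;
    } chunk;

    typedef struct {
        chunk *head;                    /* all allocations made from this pool */
    } pool;

    void *pool_alloc(pool *p, size_t size) {
        chunk *c = malloc(sizeof(chunk) + size);
        if (c == NULL) return NULL;
        c->next = p->head;              /* remembered; no individual free */
        p->head = c;
        return c + 1;                   /* caller's memory starts past the header */
    }

    void pool_destroy(pool *p) {        /* the only point where memory is returned */
        while (p->head != NULL) {
            chunk *next = p->head->next;
            free(p->head);
            p->head = next;
        }
    }

    int main(void) {
        pool conn_pool = { NULL };
        for (int request = 0; request < 1000; request++) {
            pool_alloc(&conn_pool, 1024);   /* per-request allocation stays
                                               until the connection closes */
        }
        pool_destroy(&conn_pool);           /* connection closed: all freed */
        return 0;
    }

This is why a long-lived connection accumulates whatever its requests allocate from the connection pool, and why a request limit bounds that accumulation.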

Note well that periodically closing connections is also important from an observability point of view, and may be needed for tools external to nginx to prevent DoS attacks. Further, it can be important for better balancing of connections between worker processes.

While long-lived connections are indeed one of the HTTP/2 design goals, it is also important to consider DoS mitigation. The current defaults (keepalive_requests 1000, keepalive_timeout 75s, keepalive_time 1h) are believed to provide a good balance between performance and security. If larger values work better in your case, you are free to use them (though note the possible implications).
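For example, a configuration raising these limits might look like the sketch below. The particular values are purely illustrative, not a recommendation; only the directive names and defaults are as stated above.

    # Illustrative values only: larger limits keep connections open longer
    # at the cost of the DoS-mitigation properties discussed above.
    http {
        keepalive_requests 10000;   # default: 1000
        keepalive_timeout  300s;    # default: 75s
        keepalive_time     4h;      # default: 1h
    }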

Hope this helps. If you have further questions about nginx, consider using the available support options.

comment:2 by bhassink@…, 2 years ago

I would agree that long-lived connections are problematic with respect to load balancing, but I'm not sure I see how the keepalive mechanisms help with DoS mitigation.

A flood of connections, and of requests within those connections, is still going to consume resources, and auto-closing leads to expensive TLS handshakes on the subsequent connections. Is there documentation and/or data that shows how auto-closing helps?

comment:3 by Maxim Dounin, 2 years ago

Closing connections means that new connections have to be opened, and that can be observed externally, for example by a firewall. The firewall will thus be able to see the attack and stop it.

comment:4 by bhassink@…, 2 years ago

Ok, I can see how observability helps in a flooding situation. Will it be effective against a more "patient" attack?

For example, the keepalive settings can be determined with a bit of probing. Given the defaults of 1000/75s/1h, distributed hosts could open connections, send a burst of 953 requests, then a single request every 75s for the remainder of the hour, and repeat after each closure.

Because request allocations are not fully reclaimed, those allocations can be maxed out with this type of attack, can they not?

comment:5 by Maxim Dounin, 2 years ago

First of all, you are mixing two independent things:

  1. Connection-related memory cleanup on connection close. Keepalive request limits ensure that memory usage by a connection is limited, even if there are some per-request memory allocations from the connection pool.
  2. External observability, which makes it possible to stop various DoS attacks with external tools. It can be used to stop many types of DoS attacks using various vectors, not necessarily targeting connection-related memory allocations.

Attacks on connection-related memory allocations are unlikely to be practical with the current limits in place. For example, let's assume there is 1 KB of connection-related memory allocated per request.

Without the limits, an attacker can make, for example, 1 million requests, resulting in 1 GB of memory being allocated. This is quite noticeable for many setups (and nothing stops the attacker from making even more requests).

With the keepalive_requests limit of 1000, a single connection cannot allocate more than 1 MB of such memory, which is comparable to typical per-connection memory usage in most setups (especially when using HTTP/2). That is, such an "attack" would be no worse than simply opening a connection. While the attacker could try to make the attack as effective as possible within the limits, it is still not expected to be effective enough to be seriously considered.
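Making the arithmetic explicit, with the assumed 1 KB of allocations per request:

    without a limit:   1,000,000 requests × 1 KB/request ≈ 1 GB per connection
    with the default:      1,000 requests × 1 KB/request ≈ 1 MB per connection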
