Opened 3 years ago
Closed 3 years ago
#2293 closed defect (worksforme)
Nginx 1.20.1 excessive memory usage
Reported by: | Owned by:
---|---
Priority: major | Milestone:
Component: nginx-core | Version: 1.19.x
Keywords: | Cc:

uname -a:
bash-4.2$ uname -a
Linux tenant-router-7b95bc584c-s6mcx 5.4.17-2102.205.7.3.el7uek.x86_64 #2 SMP Fri Sep 17 16:52:13 PDT 2021 x86_64 x86_64 x86_64 GNU/Linux

nginx -V: nginx version: nginx/1.20.2
Description
Running nginx 1.20.x with HTTP/2 traffic at 225 req/s, we see memory usage keep increasing. Within a couple of hours it reached 1.5GB and the memory was never released.
We rolled back to 1.19.1 and memory usage is much better, but there is still a slow memory leak as traffic runs.
In the nginx.conf for 1.20.1 we had to replace http2_max_requests with keepalive_requests because the directive is obsolete.
We have:
proxy_buffering off
proxy_cache off
From meminfo we still see cache usage growing as traffic runs.
config for 1.20.x is attached.
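For reference, a minimal sketch of the directive change described above (directive names and values are taken from the settings quoted later in this thread; the surrounding structure is an assumption, not the attached configuration):

    http {
        keepalive_requests 10000;           # replaces the obsolete http2_max_requests in 1.20.x
        http2_max_concurrent_streams 1024;

        server {
            proxy_buffering off;            # responses passed to the client synchronously
            proxy_cache off;                # no proxy caching
            ....
        }
    }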
Change History (11)
comment:1 by , 3 years ago
Description: modified (diff)
comment:2 by , 3 years ago
First of all, please elaborate on how you measure memory usage. Additionally, please show "nginx -V" output and the keepalive_requests setting in your configuration.
From meminfo we still see cache usage growing as traffic runs.
Note that "Cached" in meminfo is memory in the page cache, which is expected to grow under load per the OS caching algorithms, regardless of whether you use the nginx cache or not.
comment:3 by , 3 years ago
For nginx 1.20.1, we configure:
keepalive_requests 10000;
http2_max_concurrent_streams 1024;
We run nginx as pods in Kubernetes, and we were monitoring memory via the pods' memory usage.
In 1.19.1 we configure:
http2_max_requests 10000;
http2_max_concurrent_streams 1024;
comment:4 by , 3 years ago
keepalive_requests 10000;
You may want to check if "keepalive_requests 1000;" (which is the default in 1.20.x) makes any difference.
We run nginx as pods in Kubernetes, and we were monitoring memory via the pods' memory usage.
How exactly do you monitor memory usage of pods? What makes you think that the memory usage is from nginx, and not OS activity such as the pagecache mentioned earlier?
Also, sorry to repeat it again, but please show "nginx -V" output.
follow-up: 6 comment:5 by , 3 years ago
Thanks for getting back!
bash-4.2$ nginx -V
nginx version: nginx/1.20.1
built by gcc 4.8.5 20150623 (Red Hat 4.8.5-44.0.3) (GCC)
built with OpenSSL 1.0.2k-fips 26 Jan 2017
TLS SNI support enabled
configure arguments: --with-cc-opt='-fstack-protector-all' --with-ld-opt='-Wl,-z,relro,-z,now' --sbin-path=/usr/sbin/nginx --conf-path=/etc/nginx/nginx.conf --pid-path=/tmp/nginx.pid --http-log-path=/dev/stdout --error-log-path=/dev/stdout --http-client-body-temp-path=/tmp/client_temp --http-proxy-temp-path=/tmp/proxy_temp --http-fastcgi-temp-path=/tmp/fastcgi_temp --http-uwsgi-temp-path=/tmp/uwsgi_temp --http-scgi-temp-path=/tmp/scgi_temp --with-file-aio --with-http_v2_module --with-http_ssl_module --with-http_stub_status_module --with-pcre --with-stream --with-stream_ssl_module --with-threads
I will try "keepalive_requests 1000".
We monitor container memory usage, which is just nginx, and the only change is the version, 1.19.1 vs 1.20.1. Can you recommend a way to monitor nginx memory usage?
Thanks!
comment:6 by , 3 years ago
Replying to luyangliang@…:
Thanks for getting back!
bash-4.2$ nginx -V
nginx version: nginx/1.20.1
built by gcc 4.8.5 20150623 (Red Hat 4.8.5-44.0.3) (GCC)
built with OpenSSL 1.0.2k-fips 26 Jan 2017
TLS SNI support enabled
configure arguments: --with-cc-opt='-fstack-protector-all' --with-ld-opt='-Wl,-z,relro,-z,now' --sbin-path=/usr/sbin/nginx --conf-path=/etc/nginx/nginx.conf --pid-path=/tmp/nginx.pid --http-log-path=/dev/stdout --error-log-path=/dev/stdout --http-client-body-temp-path=/tmp/client_temp --http-proxy-temp-path=/tmp/proxy_temp --http-fastcgi-temp-path=/tmp/fastcgi_temp --http-uwsgi-temp-path=/tmp/uwsgi_temp --http-scgi-temp-path=/tmp/scgi_temp --with-file-aio --with-http_v2_module --with-http_ssl_module --with-http_stub_status_module --with-pcre --with-stream --with-stream_ssl_module --with-threads
Thanks. Are aio on;, aio threads;, or aio_write on; actually used in the configuration? Are any additional modules loaded via the load_module directive in the configuration?
I will try "keepalive_requests 1000".
You may also review other settings affected by the version change, notably http2_recv_timeout, http2_idle_timeout, and keepalive_timeout.
Also, could you please show the various buffer and connection limit settings, to estimate memory usage: notably worker_processes, worker_connections, proxy_buffer_size, proxy_buffers (and/or grpc_buffer_size, grpc_buffers if gRPC proxying is used, and/or the relevant fastcgi/scgi/uwsgi values).
We monitor container memory usage, which is just nginx, and the only change is the version, 1.19.1 vs 1.20.1. Can you recommend a way to monitor nginx memory usage?
Usually, monitoring the VIRT column in top for nginx processes is a good way to monitor nginx memory usage. The RES column might also be interesting. Note that these numbers are expected to grow when nginx starts, yet are expected to stabilize under load after some time.
If you see it still growing at the same rate after, e.g., a couple of hours, this might indicate that there is something wrong - that is, some memory or socket leak. If it stabilizes instead, this likely means it has reached the size corresponding to the load and the configuration.
It might also be a good idea to monitor the number of connections (active/reading/writing) as reported by the stub_status module (and/or the OS). These can greatly simplify understanding what is going on.
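For what it's worth, a minimal sketch of exposing stub_status (the listen port, location name and access rules below are my own assumptions, not from this thread; the build shown above already includes --with-http_stub_status_module):

    server {
        listen 8080;

        location = /basic_status {
            stub_status;          # reports active/reading/writing/waiting connections
            allow 127.0.0.1;      # restrict to local checks
            deny all;
        }
    }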
follow-up: 8 comment:7 by , 3 years ago
Setting keepalive_requests to 1000 actually improves things a lot. Memory does stabilize after a couple of hours at the same traffic rate.
The client side starts with 4 connections and sends requests using round-robin. We used to set keepalive_requests to 10000 to match the client side setting, because if nginx releases the connection when the max request count is hit, the client side receives a 503 (while using the expired connection). In order to avoid the 503, we set the client side max requests per connection to 10000 as well, so that the client disconnects first. Any suggestion how to handle this? The default of 1000 is a little low under traffic load.
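As a sketch of the idea described above (the numbers are illustrative assumptions, not a recommendation made in this thread): keep nginx's per-connection request limit above the client's, so the client recycles the connection before nginx closes it.

    http {
        # Client reuses a connection for at most 10000 requests,
        # so allow slightly more on the nginx side and the client disconnects first.
        keepalive_requests 10500;
        keepalive_timeout 120;
    }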
We don't have any other load_module. aio on;, aio threads;, and aio_write on; are not used in nginx.conf.
These are the other related configuration settings:
events {
    use epoll;
    worker_connections 1024;
    multi_accept on;
}

http {
    # Required: Headers with underscores
    underscores_in_headers on;

    # Recommended: Tuning (Unofficial)
    proxy_busy_buffers_size 256k;
    proxy_buffers 4 256k;
    proxy_buffer_size 128k;
    proxy_read_timeout 300s;
    client_max_body_size 10m;
    server_names_hash_bucket_size 256;
    variables_hash_bucket_size 256;
    sendfile on;
    keepalive_timeout 120;
    keepalive_requests 1000;   # just updated to 1000

    server {
        ....
        http2_max_concurrent_streams 1024;
        proxy_buffering off;
        ....
    }
}
comment:8 by , 3 years ago
Replying to luyangliang@…:
Setting keepalive_requests to 1000 actually improves things a lot. Memory does stabilize after a couple of hours at the same traffic rate.
So it looks like it's just a question of the memory consumed by your configuration, not a leak. You may want to try using a larger keepalive_requests to see if it stabilizes as well, probably at some larger memory size.
It would also be interesting to compare the memory used with the number of active connections, to see whether used memory scales linearly with the number of connections, or whether memory used per connection depends on keepalive_requests (that is, there are some per-request allocations).
Because if nginx releases the connection when the max request count is hit, the client side receives a 503 (while using the expired connection).
Note that this means the client probably needs improvements.
These are the other related configuration settings:
worker_connections 1024; proxy_buffers 4 256k; proxy_buffer_size 128k; http2_max_concurrent_streams 1024; proxy_buffering off;
So the configuration allows up to about 1 million parallel requests per worker (worker_connections * http2_max_concurrent_streams), and at least up to 128k of memory per request (proxy_buffer_size, given that proxy_buffering is switched off). This gives 128 gigabytes of maximum memory usage per worker process, which is clearly not reached.
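To spell out that estimate (my arithmetic, based on the settings quoted above):

    1024 worker_connections * 1024 streams per connection = 1,048,576 concurrent requests per worker
    1,048,576 requests * 128 KB per request (proxy_buffer_size) ≈ 128 GB per worker process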
BTW, what's the body_buffer_size value? One of the changes in the HTTP/2 code in 1.19.x is that it is now more likely to use this buffer. If it's set to a large value, this might explain the change in overall memory usage you observe.
comment:9 by , 3 years ago
We did not configure body_buffer_size, so it is using the default of 16k.
This nginx supports up to 10 customers. As for traffic rate, we should support more than 2250 req/s. Each customer has clients from both browsers and an on-premise traffic generator. I have tried decreasing proxy_buffer_size, but some of the browser requests would fail (due to large cookie size).
I will reduce http2_max_concurrent_streams since that is per connection.
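As an illustration only (the value below is an assumption of mine, not something agreed in this thread), lowering the per-connection stream limit caps the worst-case number of concurrent requests per worker:

    http {
        # default is 128; 1024 allows up to worker_connections * 1024 concurrent requests
        http2_max_concurrent_streams 128;
    }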
We still don't understand the large memory usage difference between 1.19 and 1.20 when using 10k keepalive_requests.
comment:10 by , 3 years ago
We did not configure body_buffer_size, so it is using the default of 16k.
Err, sorry, client_body_buffer_size is certainly unrelated, as relevant changes are only in 1.21.x, not in 1.19.x.
We still don't understand the large memory usage difference between 1.19 and 1.20 when using 10k keepalive_requests.
My best guess is that the memory difference is due to a different number of connections being open in these versions. This might be due to various settings being unified between HTTP/1.x and HTTP/2. For example, http2_recv_timeout was 30s by default, and was replaced with client_header_timeout, which is 60s by default. It would be interesting to check the number of connections being open in the different versions, and memory-per-connection metrics.
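If one wanted to rule that out, a sketch (my own suggestion, not advice given in this thread) would be to bring the 1.20.x timeout closer to the old HTTP/2 default mentioned above and compare connection counts:

    http {
        client_header_timeout 30s;   # matches the old http2_recv_timeout default, per the comment above
        keepalive_timeout 120;       # value already used in the reporter's configuration
    }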
Another important change in the 1.19.x branch which might also affect the number of connections being open is the introduction of lingering close for HTTP/2, but that was already present in 1.19.1. There were some related SSL shutdown changes in subsequent versions though, which might affect client behaviour.
comment:11 by , 3 years ago
Resolution: → worksforme
Status: new → closed
Feedback timeout, so closing this. As previously found out in the comments, there is clearly no memory leak, and the difference in memory consumption in this particular configuration is likely explained by the different number of connections being kept alive in the different versions due to configuration changes.
I can't attach the configuration. I will share it when it is needed.