#420 closed enhancement (wontfix)
keepalive_disable: bingbot (to prevent DoS)
Reported by: | Steffen Weber | Owned by: | |
---|---|---|---|
Priority: | major | Milestone: | |
Component: | nginx-core | Version: | 1.3.x |
Keywords: | Cc: | ||
uname -a: | |||
nginx -V: |
nginx version: nginx/1.4.1
TLS SNI support enabled
configure arguments: --prefix=/usr --conf-path=/etc/nginx/nginx.conf --error-log-path=/var/log/nginx/error_log --pid-path=/run/nginx.pid --lock-path=/run/lock/nginx.lock --with-cc-opt=-I/usr/include --with-ld-opt=-L/usr/lib --http-log-path=/var/log/nginx/access_log --http-client-body-temp-path=//var/lib/nginx/tmp/client --http-proxy-temp-path=//var/lib/nginx/tmp/proxy --http-fastcgi-temp-path=//var/lib/nginx/tmp/fastcgi --http-scgi-temp-path=//var/lib/nginx/tmp/scgi --http-uwsgi-temp-path=//var/lib/nginx/tmp/uwsgi --with-ipv6 --with-pcre --with-pcre-jit --without-http_autoindex_module --without-http_browser_module --without-http_geo_module --without-http_limit_conn_module --without-http_map_module --without-http_memcached_module --without-http_scgi_module --without-http_ssi_module --without-http_split_clients_module --without-http_upstream_ip_hash_module --without-http_userid_module --without-http_uwsgi_module --with-http_gunzip_module --with-http_gzip_static_module --with-http_mp4_module --with-http_secure_link_module --with-http_spdy_module --with-http_stub_status_module --with-http_sub_module --with-http_realip_module --with-http_ssl_module --without-mail_imap_module --without-mail_pop3_module --without-mail_smtp_module --user=nginx --group=nginx |
Description
Like its predecessor msnbot, bingbot continues to DoS websites. I've given up hope that Microsoft will do anything about this, because it has been happening for years. Just google for "msnbot dos" or "bingbot dos".
In my case, hundreds of connections are kept open by a single bingbot IP address (state "TIME_WAIT" or "ESTABLISHED xxxx/nginx: worker"). This causes my iptables DoS protection to kick in and ban bingbot.
For me, the volume of requests is not a problem, but hundreds of parallel connections _are_ a problem. Would it be possible to add "bingbot" as an allowed value for the "keepalive_disable" directive? It should match all user agents that contain "bingbot" or "msnbot".
Change History (5)
comment:1 by , 11 years ago
Resolution: | → wontfix |
---|---|
Status: | new → closed |
comment:2 by , 11 years ago
The number of requests sent by bingbot is not a problem. Only the high number of idle connections is a problem, because both possible mitigations (the nginx module "limit_conn" or the iptables module "hashlimit") cause bingbot requests to be denied. This is not what we want.
We want to serve all requests sent by bingbot, but without the high number of idle connections. I understand that this is an issue with bingbot, but unfortunately we have to live with it, and the only solution I can think of (if you don't want to outright deny bingbot access to your website or use an extremely high "crawl-delay") is to disable keepalive.
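One way to disable keepalive for a single crawler using only stock directives could be sketched as follows. This is a minimal, untested sketch and not something proposed in this ticket: it assumes that routing matching requests into a named location with "keepalive_timeout 0" is acceptable, and the status code 418, the location name "@nokeepalive", and the document root are placeholders chosen for illustration.

```nginx
server {
    listen 80;
    root /var/www/html;    # assumed document root

    location / {
        # Hypothetical internal status code, used only to jump
        # to the named location below.
        error_page 418 = @nokeepalive;

        # Case-insensitive match on the User-Agent header.
        if ($http_user_agent ~* "bingbot|msnbot") {
            return 418;
        }
    }

    location @nokeepalive {
        # A zero timeout disables keep-alive for responses
        # served from this location.
        keepalive_timeout 0;
    }
}
```

Any directives needed to serve the content (proxying, FastCGI, and so on) would have to be repeated in the named location, since it handles the request on its own.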
comment:4 by , 11 years ago
The limit_conn module does _not_ cause requests to be denied based on the number of idle keepalive connections. It only counts connections with requests currently being processed; see the docs.
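For reference, a per-address limit with limit_conn might be configured roughly like this. It is a sketch under assumptions: the zone name "perip", the 10m zone size, and the limit of 10 are illustrative values, and the module has to be compiled in (the build quoted in this ticket was configured with --without-http_limit_conn_module).

```nginx
http {
    # Shared memory zone keyed by client address.
    # The name "perip" and the size are assumptions.
    limit_conn_zone $binary_remote_addr zone=perip:10m;

    server {
        listen 80;

        # Allow at most 10 connections per address that have a
        # request currently being processed; idle keepalive
        # connections are not counted.
        limit_conn perip 10;
    }
}
```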
comment:5 by , 11 years ago
I would like to limit the total number of connections in the firewall, not only the number of "active" connections in nginx. The reason is that even idle connections consume server resources (e.g. ports), and limiting the number of connections per IP can help to mitigate DoS attacks.
Unfortunately, with bingbot using 100+ connections per IP address, I have to configure my firewall to allow a very high number of connections per IP address, so the DoS protection is less effective than it could be.
No other browser, proxy or bot is using such a high number of connections. That's why I had hoped that nginx could "fix" bingbot by denying keepalive.
To prevent DoS, use the limit_conn module. Or, in case of Bing, just use something like
in robots.txt, which used to help a lot. Attempts to reduce the number of keepalive connections are mostly pointless - they are very cheap for nginx to maintain (read: a single host can easily maintain tens of thousands of keepalive connections), and they are reused if nginx hits the worker_connections limit.
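The directives that bound idle keepalive connections can be sketched as follows. The values are illustrative assumptions, not recommendations made in this ticket.

```nginx
events {
    # Upper bound on simultaneous connections per worker process,
    # idle keepalive connections included. The value is an assumption.
    worker_connections 10240;
}

http {
    # How long an idle keepalive connection stays open
    # (the default is 75s; 15s is an assumed value).
    keepalive_timeout 15s;

    # Maximum number of requests served over one keepalive
    # connection before it is closed (assumed value).
    keepalive_requests 100;
}
```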