[Koha] how to avoid high cpu uses due to web crawlers
Wagner, Alexander
alexander.wagner at desy.de
Wed Jan 10 22:19:55 NZDT 2024
Hi!
> I found that an IP 47.76.35.19 is hitting my opac continuously, due to
> which CPU use is very high, and it makes the entire Koha opac and staff
> client very slow.
This does not look like a legitimate crawler, so most likely you can't tackle it with a robots.txt: an abusive bot will typically not respect it anyway.
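For reference, the rule one would normally put in a robots.txt at the OPAC's document root to keep all crawlers away is sketched below; well-behaved crawlers honour it, but a bot like this one most likely ignores it:
```
# robots.txt -- purely advisory; polite crawlers obey it, abusive ones don't
User-agent: *
Disallow: /
```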
> I am also not able to locate the file .htaccess in my ubuntu 18.04 with
> koha 20.04
> Can anyone tell me how to resolve this?
`.htaccess` files do not exist by default; you'd have to create one in the appropriate place, with the proper permissions and ownership, using your favourite text editor. They are basically per-directory access rules read by your webserver. IOW you could either use those or put a rule in your Apache configs.
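As a rough sketch (untested against your setup): with Apache 2.4's mod_authz_core, an `.htaccess` file in the OPAC's document root that blocks just the address you reported could look like this, assuming your `AllowOverride` setting permits authorization directives there:
```
# Hypothetical .htaccess sketch, Apache 2.4 syntax:
# allow everyone except the offending address
<RequireAll>
    Require all granted
    Require not ip 47.76.35.19
</RequireAll>
```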
I am no expert in either, but on one of our current (non-Koha) systems we use something like
```
# Turn bad IPs away (requires mod_rewrite; RewriteMap is only
# valid in server or virtual-host context, not in .htaccess)
RewriteMap hosts-deny "txt:/opt/invenio/var/tmp/hosts-deny.txt"
# Block if either the client address or, behind a proxy, the
# X-Forwarded-For address is listed in the map
RewriteCond "${hosts-deny:%{REMOTE_ADDR}|NOT-FOUND}" "!=NOT-FOUND" [OR]
RewriteCond "${hosts-deny:%{HTTP:X-Forwarded-For}|NOT-FOUND}" "!=NOT-FOUND"
# Answer with 429 Too Many Requests and stop rewriting
RewriteRule .* - [R=429,L]
```
in the Apache configs. This refers to a text file, in this case in the somewhat funny path `/opt/invenio/var/tmp/`, called `hosts-deny.txt`, which lists the IP addresses to be turned away. You could in principle create such a file in any place your Apache can read. This makes it a bit easier to handle unwanted "crawlers", as you just add the offending IPs to that file.
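For such a map file, Apache's txt RewriteMap format expects one key/value pair per line; the value is arbitrary here, because the rules above only test whether a key exists. A sketch with the address from your logs:
```
# hosts-deny.txt -- one "key value" pair per line; the value is
# arbitrary since the rules only test for the key's presence
47.76.35.19 deny
```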
HTH.
--
Kind regards,
Alexander Wagner
Deutsches Elektronen-Synchrotron DESY
Library and Documentation
Building 01d Room OG1.444
Notkestr. 85
22607 Hamburg
phone: +49-40-8998-1758
e-mail: alexander.wagner at desy.de