[Koha] How to avoid high CPU usage due to web crawlers

Wagner, Alexander alexander.wagner at desy.de
Wed Jan 10 22:19:55 NZDT 2024


Hi!

> I found that an IP, 47.76.35.19, is hitting my OPAC continuously, due to
> which CPU usage is very high, and it makes the entire Koha OPAC and staff
> client very slow.

This does not look like a legitimate crawler, so most likely you can't tackle this guy with a robots.txt; chances are it will not respect it anyway.
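(For well-behaved crawlers, a `robots.txt` at the OPAC's document root would be the usual first step. A minimal sketch, with an illustrative Koha search path; a rogue client like yours will simply ignore it:)

```
# robots.txt -- only honoured by well-behaved crawlers
User-agent: *
# keep bots out of expensive search pages
Disallow: /cgi-bin/koha/opac-search.pl
# non-standard, but many crawlers respect it
Crawl-delay: 10
```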

> I am also not able to locate the file .htaccess on my Ubuntu 18.04 with
> Koha 20.04.
> Can anyone tell me how to resolve this?

`.htaccess` files do not exist by default; you'd have to create one in the appropriate place, with proper permissions and ownership, using your favourite text editor. They are basically folder-based firewall rules read by your webserver. IOW you could either use those or put a rule in your Apache configs.
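A minimal sketch of such a file, assuming Apache 2.4 and that `AllowOverride` permits authorization directives for the directory in question (the IP is the offender from your logs):

```
# .htaccess -- block a single offending IP (Apache 2.4 syntax)
<RequireAll>
    # allow everyone ...
    Require all granted
    # ... except this address
    Require not ip 47.76.35.19
</RequireAll>
```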

I am no expert in either, but on one of our current (non-Koha) systems we use something like

```
# Turn bad IPs away: look up the client address in a deny list
# and answer 429 (Too Many Requests) if it is listed.
RewriteMap hosts-deny "txt:/opt/invenio/var/tmp/hosts-deny.txt"
# Check the direct client address ...
RewriteCond "${hosts-deny:%{REMOTE_ADDR}|NOT-FOUND}" "!=NOT-FOUND" [OR]
# ... and the forwarded one, in case we sit behind a proxy.
RewriteCond "${hosts-deny:%{HTTP:X-Forwarded-For}|NOT-FOUND}" "!=NOT-FOUND"
RewriteRule .* - [R=429,L]
```

in the Apache configs. This refers to a text file, in this case in some funny path `/opt/invenio/var/tmp/`, called `hosts-deny.txt`, which lists the IP addresses that should be turned away. You could in principle create such a file in any place your Apache can see. This makes it a bit easier to handle unwanted "crawlers", as you just add the offending IPs there.
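For what it's worth, a plain-text `RewriteMap` file holds one `key value` pair per line; the value is arbitrary here, since the rules above only test whether the lookup found the key at all. A sketch (the second address is just an example):

```
# hosts-deny.txt -- one "key value" pair per line,
# lines starting with # are comments
47.76.35.19   deny
203.0.113.42  deny
```

Note that `RewriteMap` itself is only allowed in the server or virtual-host configuration, not in `.htaccess`. On the plus side, Apache re-reads a `txt:` map when the file's modification time changes, so adding an IP does not require a reload.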

HTH.

-- 
Kind regards,

Alexander Wagner

Deutsches Elektronen-Synchrotron DESY
Library and Documentation

Building 01d Room OG1.444
Notkestr. 85
22607 Hamburg

phone:  +49-40-8998-1758
e-mail: alexander.wagner at desy.de

