Hi Christina,

You wrote:
> Koha 24.11.01
>
> Not strictly a Koha problem but something I know a lot of Koha users face. After years of running happily with fail2ban and robots.txt blocking bots/crawlers, the security seems to have passed. We've been getting more and more bots of late switching IPs before bans can take place, perhaps they could be DDoS, either way grinding Koha to a halt. I've had to switch OPACPublic to disable for now. I can't find much about securing a server against these types of hits. Does anyone else running a small server have any guidance on what could be done/the next steps? I'd ideally like to keep the OPAC public.

I recently opened a thread on the "koha-devel" mailing list about very similar behaviour, which led to out-of-memory errors that caused Koha to exit:

* https://lists.koha-community.org/pipermail/koha-devel/2025-March/048775.html

The following article (provided by David Cook) gives some insight into what may actually be happening:

* https://wiki.lyrasis.org/display/cmtygp/Aggressive+AI+Harvesting+of+Digital+...

In my case I don't rely on fail2ban and "robots.txt" for bots anymore. There are way too many ever-changing IP addresses, and "robots.txt" just seems to get ignored. Instead, I did the following:

1. In the log file "/var/log/koha/<instancename>/plack.log" I investigated the user agent strings of suspicious bots. I did this for three libraries and came up with the strings you'll find below. Of course there may be more such bots, and some bots seem to have even more wicked ways to harass the OPAC.

2. In the configuration file "/etc/apache2/sites-available/<instancename>.conf" I added the following after the directive <VirtualHost *:443> which serves the Koha OPAC (these are three lines):

    RewriteEngine on
    RewriteCond %{HTTP_USER_AGENT} (ahrefs|Amazonbot|applebot|bingbot|CensysInspect|ChatGPT|ClaudeBot|Custom-AsyncHttpClient|DotBot|DuckDuckBot|Go-http-client|Googlebot|GoogleOther|GPTBot|l9explore|meta-externalagent|MJ12bot|MetaJobBot|OAI-SearchBot|Odin|PerplexityBot|PetalBot|Qwantbot|SemrushBot|Turnitin) [NC]
    RewriteRule ^(.*)$ - [F,L]

   After inserting these lines I restarted the Apache HTTP Server.

3. This is not a perfect solution (read the article I linked above), but performance improved markedly and immediately, and the bots identified by the given strings are definitely locked out.

Hope this helps.

Best wishes: Michael
-- 
Geschäftsführer · Diplombibliothekar BBS, Informatiker eidg. Fachausweis
Admin Kuhn GmbH · Pappelstrasse 20 · 4123 Allschwil · Schweiz
T 0041 (0)61 261 55 61 · E mik@adminkuhn.ch · W www.adminkuhn.ch
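
Addendum: a user agent pattern like the one in step 2 can be tried out offline before it goes into the Apache configuration. The following is only a minimal Python sketch, not part of the setup described above. It assumes that Python's `re.IGNORECASE` behaves like Apache's `[NC]` flag for this simple alternation, and the names `BLOCKED_AGENTS`, `is_blocked` and `SAMPLES`, as well as the sample user agent strings, are hypothetical and chosen for illustration.

```python
import re

# Same alternation as in the RewriteCond of step 2.
# re.IGNORECASE stands in for Apache's [NC] flag (assumption: equivalent here).
BLOCKED_AGENTS = re.compile(
    r"(ahrefs|Amazonbot|applebot|bingbot|CensysInspect|ChatGPT|ClaudeBot|"
    r"Custom-AsyncHttpClient|DotBot|DuckDuckBot|Go-http-client|Googlebot|"
    r"GoogleOther|GPTBot|l9explore|meta-externalagent|MJ12bot|MetaJobBot|"
    r"OAI-SearchBot|Odin|PerplexityBot|PetalBot|Qwantbot|SemrushBot|Turnitin)",
    re.IGNORECASE,
)


def is_blocked(user_agent: str) -> bool:
    """Return True if the pattern matches anywhere in the user agent string."""
    return BLOCKED_AGENTS.search(user_agent) is not None


# Hypothetical sample user agent strings, for illustration only.
SAMPLES = [
    "Mozilla/5.0 (compatible; GPTBot/1.0; +https://openai.com/gptbot)",
    "Mozilla/5.0 (compatible; SemrushBot/7~bl; +http://www.semrush.com/bot.html)",
    "Mozilla/5.0 (X11; Linux x86_64; rv:128.0) Gecko/20100101 Firefox/128.0",
]

if __name__ == "__main__":
    for ua in SAMPLES:
        print("blocked" if is_blocked(ua) else "allowed", "-", ua)
```

Because the pattern is unanchored, a short entry such as "Odin" matches any user agent that contains those letters in any case, so it may be worth running a few ordinary browser strings through such a check as well.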