[Koha] Securing opac-search

Michael Kuhn mik at adminkuhn.ch
Wed Mar 19 03:10:33 NZDT 2025


Hi Christina

You wrote:

 > Koha 24.11.01
 >
 > Not strictly a Koha problem, but something I know a lot of Koha users
 > face. After years of running happily with fail2ban and robots.txt
 > blocking bots/crawlers, that protection no longer seems to be enough.
 > We've been getting more and more bots of late that switch IPs before
 > bans can take effect; perhaps it's a DDoS, but either way they are
 > grinding Koha to a halt. I've had to disable OPACPublic for now. I
 > can't find much about securing a server against these types of hits.
 > Does anyone else running a small server have any guidance on what
 > could be done, or what the next steps would be? I'd ideally like to
 > keep the OPAC public.

I recently opened a thread on the "koha-devel" mailing list about very
similar behaviour, which led to out-of-memory errors that caused Koha to
exit:

* https://lists.koha-community.org/pipermail/koha-devel/2025-March/048775.html

The following article (provided by David Cook) gives some insight into
what may actually be happening:

* https://wiki.lyrasis.org/display/cmtygp/Aggressive+AI+Harvesting+of+Digital+Resources

As far as bots are concerned, I no longer rely on fail2ban and
"robots.txt". There are far too many ever-changing IP addresses, and
"robots.txt" just seems to get ignored.

Instead, this is what I did:

1. In the log file "/var/log/koha/<instancename>/plack.log" I examined
the user agent strings of suspicious bots. I did this for three
libraries and came up with the strings you'll find below.

Of course there may be more such bots. It also seems that some bots have
even more devious ways of harassing the OPAC.

2. In the configuration file
"/etc/apache2/sites-available/<instancename>.conf" I added the following
three lines after the <VirtualHost *:443> directive that serves the Koha
OPAC:

RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} (ahrefs|Amazonbot|applebot|bingbot|CensysInspect|ChatGPT|ClaudeBot|Custom-AsyncHttpClient|DotBot|DuckDuckBot|Go-http-client|Googlebot|GoogleOther|GPTBot|l9explore|meta-externalagent|MJ12bot|MetaJobBot|OAI-SearchBot|Odin|PerplexityBot|PetalBot|Qwantbot|SemrushBot|Turnitin) [NC]
RewriteRule ^(.*)$ - [F,L]

After inserting these lines I restarted the Apache HTTP Server.

3. This is not a perfect solution (see the article linked above), but
performance improved considerably and immediately. And the bots
identified by these strings are definitely locked out, as long as they
keep sending these user agent strings.
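To repeat step 1 on your own logs and to check the pattern before putting it into Apache, a small script can help. This is a minimal sketch, assuming that each line of plack.log ends with the user agent as the last double-quoted field (as in Apache's "combined" log format); check a few lines of your own log first, since I have not confirmed that layout. It counts user agents and marks which ones the RewriteCond pattern above would refuse with 403 Forbidden, mirroring [NC] with re.IGNORECASE and the unanchored RewriteCond match with re.search. The helper names user_agent, would_block and summarize, and the file name ua_report.py, are hypothetical.

```python
import re
import sys
from collections import Counter

# The alternation from the RewriteCond above. Apache's [NC] flag makes the
# match case-insensitive, which re.IGNORECASE mirrors here.
BOT_PATTERN = re.compile(
    r"(ahrefs|Amazonbot|applebot|bingbot|CensysInspect|ChatGPT|ClaudeBot|"
    r"Custom-AsyncHttpClient|DotBot|DuckDuckBot|Go-http-client|Googlebot|"
    r"GoogleOther|GPTBot|l9explore|meta-externalagent|MJ12bot|MetaJobBot|"
    r"OAI-SearchBot|Odin|PerplexityBot|PetalBot|Qwantbot|SemrushBot|Turnitin)",
    re.IGNORECASE,
)

# ASSUMPTION: the user agent is the last double-quoted field on each line,
# as in Apache's "combined" log format. Adjust if plack.log differs.
LAST_QUOTED = re.compile(r'"([^"]*)"\s*$')


def user_agent(line):
    """Return the last double-quoted field of a log line, or None."""
    m = LAST_QUOTED.search(line)
    return m.group(1) if m else None


def would_block(ua):
    """True if the RewriteCond pattern would match this user agent.

    re.search is unanchored, like a RewriteCond regex without ^ or $.
    """
    return BOT_PATTERN.search(ua) is not None


def summarize(lines, top=20):
    """Count user agents; flag those the rule would refuse with 403."""
    counts = Counter(ua for ua in map(user_agent, lines) if ua)
    return [(n, would_block(ua), ua) for ua, n in counts.most_common(top)]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 ua_report.py /var/log/koha/<instancename>/plack.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for n, blocked, ua in summarize(log):
            print(f"{n:8d}  {'BLOCK' if blocked else 'allow'}  {ua}")
```

Reading the BLOCK/allow column against your own traffic shows both what the rule catches and what it might catch by accident: because the match is unanchored and case-insensitive, the short token "Odin", for instance, also matches any user agent containing "coding".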

Hope this helps.

Best wishes: Michael
-- 
Managing Director · Certified Librarian BBS, IT Specialist with Swiss Federal Certificate
Admin Kuhn GmbH · Pappelstrasse 20 · 4123 Allschwil · Switzerland
T 0041 (0)61 261 55 61 · E mik at adminkuhn.ch · W www.adminkuhn.ch