Securing opac-search
Hello,

Koha 24.11.01

Not strictly a Koha problem, but something I know a lot of Koha users face. After years of running happily with fail2ban and robots.txt blocking bots and crawlers, that setup no longer seems to be enough. We've been getting more and more bots lately that switch IPs before bans can take effect; it may even be a DDoS, but either way it grinds Koha to a halt. I've had to set OPACPublic to "disable" for now. I can't find much about securing a server against these kinds of hits. Does anyone else running a small server have any guidance on what could be done next? I'd ideally like to keep the OPAC public.

Thank you,
Christina
Kia ora! On 18.03.2025 13:59, Fairlamb, Christina wrote:
> [original message quoted above, trimmed]
Not much help, but I know this will be a topic for discussion in Marseille in a couple of weeks. Maybe some good advice can come from that.

Best regards,
Magnus
Here we're on 24.05, with no issues. But I use drastic measures, an array of them accumulated over the years.

First of all: there are many paid services that do this job very well, and they would make sense for individual institutions. For service providers like us, that could become expensive.

We use many hacks with MaxMindDB to redirect all non-Canadian traffic targeting our city (public) libraries. But for institutions (universities, hospitals) that want to stay open to the world, I analyse all IPs in /var/log/apache2/other_vhosts_access.log, group the IPs by /16 and /24 to catch all the spreaders (one call from each of 255 different IPs, for example), and block them automatically with ufw. And, very important for a small company like us that is not specialized in security: I do not worry about collateral damage. If something needs to be unblocked, I create a new rule manually with ufw.

It's part proactive (allowing only CA, or automatically redirecting CN, RU, etc.) and part reactive (waiting for enough calls to come in and batch-blocking at midnight). Whatever gets through doesn't impact performance, and that's all that matters to us in the end.

Philippe Blouin
Directeur de la technologie
T 833-INLIBRO (465-4276), poste 230
philippe.blouin@inLibro.com
https://inLibro.com

On 2025-03-18 09:07, Magnus Enger wrote:
> [Christina's original message and Magnus's reply quoted above, trimmed]

Koha mailing list  http://koha-community.org  Koha@lists.katipo.co.nz
Unsubscribe: https://lists.katipo.co.nz/mailman/listinfo/koha
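Philippe's grouping step can be sketched as a small shell pipeline. This is a minimal sketch, assuming Apache's default other_vhosts_access.log layout (vhost first, client IP second); the sample log, the /tmp path, and the example subnets are fabricated for illustration:

```shell
# Fabricated sample of Apache's other_vhosts_access.log: the vhost is the
# first field and the client IP the second.
cat > /tmp/sample_vhosts.log <<'EOF'
opac.example.org:443 203.0.113.10 - - [18/Mar/2025:10:00:00 +0100] "GET / HTTP/1.1" 200 512
opac.example.org:443 203.0.113.77 - - [18/Mar/2025:10:00:01 +0100] "GET / HTTP/1.1" 200 512
opac.example.org:443 198.51.100.5 - - [18/Mar/2025:10:00:02 +0100] "GET / HTTP/1.1" 200 512
EOF

# Group client IPs by /24 and count hits per block, busiest first.
awk '{print $2}' /tmp/sample_vhosts.log \
  | awk -F. '{print $1"."$2"."$3".0/24"}' \
  | sort | uniq -c | sort -rn

# A block that crosses your threshold could then be banned, e.g.:
#   ufw insert 1 deny from 203.0.113.0/24
```

The same idea extends to /16 by keeping only the first two octets; the threshold for "enough calls" is a local policy decision, not something the thread prescribes.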
Hi! On Tue, Mar 18, 2025 at 01:59:14PM +0100, Fairlamb, Christina wrote:
> [original message quoted above, trimmed]
Koha is not the only project suffering from this (which seems to be caused mostly by "AI" bots). Here is a post from SourceHut:

https://status.sr.ht/issues/2025-03-17-git.sr.ht-llms/

and a rather long thread on Lobsters, which contains some ideas for solutions (and a lot of general LLM discussion):

https://lobste.rs/s/dmuad3/mitigating_sourcehut_s_partial_outage

But yes, currently it sucks...

Greetings, domm
--
Thomas Klausner
https://domm.plix.at
Hi Christina

You wrote:
> [original message quoted above, trimmed]
I recently opened a thread on the "koha-devel" mailing list dealing with very similar behaviour, which led to out-of-memory errors that caused Koha to exit:

* https://lists.koha-community.org/pipermail/koha-devel/2025-March/048775.html

The following article (provided by David Cook) gives some insight into what may actually be happening:

* https://wiki.lyrasis.org/display/cmtygp/Aggressive+AI+Harvesting+of+Digital+...

In my case I no longer rely on fail2ban and "robots.txt" against bots. There are way too many ever-changing IP addresses, while "robots.txt" just seems to get ignored. Instead, what I did is the following:

1. In the log file "/var/log/koha/<instancename>/plack.log" I investigated the user agent strings of suspicious bots. I did this for three libraries and came up with the strings you'll find below. Of course there may be more such bots, and it seems some bots have even more wicked ways to harass the OPAC.

2. In the configuration file "/etc/apache2/sites-available/<instancename>.conf" I added the following after the <VirtualHost *:443> directive which serves the Koha OPAC (these are three lines):

```
RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} (ahrefs|Amazonbot|applebot|bingbot|CensysInspect|ChatGPT|ClaudeBot|Custom-AsyncHttpClient|DotBot|DuckDuckBot|Go-http-client|Googlebot|GoogleOther|GPTBot|l9explore|meta-externalagent|MJ12bot|MetaJobBot|OAI-SearchBot|Odin|PerplexityBot|PetalBot|Qwantbot|SemrushBot|Turnitin) [NC]
RewriteRule ^(.*)$ - [F,L]
```

After inserting these lines I restarted the Apache HTTP Server.

3. This is not a perfect solution (read the article linked above), but at least performance got much better immediately, and the bots identified by the given strings are definitely locked out.

Hope this helps. Best wishes: Michael

--
Michael Kuhn
Managing Director · Certified Librarian BBS, Computer Scientist (Swiss Federal Diploma)
Admin Kuhn GmbH · Pappelstrasse 20 · 4123 Allschwil · Switzerland
T 0041 (0)61 261 55 61 · E mik@adminkuhn.ch · W www.adminkuhn.ch
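The user-agent survey in step 1 can be sketched as a short pipeline. This is a minimal sketch assuming the log lines follow Apache's combined format, where the user agent is the sixth quote-delimited field; the exact plack.log layout may differ, and the sample log, /tmp path, and IPs below are fabricated:

```shell
# Fabricated sample in Apache combined log format (the real file would be
# /var/log/koha/<instancename>/plack.log or the Apache access log).
cat > /tmp/sample_access.log <<'EOF'
1.2.3.4 - - [18/Mar/2025:10:00:00 +0100] "GET /cgi-bin/koha/opac-search.pl HTTP/1.1" 200 1234 "-" "Mozilla/5.0 (compatible; ClaudeBot/1.0)"
5.6.7.8 - - [18/Mar/2025:10:00:01 +0100] "GET /cgi-bin/koha/opac-detail.pl HTTP/1.1" 200 2345 "-" "GPTBot/1.0"
9.9.9.9 - - [18/Mar/2025:10:00:02 +0100] "GET /cgi-bin/koha/opac-search.pl HTTP/1.1" 200 1234 "-" "Mozilla/5.0 (compatible; ClaudeBot/1.0)"
7.7.7.7 - - [18/Mar/2025:10:00:03 +0100] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (X11; Linux x86_64) Firefox/125.0"
EOF

# Rank user agents by request count, busiest first
# (the UA string is the 6th quote-delimited field).
awk -F'"' '{print $6}' /tmp/sample_access.log | sort | uniq -c | sort -rn
```

Agents that dominate this ranking but serve no real patrons are candidates for the RewriteCond blocklist in step 2.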
participants (5)
- Fairlamb, Christina
- Magnus Enger
- Michael Kuhn
- Philippe Blouin
- Thomas Klausner