[Koha] Koha Digest, Vol 237, Issue 7 - AI bot scrapers on library sites

Dwight Walker dwight at wwwalker.com.au
Wed Jul 23 13:52:54 NZST 2025


I found this article on how libraries worldwide are being hit by badly written AI scraper bots that open many connections at once instead of one.

http://www.libraryjournal.com/story/news/ai-bots-swarm-library-cultural-heritage-sites-causing-slowdowns-and-crashes

July 12, 2025 10:00 AM, koha-request at lists.katipo.co.nz wrote:

> Today's Topics:
> 
> 1. Re: Missing shelf browser (Elaine Bradtke)
> 2. Re: Slowness & outages (Stowasser Rainer)
> 3. Re: Slowness & outages (Michael Kuhn)
> 4. Re: Slowness & outages (Thomas Klausner)
> 5. Re: Slowness & outages (Mark Alexander)
> 6. Re: Slowness & outages (Mark Alexander)
> 7. Re: Slowness & outages (Jason Boyer)
> 8. Re: Slowness & outages (Mark Alexander)
> 
> ----------------------------------------------------------------------
> 
> Message: 1
> Date: Thu, 10 Jul 2025 20:46:51 -0700
> From: Elaine Bradtke <eb at efdss.org>
> To: koha <koha at lists.katipo.co.nz>
> Subject: Re: [Koha] Missing shelf browser
> Message-ID:
> <CAPdfUuw16QHchDeQiZrb_jTup9LL12GSP9NWzqgnHvjbTKpgRQ at mail.gmail.com>
> Content-Type: text/plain; charset="UTF-8"
> 
> To clarify, the buttons to go back to the list or move to the next or
> previous record are not displayed.
> 
> Elaine
> VWML <https://www.efdss.org/vwml>
> 
> On Thu, Jul 10, 2025 at 4:16 PM Elaine Bradtke <eb at efdss.org> wrote:
> 
>> We've just upgraded to 25.05.01. We use zebra searching.
>> The browse shelf feature has vanished. We're also not able to go back to
>> the list of search results.
>> I didn't see any applicable preferences that needed to be changed.
>> Has anyone else seen this? Any ideas what might be the problem?
>> Thanks
>> 
>> Elaine Bradtke
>> VWML <https://www.efdss.org/vwml>
>> English Folk Dance and Song Society <https://www.efdss.org>
>> Cecil Sharp House, 2 Regent's Park Road, London NW1 7AY
>> Tel +44 (0) 20 7485 2206 (This number is for the English Folk Dance and
>> Song Society in London, England. If you wish to phone me personally, send
>> an e-mail first. I work off site)
>> --------------------------------------------------------------------------
>> Registered Company No. 297142
>> Charity Registered in England and Wales No. 305999
> 
> ------------------------------
> 
> Message: 2
> Date: Fri, 11 Jul 2025 05:31:17 +0000
> From: Stowasser Rainer <Rainer.Stowasser at geosphere.at>
> To: Mark Alexander <marka at pobox.com>, Koha <Koha at lists.katipo.co.nz>
> Subject: Re: [Koha] Slowness & outages
> Message-ID: <fee81ed2aae24d3b95d5bbd1b68ace84 at geosphere.at>
> Content-Type: text/plain; charset="Windows-1252"
> 
> We had the same problem,
> 
> so our support firm HKS3 installed
> 
> https://anubis.techaro.lol
> 
> it works fine.
> 
> Kind regards
> Hofrat Mag. Rainer Stowasser
> Geosphere Austria IKS-Services
> Vice Head Library, Publisher, Archive
> branch manager Hohe Warte
> 
> Hohe Warte 38, 1190 Vienna
> T. +43 1 360 26 2006
> rainer.stowasser at geosphere.at | www.geosphere.at
> 
> GeoSphere Austria (Federal Institute for Geology, Geophysics, Climatology and Meteorology) |
> Public-law institution
> Registered office: Hohe Warte 38, 1190 Vienna | Company register number: 584036 b |
> Register court: Commercial Court of Vienna
> 
> ________________________________________
> From: Koha <koha-bounces at lists.katipo.co.nz> on behalf of Mark Alexander <marka at pobox.com>
> Sent: Thursday, 10 July 2025 23:34:26
> To: Koha
> Subject: Re: [Koha] Slowness & outages
> 
> It looks like I spoke too soon about my use of iptables to block
> out-of-control web crawlers. Our Koha installation is now being attacked
> by crawlers, and there are so many that using iptables isn't practical.
> 
> Examining /var/log/apache2/other_vhosts_access.log shows that these
> crawlers don't use any identification that can be used by fail2ban.
> Here are a couple of them (with the name of our library changed, and URLs
> shortened):
> 
> koha.example.com:443 14.248.94.197 - - [10/Jul/2025:17:19:11 -0400] "GET
> /cgi-bin/koha/opac-search.pl?... HTTP/1.1" 200 15946 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X
> 10_11_1 rv:4.0; bem-ZM) AppleWebKit/535.45.1 (KHTML, like Gecko) Version/4.0.2 Safari/535.45.1"
> koha.example.com:443 200.71.98.253 - - [10/Jul/2025:17:19:11 -0400] "GET
> /cgi-bin/koha/opac-search.pl?... HTTP/1.1" 200 15960 "-" "Mozilla/5.0 (compatible; MSIE 9.0;
> Windows CE; Trident/4.0)"
> 
> Running a grep|sed|sort|uniq filter on the log shows that we're being
> attacked by almost 1000 crawlers today.
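
A count of this kind could be reproduced with a short Python sketch like the one below. The exact grep|sed|sort|uniq pipeline is not given in the message, so this is only an assumption about what it computes: it treats the second field of each log line (as in the excerpts above) as the client address and counts requests per address. The names LINE_RE and count_clients are illustrative, and the sample lines are the two excerpts above joined back onto single lines.

```python
import re
from collections import Counter

# Assumed line layout, taken from the excerpts above:
#   <vhost>:<port> <client-ip> <ident> <user> [<timestamp>] "<request>" ...
LINE_RE = re.compile(r'^(?P<vhost>\S+) (?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\]')


def count_clients(lines, day=None):
    """Count requests per client IP, optionally only for one day such as '10/Jul/2025'."""
    hits = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # skip lines that do not look like access-log entries
        if day is not None and not m.group("time").startswith(day):
            continue
        hits[m.group("ip")] += 1
    return hits


sample = [
    'koha.example.com:443 14.248.94.197 - - [10/Jul/2025:17:19:11 -0400] '
    '"GET /cgi-bin/koha/opac-search.pl?... HTTP/1.1" 200 15946 "-" '
    '"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_1 rv:4.0; bem-ZM) '
    'AppleWebKit/535.45.1 (KHTML, like Gecko) Version/4.0.2 Safari/535.45.1"',
    'koha.example.com:443 200.71.98.253 - - [10/Jul/2025:17:19:11 -0400] '
    '"GET /cgi-bin/koha/opac-search.pl?... HTTP/1.1" 200 15960 "-" '
    '"Mozilla/5.0 (compatible; MSIE 9.0; Windows CE; Trident/4.0)"',
]

hits = count_clients(sample, day="10/Jul/2025")
print(len(hits), "distinct client addresses")
```

For a real log, an open file object can be passed instead of the sample list, since the function only iterates over lines. A high number of distinct addresses with very few requests each is the pattern that makes per-address blocking with iptables or fail2ban impractical.
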
> 
> I've tried adding these lines to /etc/apache2/apache2.conf:
> 
> <IfModule mpm_worker_module>
> MaxRequestWorkers 5
> </IfModule>
> 
> But the attacks still keep both CPUs busy; top reports them as
> follows:
> 
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> 10319 rpl-koha 20 0 288340 234040 20880 R 80.1 5.9 0:04.14 /usr/share/koha
> 10085 rpl-koha 20 0 0 0 0 R 68.1 0.0 0:17.03 starman worker
> 
> I'm not sure what to do next. I had thought of using the apache2
> authz_core module to restrict Koha to a handful of IP addresses, such
> as those used by computers at the library. But this would prevent
> patrons from accessing the OPAC from home. I'm pretty desperate now.
> Suggestions welcome.
> 
> This is on Linode, in case that makes a difference.
> 
> --
> I'm doing my part to help preserve life on earth
> by trying to preserve my own. --Ashleigh Brilliant
> 
> _______________________________________________
> 
> Koha mailing list http://koha-community.org
> Koha at lists.katipo.co.nz
> Unsubscribe: https://lists.katipo.co.nz/mailman/listinfo/koha
> 
> ------------------------------
> 
> Message: 3
> Date: Fri, 11 Jul 2025 09:21:27 +0200
> From: Michael Kuhn <mik at adminkuhn.ch>
> To: koha at lists.katipo.co.nz
> Subject: Re: [Koha] Slowness & outages
> Message-ID: <0e7a0155-56bd-45d3-9c4c-c409ab5e4ba8 at adminkuhn.ch>
> Content-Type: text/plain; charset=UTF-8; format=flowed
> 
> Hi
> 
> There is some documentation on how to implement Anubis when running Koha:
> 
> https://www.koha-support.eu/using-anubis-with-koha
> 
> I tried it on my Koha demo installation ( https://koha.adminkuhn.ch )
> and as far as I can tell it's the best approach.
> 
> Best wishes: Michael
> --
> Managing Director · Qualified Librarian (BBS), IT Specialist with Swiss Federal Certificate
> Admin Kuhn GmbH · Pappelstrasse 20 · 4123 Allschwil · Switzerland
> T 0041 (0)61 261 55 61 · E mik at adminkuhn.ch · W www.adminkuhn.ch
> 
> ------------------------------
> 
> Message: 4
> Date: Fri, 11 Jul 2025 09:24:22 +0200
> From: Thomas Klausner <domm at plix.at>
> To: koha at lists.katipo.co.nz
> Subject: Re: [Koha] Slowness & outages
> Message-ID: <aHC8JhVmBCNH_1ni at f13.plix.at>
> Content-Type: text/plain; charset=us-ascii
> 
> Hi!
> 
> On Fri, Jul 11, 2025 at 05:31:17AM +0000, Stowasser Rainer wrote:
> 
>> so our support firm hks3 installed
>> 
>> https://anubis.techaro.lol
>> 
>> it works fine.
> 
> See a bit more info in our blog post here: https://www.koha-support.eu/using-anubis-with-koha
> 
> OpenFifth also published a nice post, with a bit more background info:
> https://openfifth.co.uk/fighting-the-ai-bots
> 
> Greetings,
> domm (HKS3)
> 
> --
> Thomas Klausner domm https://domm.plix.at
> Just another ( Perl | Postgres | Koha | Bicycle | Food | Photo | Vinyl ) Hacker
> 
> Today I managed to write a #Postgres query using 2 CTEs and a window function to delete some
> duplicate values (which should have been prevented by a proper unique constraint..) with only a
> tiny peek into the docs (I forgot the name of the `row_number()` function).
> [ 2025-07-10 19:40 > https://domm.plix.at/microblog.html ]
> 
> Today #mst passed away. I really enjoyed meeting him at conferences, listening to his weird &
> wonderful (and thus very Perlish) talks and chatting with him during the social track. I guess I
> use some of his code every day. :-( #Perl
> https://www.shadowcat.co.uk/2025/07/09/ripples-they-cause-in-the-world
> [ 2025-07-09 19:32 > https://domm.plix.at/microblog.html ]
> 
> ------------------------------
> 
> Message: 5
> Date: Fri, 11 Jul 2025 07:45:36 -0400
> From: Mark Alexander <marka at pobox.com>
> To: Stowasser Rainer <Rainer.Stowasser at geosphere.at>
> Cc: Koha <Koha at lists.katipo.co.nz>
> Subject: Re: [Koha] Slowness & outages
> Message-ID: <1752234305-csup-4163 at bionic.bloovis.com>
> Content-Type: text/plain; charset=UTF-8;
> 
> Excerpts from Stowasser Rainer's message of 2025-07-11 05:31:17 UTC:
>> https://anubis.techaro.lol
> 
> Thank you! I will look into this now.
> 
> --
> I'm trying to live my life -- a task so difficult it
> has never been attempted before. --Ashleigh Brilliant
> 
> ------------------------------
> 
> Message: 6
> Date: Fri, 11 Jul 2025 08:22:17 -0400
> From: Mark Alexander <marka at pobox.com>
> To: Thomas Klausner <domm at plix.at>
> Cc: koha <koha at lists.katipo.co.nz>
> Subject: Re: [Koha] Slowness & outages
> Message-ID: <1752235419-csup-4685 at bionic.bloovis.com>
> Content-Type: text/plain; charset=UTF-8;
> 
> Excerpts from Thomas Klausner's message of 2025-07-11 07:24:22 UTC:
>> See a bit more info in our blog post here: https://www.koha-support.eu/using-anubis-with-koha
> 
> This is helpful, but I'm puzzled by this line from the SSL configuration example:
> 
> AssignUserID <
> 
> This looks like it's missing something. I'm assuming it should be the
> same as the equivalent line from the localhost:80 configuration
> example:
> 
> AssignUserID <instancename>-koha <instancename>-koha
> 
> --
> My object is to save the world, while still leading
> a pleasant life. --Ashleigh Brilliant
> 
> ------------------------------
> 
> Message: 7
> Date: Fri, 11 Jul 2025 11:09:30 -0400
> From: Jason Boyer <JBoyer at equinoxoli.org>
> To: Mark Alexander <marka at pobox.com>
> Cc: koha <koha at lists.katipo.co.nz>
> Subject: Re: [Koha] Slowness & outages
> Message-ID:
> <CANhZHJPi3EmYRa36HwbRJkad8=pZETPdt4vsLP2KsuMpCXYz9w at mail.gmail.com>
> Content-Type: text/plain; charset="UTF-8"
> 
> Hi Mark, while I would encourage you to keep looking into Anubis, something
> you can do to buy yourself some more time is to block user agents that are
> almost certainly fake. One of your examples shows what claims to be MSIE 9
> running on Windows CE. Actual patrons are not running anything with Windows
> 9X, NT 3/4/5, or CE, all of which you're likely to find making single
> requests per IP in your logs (and which mostly wouldn't even be able to
> connect with modern TLS). Also, blocking anything claiming to be Chrome or
> Firefox with a version less than 100 (several years old by now) will make a
> huge difference without actually impacting real users. That rule may need
> to be narrowed down to just Windows and macOS, though, if you have some
> locations using old Linux machines (usually Raspberry Pis) running older
> versions of Chrome, because it can't keep itself up to date as easily on
> Linux.
> Good luck!
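
The user-agent rules described above could be sketched in Python as follows. This is a rough illustration of the heuristic, not a tested blocklist: the regular expressions, the looks_fake name, and the min_major and desktop_only parameters are all assumptions introduced here, and a production setup would probably express the same rules in the web server or proxy configuration rather than in a script.

```python
import re

# Operating systems that real patrons are very unlikely to use today
# (Windows 9x, NT 3/4/5, CE). The patterns are illustrative assumptions.
OLD_WINDOWS_RE = re.compile(r"Windows (?:CE|95|98|NT [345]\.)|Win 9x")

# Major version claimed by Chrome or Firefox, e.g. "Chrome/79.0.3945.88".
BROWSER_VERSION_RE = re.compile(r"(Chrome|Firefox)/(\d+)")

# Desktop platforms on which browsers normally keep themselves up to date.
DESKTOP_OS_RE = re.compile(r"Windows|Macintosh")


def looks_fake(user_agent, min_major=100, desktop_only=True):
    """Return True if the user agent matches one of the 'almost certainly fake' rules."""
    if OLD_WINDOWS_RE.search(user_agent):
        return True
    m = BROWSER_VERSION_RE.search(user_agent)
    if m and int(m.group(2)) < min_major:
        # Optionally spare old Linux machines (for example Raspberry Pis).
        if not desktop_only or DESKTOP_OS_RE.search(user_agent):
            return True
    return False


msie_on_ce = "Mozilla/5.0 (compatible; MSIE 9.0; Windows CE; Trident/4.0)"
old_chrome_windows = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                      "(KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36")
old_chrome_linux = ("Mozilla/5.0 (X11; Linux armv7l) AppleWebKit/537.36 "
                    "(KHTML, like Gecko) Chrome/92.0.4515.98 Safari/537.36")

for ua in (msie_on_ce, old_chrome_windows, old_chrome_linux):
    print(looks_fake(ua), ua)
```

With desktop_only left at its default, the old Chrome on Linux is let through, which matches the narrowing suggested for sites with older Linux machines. Note that the other sample line from Mark's log (claiming Safari on macOS) is not caught by these rules, so a filter like this only buys time.
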
> 
> Jason
> 
> --
> Jason Boyer
> Senior System Administrator
> Equinox Open Library Initiative
> JBoyer at equinoxOLI.org
> +1 (877) Open-ILS (673-6457)
> https://equinoxOLI.org
> 
> ------------------------------
> 
> Message: 8
> Date: Fri, 11 Jul 2025 13:33:09 -0400
> From: Mark Alexander <marka at pobox.com>
> To: koha <koha at lists.katipo.co.nz>
> Subject: Re: [Koha] Slowness & outages
> Message-ID: <1752254994-csup-1824 at bionic.bloovis.com>
> Content-Type: text/plain; charset=UTF-8;
> 
> Excerpts from Jason Boyer's message of 2025-07-11 15:09:30 UTC:
>> Hi Mark, while I would encourage you to keep looking into Anubis something
>> you can do to buy yourself some more time is to block user agents that are
>> almost certainly fake.
> 
> I've implemented Anubis on our Koha installation, and it seems to be working.
> 
> --
> Try to have as good a life as you can
> under the circumstances. --Ashleigh Brilliant
> 
> ------------------------------
> 
> Subject: Digest Footer
> 
> _______________________________________________
> Koha mailing list
> Koha at lists.katipo.co.nz
> https://lists.katipo.co.nz/mailman/listinfo/koha
> 
> ------------------------------
> 
> End of Koha Digest, Vol 237, Issue 7
> ************************************


Dwight Walker
WWWalker Web Development Pty Ltd
https://wwwalker.com.au

