Dear Colleagues,

My library consortium is hosted by ByWater Solutions. Several times during the last seven days, Koha has been slow and then there's been an outage for thirty to sixty minutes; things are fine for a while, then there's more slowness and another outage. ByWater is addressing these issues, but I want to know: is anyone else's site, or anyone else's clients, having issues like these?

I am also trying to get educated on modern internet threats and the causes of issues like this. ByWater told us that the cause of these issues is the many, many AI bots that are crawling our site, and ByWater has put our site behind Cloudflare for protection. I also know that at https://status.bywatersolutions.com, ByWater mentioned that their upstream provider has had issues with SSL errors and major data packet loss.

Thank you,

Christopher Davis
Technical Services Specialist
Lincoln County Library District
(541) 264-8141 <tel:5412648141>
christopher.davis@lincolncolibrarydist.org
www.lincolncolibrarydist.org <https://www.lincolncolibrarydist.org>
Click to view my schedule or book an appointment w/ me <https://calendar.app.google/6Ymx4UrHiTfR6faaA>
We're also having similar issues off and on at Arcadia. Thanks for the info about BW's upstream provider; I hadn't checked that yet.

Jon Drucker
Assistant Professor, Landman Library
Collections and Technical Services Librarian
Arcadia University | druckerj@arcadia.edu

On Fri, Jun 20, 2025 at 7:35 PM Chris Davis <christopher.davis@lincolncolibrarydist.org> wrote:
> My library consortium is hosted by ByWater Solutions and during the last seven days, several times Koha has been slow and then there's an outage for thirty to sixty minutes [...]
_______________________________________________
Koha mailing list http://koha-community.org Koha@lists.katipo.co.nz Unsubscribe: https://lists.katipo.co.nz/mailman/listinfo/koha
One of the Koha installations I manage has run into the slowdown problem a couple of times. This is on a Linode VPS, not using Cloudflare.

In each case I logged into the server and used top to see that Koha was the process taking up all the CPU. Then I looked at this Apache log:

    /var/log/apache2/other_vhosts_access.log

This showed me that it was web crawlers out of control and ignoring robots.txt. Using the IP addresses from the logs, I blocked them using iptables, e.g.:

    iptables -A INPUT -s 47.76.209.138 -j DROP

I realize this is not a permanent solution, but it worked for us because we don't run into this problem very often. I'd have to come up with something better if this were a frequent occurrence.

--
What should I do if my problems aren't all solved before I die? --Ashleigh Brilliant
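[For readers wanting to script the above: a minimal sketch that turns the log into candidate iptables rules. The field position assumes the default other_vhosts_access.log layout ("vhost:port IP - - [date] ..."), and the threshold is an arbitrary example; it prints the rules rather than running them, so the list can be reviewed first.]

```shell
#!/bin/sh
# Sketch: derive iptables DROP rules from an Apache vhost log.
# Assumes the client IP is the second whitespace-separated field,
# as in the default other_vhosts_access.log format.
# Usage: block_rules LOGFILE THRESHOLD
block_rules() {
    awk -v t="$2" '
        { hits[$2]++ }
        END { for (ip in hits) if (hits[ip] >= t) print ip }
    ' "$1" | sort | while read -r ip; do
        # Emit the rule instead of executing it, so the candidate
        # list can be eyeballed before applying anything.
        echo "iptables -A INPUT -s $ip -j DROP"
    done
}
```

Piping the output to `sh` as root would apply the rules; as noted above, this is a stopgap rather than a fix.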
It looks like I spoke too soon about my use of iptables to block out-of-control web crawlers. Our Koha installation is now being attacked by crawlers, and there are so many that using iptables isn't practical.

Examining /var/log/apache2/other_vhosts_access.log shows that these crawlers don't use any identification that can be used by fail2ban. Here are a couple of them (with the name of our library changed, and URLs shortened):

koha.example.com:443 14.248.94.197 - - [10/Jul/2025:17:19:11 -0400] "GET /cgi-bin/koha/opac-search.pl?... HTTP/1.1" 200 15946 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_1 rv:4.0; bem-ZM) AppleWebKit/535.45.1 (KHTML, like Gecko) Version/4.0.2 Safari/535.45.1"
koha.example.com:443 200.71.98.253 - - [10/Jul/2025:17:19:11 -0400] "GET /cgi-bin/koha/opac-search.pl?... HTTP/1.1" 200 15960 "-" "Mozilla/5.0 (compatible; MSIE 9.0; Windows CE; Trident/4.0)"

Running a grep|sed|sort|uniq filter on the log shows that we're being attacked by almost 1000 crawlers today.

I've tried adding these lines to /etc/apache2/apache2.conf:

    <IfModule mpm_worker_module>
        MaxRequestWorkers 5
    </IfModule>

But the attacks still keep both CPUs busy; top reports them as follows:

    PID   USER     PR NI   VIRT    RES   SHR S %CPU %MEM   TIME+ COMMAND
    10319 rpl-koha 20  0 288340 234040 20880 R 80.1  5.9 0:04.14 /usr/share/koha
    10085 rpl-koha 20  0      0      0     0 R 68.1  0.0 0:17.03 starman worker

I'm not sure what to do next. I had thought of using the apache2 authz_core module to restrict Koha to a handful of IP addresses, such as those used by computers at the library. But this would prevent patrons from accessing the OPAC from home. I'm pretty desperate now. Suggestions welcome.

This is on Linode, in case that makes a difference.

--
I'm doing my part to help preserve life on earth by trying to preserve my own. --Ashleigh Brilliant
We had the same problem, so our support firm hks3 installed https://anubis.techaro.lol/ and it works fine.

Kind regards

Hofrat Mag. Rainer Stowasser
Geosphere Austria
IKS-Services
Vice Head Library, Publisher, Archive
Branch Manager Hohe Warte
Hohe Warte 38, 1190 Vienna
T. +43 1 360 26 2006
rainer.stowasser@geosphere.at | www.geosphere.at
GeoSphere Austria – Bundesanstalt für Geologie, Geophysik, Klimatologie und Meteorologie | Anstalt öffentlichen Rechts
Firmensitz: Hohe Warte 38, 1190 Wien | Firmenbuchnummer: 584036 b | Firmenbuchgericht: Handelsgericht Wien

________________________________________
From: Koha <koha-bounces@lists.katipo.co.nz> on behalf of Mark Alexander <marka@pobox.com>
Sent: Thursday, 10 July 2025 23:34:26
To: Koha
Subject: Re: [Koha] Slowness & outages

> It looks like I spoke too soon about my use of iptables to block out-of-control web crawlers. Our Koha installation is now being attacked by crawlers, and there are so many that using iptables isn't practical. [...]
Hi

There is some documentation on how to implement Anubis when running Koha: https://www.koha-support.eu/using-anubis-with-koha/

I tried it on my Koha demo installation ( https://koha.adminkuhn.ch/ ) and as far as I can say it's the best approach.

Best wishes: Michael

--
Geschäftsführer · Diplombibliothekar BBS, Informatiker eidg. Fachausweis
Admin Kuhn GmbH · Pappelstrasse 20 · 4123 Allschwil · Schweiz
T 0041 (0)61 261 55 61 · E mik@adminkuhn.ch · W www.adminkuhn.ch

On 11.07.25 at 07:31, Stowasser Rainer wrote:
> we had the same Problem so our support firm hks3 installed [...] it works fine. [...]
Hi!

On Fri, Jul 11, 2025 at 05:31:17AM +0000, Stowasser Rainer wrote:
> so our support firm hks3 installed
> it works fine.
See a bit more info in our blog post here: https://www.koha-support.eu/using-anubis-with-koha/

OpenFifth also published a nice post, with a bit more background info: https://openfifth.co.uk/fighting-the-ai-bots/

Greetings, domm (HKS3)

--
Thomas Klausner domm https://domm.plix.at
Just another ( Perl | Postgres | Koha | Bicycle | Food | Photo | Vinyl ) Hacker
Excerpts from Thomas Klausner's message of 2025-07-11 07:24:22 UTC:
> See a bit more info in our blog post here: https://www.koha-support.eu/using-anubis-with-koha/
This is helpful, but I'm puzzled by this line from the SSL configuration example:

    AssignUserID <

This looks like it's missing something. I'm assuming it should be the same as the equivalent line from the localhost:80 configuration example:

    AssignUserID <instancename>-koha <instancename>-koha

--
My object is to save the world, while still leading a pleasant life. --Ashleigh Brilliant
Hi Mark, while I would encourage you to keep looking into Anubis, something you can do to buy yourself some more time is to block user agents that are almost certainly fake. One of your examples shows what claims to be MSIE 9 running on Windows CE. Actual patrons are not running anything with Windows 9x, NT 3/4/5, or CE; for all of those you're likely to find only single requests per IP in your logs (and most of them wouldn't even be able to connect with modern TLS).

Also, blocking anything claiming to be Chrome or Firefox with a version less than 100 (several years old by now) will make a huge difference without actually impacting real users. That one may need to be narrowed down to just Windows and macOS if you have some locations using old Linux machines (usually Raspberry Pis) running older versions of Chrome, because Chrome can't keep itself up to date as easily on Linux.

Good luck!
Jason

--
Jason Boyer
Senior System Administrator
Equinox Open Library Initiative
JBoyer@equinoxOLI.org
+1 (877) Open-ILS (673-6457)
https://equinoxOLI.org/

On Fri, Jul 11, 2025 at 8:22 AM Mark Alexander <marka@pobox.com> wrote:
> This is helpful, but I'm puzzled by this line from the SSL configuration example: AssignUserID < [...]
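[Jason's heuristics translate directly into a log filter. A sketch, with illustrative assumptions: the patterns and the version-100 cutoff are examples, and the field positions assume the default other_vhosts_access.log layout.]

```shell
#!/bin/sh
# Sketch: list client IPs whose user agent claims a platform or
# browser version no real patron runs anymore.
# Assumed layout: vhost:port IP - - [date] "req" status bytes "ref" "UA"
# Usage: fake_ua_ips LOGFILE
fake_ua_ips() {
    awk -F'"' '
        {
            ua = $6               # user agent: 6th quote-delimited field
            split($1, pre, " ")   # client IP: 2nd field before the quotes
            fake = 0
            # Long-dead Windows variants (CE, 95/98, NT 3/4/5).
            if (ua ~ /Windows (CE|9[58]|NT [345]\.)/) fake = 1
            # Chrome/Firefox claiming a version years out of date.
            if (match(ua, /(Chrome|Firefox)\/[0-9]+/)) {
                v = substr(ua, RSTART, RLENGTH)
                sub(/^[^\/]*\//, "", v)
                if (v + 0 < 100) fake = 1
            }
            if (fake) print pre[2]
        }
    ' "$1" | sort -u
}
```

The resulting IP list could feed the same iptables or fail2ban machinery discussed earlier in the thread.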
Excerpts from Jason Boyer's message of 2025-07-11 15:09:30 UTC:
> Hi Mark, while I would encourage you to keep looking into Anubis something you can do to buy yourself some more time is to block user agents that are almost certainly fake.
I've implemented Anubis on our Koha installation, and it seems to be working.

--
Try to have as good a life as you can under the circumstances. --Ashleigh Brilliant
Excerpts from Stowasser Rainer's message of 2025-07-11 05:31:17 UTC:
Thank you! I will look into this now.

--
I'm trying to live my life -- a task so difficult it has never been attempted before. --Ashleigh Brilliant
Hi!
> My library consortium is hosted by ByWater Solutions and during the last seven days, several times Koha has been slow and then there's an outage for thirty to sixty minutes then things are fine for a while, then there's more slowness and outage. ByWater is addressing these issues, but I want to know if anyone else's sites or clients are having issues like these?
I am no expert on the issue, but we see this a lot on our open access repositories. There we were able to get it under control for the time being using `fail2ban`. As robots.txt tends to get ignored, especially by the AI bots (I am sure of the meaning of `A` but I doubt the common translation of `I`...), the setup on our end is quite customized to the URLs exposed by the repos. For the jail config we currently use

```
[apache-proxy]
enabled  = true
filter   = apache-proxy
action   = hostsdeny-proxy[name = apache-proxy]
logpath  = /opt/invenio/var/log/apache-ssl.log
maxretry = 12
findtime = 5
bantime  = 6000
```

However, for this to work it's imperative that your proxy passes on the origin of the request (`X-Forwarded-For`).
> I am also trying to get educated on modern internet threats and causes for issues like this.
It might be worthwhile to check with the repository community as well. With their exposed full texts, repositories provide much more valuable content for the so-called AI than just the bibliographic description. Additionally, in the past we saw quite aggressive bots harvesting the full texts for non-AI-related uses, so they have nagged us for quite some time. Unfortunately, there is no simple solution.

--
Kind regards,

Alexander Wagner

Deutsches Elektronen-Synchrotron DESY
Library and Documentation
Building 01d Room OG1.444
Notkestr. 85
22607 Hamburg
phone: +49-40-8998-1758
e-mail: alexander.wagner@desy.de
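[For reference: the jail above points at a custom `apache-proxy` filter that isn't shown in the thread. A hypothetical sketch of what such a filter file might contain follows; the failregex is a guess at the general shape (match the client IP at the start of an access-log line), not DESY's actual filter, and would need tuning to the real log format.]

```ini
# /etc/fail2ban/filter.d/apache-proxy.conf (hypothetical sketch)
[Definition]
# Ban the client IP logged at the start of each access-log line.
# With a reverse proxy in front, the logged address must be the
# X-Forwarded-For client, not the proxy's own IP.
failregex = ^<HOST> - \S+ \[.*\] "(GET|POST|HEAD)
ignoreregex =
```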
We have a similar issue with misbehaving AI crawlers, especially ones trying to crawl the opac-search.pl page, which should be restricted by robots.txt.

fail2ban was only a partial success or stopgap, and didn't really fix things.

What is working MUCH better is setting the Apache MaxRequestWorkers option (or possibly MaxClients on some systems) in the apache2.conf file. I haven't needed to manually intervene with our server for this issue since setting the option. We still have some outages, but they are significantly less frequent (maybe one or two every other week), and the server typically recovers on its own by the time uptimerobot detects the outage and sends the notification to me. Further tuning of the value could probably eliminate the problem for us entirely. As is often the case, the best value here will depend on your specific server, database size, configuration, and load, but 15 can be a nice starting place.

*Joel Coehoorn*
Director of Information Technology
*York University*
Office: 402-363-5603 | jcoehoorn@york.edu | york.edu

On Mon, Jun 23, 2025 at 1:49 AM Wagner, Alexander <alexander.wagner@desy.de> wrote:
> I am no exprt on the issue but we see this a lot on our open access repositories. There we were able to get it under control for the time being using `fail2ban`. [...]
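[Joel's MaxRequestWorkers suggestion translates to something like the snippet below. The values are illustrative, and which `<IfModule>` block applies depends on the MPM in use (`apache2ctl -M` shows it); this caps concurrent workers so a crawler flood queues at Apache instead of exhausting CPU and RAM.]

```apache
# Hypothetical snippet for /etc/apache2/apache2.conf (or a file in
# conf-available/). 15 is the suggested starting point above; tune
# for your own server, database size, and load.
<IfModule mpm_prefork_module>
    MaxRequestWorkers 15
</IfModule>
<IfModule mpm_event_module>
    MaxRequestWorkers 15
</IfModule>
```

After editing, `systemctl reload apache2` (or `apachectl graceful`) applies the change.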
Thank you Everyone for sharing with us. Koha for us seems to be running normally now, and I hope that stays the status quo.

Best,

Christopher Davis
Technical Services Specialist
Lincoln County Library District
(541) 264-8141 <tel:5412648141>
christopher.davis@lincolncolibrarydist.org
www.lincolncolibrarydist.org <https://www.lincolncolibrarydist.org>
Click to view my schedule or book an appointment w/ me <https://calendar.app.google/6Ymx4UrHiTfR6faaA>

On 6/23/25 6:42 AM, Coehoorn, Joel wrote:
> We have similar issue with misbehaving AI crawlers, especially trying to crawl the opac_search.pl page, which should be restricted by robots.txt. [...]
We are still having issues ☹ Koha has been unreliable for the past 2 hours: 520 errors and Cloudflare handshake errors.

Lori Ann Thorrat
Manager Catalog / Processing
Cuyahoga County Public Library
Administrative Offices
2111 Snow Road / Parma Ohio 44134-2728
p 216.749.9373 / f 216.749.9445
cuyahogalibrary.org
Pronouns: she/her/hers

-----Original Message-----
From: Koha <koha-bounces@lists.katipo.co.nz> On Behalf Of Chris Davis
Sent: Monday, June 23, 2025 1:52 PM
To: Koha@lists.katipo.co.nz
Subject: [Possible Spam] Re: [Koha] Slowness & outages

> Thank you Everyone for sharing with us. Koha for us seems to now be running normally and I hope that stays the status quo. [...]
participants (10)

- Chris Davis
- Coehoorn, Joel
- Drucker, Jon
- Jason Boyer
- Lori Ann Thorrat
- Mark Alexander
- Michael Kuhn
- Stowasser Rainer
- Thomas Klausner
- Wagner, Alexander