Hello,

I've observed a concerning issue with our Koha server: multiple bots are causing downtime and significantly increasing CPU usage. Some of the problematic bots include:

* PetalBot; +https://webmaster.petalsearch.com/site/petalbot
* MJ12bot/v1.4.8; http://mj12bot.com/
* SemrushBot/7~bl; +http://www.semrush.com/bot.html

I tried to address this by adding a robots.txt file, but it has not stopped these bots from causing disruptions. Additionally, because their IP addresses keep changing, it is hard to block them individually.

Furthermore, I've noticed that the Apache2 server is generating internal requests, and I'm not sure what causes them or what they are for:

`::1 - - [15/Jan/2024:12:40:41 +0530] "OPTIONS * HTTP/1.0" 200 126 "-" "Apache/2.4.41 (Ubuntu) OpenSSL/1.1.1f (internal dummy connection)"`

I need your expertise to fix the bot issues that are hurting server performance and driving up CPU usage, and to prevent these unauthorized internal requests.

Thanks and Regards,
Amar Londhe
Full-Stack Developer

On 11/01/24 4:30 am, koha-request@lists.katipo.co.nz wrote:
Send Koha mailing list submissions to koha@lists.katipo.co.nz
To subscribe or unsubscribe via the World Wide Web, visit https://lists.katipo.co.nz/mailman/listinfo/koha or, via email, send a message with subject or body 'help' to koha-request@lists.katipo.co.nz
You can reach the person managing the list at koha-owner@lists.katipo.co.nz
When replying, please edit your Subject line so it is more specific than "Re: Contents of Koha digest..."
Today's Topics:
1. how to avoid high cpu uses due to web crawlers (vinod mishra)
2. Re: how to avoid high cpu uses due to web crawlers (Nirmit Krishnatray)
3. Re: how to avoid high cpu uses due to web crawlers (vinod mishra)
4. Re: how to avoid high cpu uses due to web crawlers (Wagner, Alexander)
5. koha-US Board Meeting Minutes for January 10, 2024 (Kristi Krueger)
----------------------------------------------------------------------
Message: 1
Date: Wed, 10 Jan 2024 12:51:38 +0530
From: vinod mishra <mishravk79@gmail.com>
To: Koha <Koha@lists.katipo.co.nz>
Subject: [Koha] how to avoid high cpu uses due to web crawlers
Message-ID: <CAGLUwiRDAsH66xeQoznjHXxEiGiiGvdTv8P7w5uihM3H93mU2g@mail.gmail.com>
Content-Type: text/plain; charset="UTF-8"
Hello
I found that the IP 47.76.35.19 is hitting my OPAC continuously. As a result, CPU usage is very high, and the entire Koha OPAC and staff client become very slow.
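To confirm which addresses generate most of the load before blocking anything, one option is to count requests per client address in the Apache access log. This is a hypothetical sketch: it assumes Apache's common/combined log format (client address as the first field), and the sample lines use documentation addresses rather than real traffic. Point it at your own OPAC access log instead; the path depends on your setup.

```python
from collections import Counter

# Hypothetical sample lines in Apache "combined" format (documentation IPs, not real traffic).
SAMPLE_LOG = [
    '203.0.113.5 - - [10/Jan/2024:12:00:01 +0530] "GET /cgi-bin/koha/opac-search.pl?q=a HTTP/1.1" 200 512 "-" "-"',
    '203.0.113.5 - - [10/Jan/2024:12:00:01 +0530] "GET /cgi-bin/koha/opac-search.pl?q=b HTTP/1.1" 200 512 "-" "-"',
    '192.0.2.10 - - [10/Jan/2024:12:00:02 +0530] "GET /cgi-bin/koha/opac-main.pl HTTP/1.1" 200 900 "-" "-"',
    '203.0.113.5 - - [10/Jan/2024:12:00:02 +0530] "GET /cgi-bin/koha/opac-search.pl?q=c HTTP/1.1" 200 512 "-" "-"',
]


def top_ips(lines, n=10):
    """Return the n client addresses with the most requests, busiest first."""
    # In common/combined format the client address (%h) is the first space-separated field.
    counts = Counter(line.split(" ", 1)[0] for line in lines if line.strip())
    return counts.most_common(n)


if __name__ == "__main__":
    for ip, hits in top_ips(SAMPLE_LOG):
        print(f"{hits:8d}  {ip}")
```

With a real log you would pass the open file object instead of the sample list, e.g. `with open(path) as f: print(top_ips(f))`.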
I tried following these links but could not resolve the issue:

https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=4042#c3
https://wiki.koha-community.org/wiki/Koha_Tuning_Guide
I am also not able to locate the .htaccess file on my Ubuntu 18.04 server with Koha 20.04. Can anyone tell me how to resolve this?
With Regards,
Vinod Kumar Mishra, (Ph.D, MLISC, MA, B.Sc, DCA)
Assistant Librarian, Biju Patnaik Central Library (BPCL),
NIT Rourkela, Sundergadh-769008, Odisha, India.
Mob: 91+9439420860
URL: https://vinod.itshelp.co.in/ <http://vinod.itshelp.co.in/>
ORCID ID: https://orcid.org/0000-0003-4666-7874 <http://orcid.org/0000-0003-4666-7874>
Scopus ID: 57223138343
*"Spiritual relationship is far more precious than physical. Physical relationship divorced from spiritual is body without soul" -- Mahatma Gandhi*
------------------------------
Message: 2
Date: Wed, 10 Jan 2024 07:46:17 +0000
From: Nirmit Krishnatray <nirmit@edutech.com>
To: vinod mishra <mishravk79@gmail.com>, Koha <Koha@lists.katipo.co.nz>
Subject: Re: [Koha] how to avoid high cpu uses due to web crawlers
Message-ID: <27ebb0c12a164436a0b59c8be7e46401@edutech.com>
Content-Type: text/plain; charset="utf-8"
Hi sir,
Try blocking the IP that is hitting your server.
Best Regards
Nirmit Krishnatray | Associate Manager - Professional Services
DBS Business Center, World Trade Tower, Barakhamba Lane, Connaught Place, New Delhi – 110001
M: +91 9003078515 | E: nirmit@edutech.com
Edutech India | LinkedIn | Twitter | Facebook | Youtube
Koha mailing list http://koha-community.org
Koha@lists.katipo.co.nz
Unsubscribe: https://lists.katipo.co.nz/mailman/listinfo/koha
------------------------------
Message: 3
Date: Wed, 10 Jan 2024 13:23:31 +0530
From: vinod mishra <mishravk79@gmail.com>
To: Nirmit Krishnatray <nirmit@edutech.com>
Cc: Koha <Koha@lists.katipo.co.nz>
Subject: Re: [Koha] how to avoid high cpu uses due to web crawlers
Message-ID: <CAGLUwiTpOGzFrz84VAX5BZvSa7ejZ+riVFmtDEVuy1c-QTGU0A@mail.gmail.com>
Content-Type: text/plain; charset="UTF-8"
Thanks, that is the ultimate solution. I am also looking for other effective solutions in case this happens again in future. Creating a robots.txt file seems easy, but working out which crawler is behind a given IP is difficult.
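Each access-log line records both the client address and the User-Agent it sent, so one way to see which crawler claims to be behind an IP is to group user agents by address. This is a sketch under assumptions: Apache combined log format (user agent as the last quoted field) and hypothetical sample lines. Keep in mind that the User-Agent is self-reported, so a misbehaving client can claim to be anything.

```python
import re
from collections import defaultdict

# Client address is the first field; the User-Agent is the last double-quoted field
# of an Apache "combined" log line. The greedy ".*" makes the group match the LAST quotes.
LINE_RE = re.compile(r'^(\S+) .*"([^"]*)"\s*$')

# Hypothetical sample lines (documentation IPs, not real traffic).
SAMPLE_LOG = [
    '203.0.113.5 - - [10/Jan/2024:12:00:01 +0530] "GET /cgi-bin/koha/opac-search.pl?q=a HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; MJ12bot/v1.4.8; http://mj12bot.com/)"',
    '198.51.100.7 - - [10/Jan/2024:12:00:02 +0530] "GET /cgi-bin/koha/opac-detail.pl?biblionumber=1 HTTP/1.1" 200 800 "-" "Mozilla/5.0 (compatible; SemrushBot/7~bl; +http://www.semrush.com/bot.html)"',
    '203.0.113.5 - - [10/Jan/2024:12:00:03 +0530] "GET /cgi-bin/koha/opac-search.pl?q=b HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; MJ12bot/v1.4.8; http://mj12bot.com/)"',
]


def agents_by_ip(lines):
    """Map each client address to the set of User-Agent strings it sent."""
    agents = defaultdict(set)
    for line in lines:
        match = LINE_RE.match(line)
        if match:  # silently skip lines that are not in combined format
            agents[match.group(1)].add(match.group(2))
    return dict(agents)


if __name__ == "__main__":
    for ip, uas in sorted(agents_by_ip(SAMPLE_LOG).items()):
        print(ip, "->", "; ".join(sorted(uas)))
```

An address that sends a browser-like or constantly changing User-Agent while requesting pages at machine speed is a hint that it is not a well-behaved crawler.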
------------------------------
Message: 4
Date: Wed, 10 Jan 2024 10:19:55 +0100 (CET)
From: "Wagner, Alexander" <alexander.wagner@desy.de>
To: vinod mishra <mishravk79@gmail.com>
Cc: Koha <Koha@lists.katipo.co.nz>
Subject: Re: [Koha] how to avoid high cpu uses due to web crawlers
Message-ID: <2002597982.8686470.1704878395276.JavaMail.zimbra@desy.de>
Content-Type: text/plain; charset=utf-8
Hi!
> I found that an IP 47.76.35.19 is hitting my opac continuously, due to which CPU use is very high, and it makes the entire Koha opac and staff client very slow.

This does not look like a legitimate crawler, so you most likely can't tackle it with a robots.txt: it will probably not respect it anyway.
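robots.txt is purely advisory: a well-behaved crawler fetches it and checks each URL against it before requesting that URL, while a client that ignores it is not affected at all. Python's standard `urllib.robotparser` performs that same check, which makes it a convenient way to test what a robots.txt actually says. The rules below are a hypothetical example that disallows the three crawlers PetalBot, MJ12bot and SemrushBot and allows everyone else; the OPAC path is only illustrative.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut out three specific crawlers, allow all others.
ROBOTS_TXT = """\
User-agent: PetalBot
Disallow: /

User-agent: MJ12bot
Disallow: /

User-agent: SemrushBot
Disallow: /

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent, path="/cgi-bin/koha/opac-search.pl"):
    """Would a robots.txt-compliant crawler with this product token fetch `path`?"""
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    for agent in ("PetalBot", "MJ12bot", "SemrushBot", "Googlebot"):
        print(f"{agent:12s} allowed: {allowed(agent)}")
```

Pass the crawler's product token (e.g. `SemrushBot` or `MJ12bot/v1.4.8`), not the whole `Mozilla/5.0 (compatible; ...)` header: `robotparser` only compares the part before the first `/`. And none of this stops a client, such as the one at 47.76.35.19, that never reads robots.txt in the first place.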
> I am also not able to locate the file .htaccess in mu ubuntu 18.04 with koha 20.04 Can anyone how to resolve this?

`.htaccess` files do not exist by default; you'd have to create one in the appropriate place, with proper permissions and ownership, using your favourite text editor. They are basically per-directory configuration files (which can hold access rules) read by your web server, and Apache only honours them where `AllowOverride` permits it. In other words, you could either use those or put a rule in your Apache configuration.
I am no expert in either, but on one of our current (non-Koha) systems we use something like
```
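# Turn bad IPs away
RewriteEngine On
RewriteMap hosts-deny "txt:/opt/invenio/var/tmp/hosts-deny.txt"
# Match if the client address, or the X-Forwarded-For value set by a proxy, is in the map
RewriteCond "${hosts-deny:%{REMOTE_ADDR}|NOT-FOUND}" "!=NOT-FOUND" [OR]
RewriteCond "${hosts-deny:%{HTTP:X-Forwarded-For}|NOT-FOUND}" "!=NOT-FOUND"
# Reply "429 Too Many Requests" and stop rewriting
RewriteRule .* - [R=429,L]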
```
in the Apache config. This refers to a text file, here in the somewhat odd path `/opt/invenio/var/tmp/` and called `hosts-deny.txt`, that lists the IP addresses to be turned away (they get a "429 Too Many Requests" response). In principle you could create such a file anywhere your Apache can read it. This makes unwanted "crawlers" a bit easier to handle, as you just add the offending IPs there. Note that `RewriteMap` itself has to go in the server or virtual-host configuration; it is not allowed in a `.htaccess` file.
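Maintaining that list by hand is error-prone, so a small script can (re)generate it. The sketch below is hypothetical and not part of Koha or Apache: it writes each address as a `key value` pair (the line format of a `txt:` RewriteMap), validates addresses with Python's `ipaddress` module, and swaps the finished file into place in one step so Apache never reads a half-written map. It also mirrors the two `RewriteCond` lookups, which shows that a multi-address `X-Forwarded-For` header (e.g. `203.0.113.5, 10.0.0.2`) will not match, because the whole header value is used as the key. The value `deny` is an arbitrary placeholder; the rules only check that the lookup does not return `NOT-FOUND`.

```python
import ipaddress
import os


def render_deny_map(addresses):
    """Render addresses as the text of a txt: RewriteMap, one 'key value' pair per line."""
    # ip_address() raises ValueError on typos, so bad entries never reach the map.
    unique = {str(ipaddress.ip_address(a.strip())) for a in addresses if a.strip()}
    lines = ["# client address, placeholder value (anything other than NOT-FOUND)"]
    lines += [f"{addr} deny" for addr in sorted(unique)]
    return "\n".join(lines) + "\n"


def write_deny_map(path, addresses):
    """Write the map to a temporary file, then swap it into place."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", encoding="ascii") as handle:
        handle.write(render_deny_map(addresses))
    os.replace(tmp_path, path)  # atomic on POSIX when both paths share a filesystem


def parse_deny_map(text):
    """Parse map text back into a dict, skipping blank lines and '#' comments."""
    entries = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and not fields[0].startswith("#"):
            entries[fields[0]] = fields[1]
    return entries


def is_denied(deny_map, remote_addr, x_forwarded_for=None):
    """Mirror the two RewriteConds: REMOTE_ADDR [OR] the raw X-Forwarded-For value."""
    return remote_addr in deny_map or (
        x_forwarded_for is not None and x_forwarded_for in deny_map
    )


if __name__ == "__main__":
    text = render_deny_map(["47.76.35.19", "203.0.113.5", "47.76.35.19"])
    print(text, end="")
    deny = parse_deny_map(text)
    print(is_denied(deny, "47.76.35.19"))  # address listed directly
    print(is_denied(deny, "10.0.0.1", "203.0.113.5, 10.0.0.2"))  # whole header is the key
```

To feed the Apache rule, you would call something like `write_deny_map("/opt/invenio/var/tmp/hosts-deny.txt", ips)` with whatever path your own `RewriteMap` line uses.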
HTH.