How to avoid high CPU usage due to web crawlers
Hello,

I found that the IP 47.76.35.19 is hitting my OPAC continuously, which drives CPU usage very high and makes the entire Koha OPAC and staff client very slow.

I tried following the links below but could not resolve the issue.

https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=4042#c3
https://wiki.koha-community.org/wiki/Koha_Tuning_Guide

I am also unable to locate the .htaccess file on my Ubuntu 18.04 server with Koha 20.04. Can anyone tell me how to resolve this?

With Regards,
Vinod Kumar Mishra, (Ph.D, MLISC, MA, B.Sc, DCA)
Assistant Librarian, Biju Patnaik Central Library (BPCL), NIT Rourkela, Sundergadh-769008, Odisha, India.
Mob:91+9439420860
URL: https://vinod.itshelp.co.in/
ORCID ID: https://orcid.org/0000-0003-4666-7874
Scopus ID: 57223138343
*"Spiritual relationship is far more precious than physical. Physical relationship divorced from spiritual is body without soul" -- Mahatma Gandhi*
Hi sir,

Try blocking the IP that is hitting your server.

Best Regards
Nirmit Krishnatray | Associate Manager - Professional Services
DBS Business Center, World Trade Tower, Barakhamba Lane, Connaught Place, New Delhi – 110001
M: +91 9003078515 | E: nirmit@edutech.com
Edutech India | LinkedIn | Twitter | Facebook | Youtube

-----Original Message-----
From: Koha [mailto:koha-bounces@lists.katipo.co.nz] On Behalf Of vinod mishra
Sent: 10 January 2024 12:52
To: Koha <Koha@lists.katipo.co.nz>
Subject: [Koha] how to avoid high cpu uses due to web crawlers

_______________________________________________
Koha mailing list http://koha-community.org
Koha@lists.katipo.co.nz
Unsubscribe: https://lists.katipo.co.nz/mailman/listinfo/koha
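One way to block a single address without an `.htaccess` file is an access rule in the Apache configuration itself. The following is only a minimal sketch, assuming the OPAC is served by Apache 2.4 (the version Ubuntu 18.04 ships). Where the block belongs is an assumption too: on a package install it would probably be the Koha instance's site file under `/etc/apache2/sites-available/`, but that should be checked locally.

```apache
# Sketch only: allow everyone except one address (Apache 2.4 syntax).
# Assumed placement: inside the <VirtualHost> block that serves the OPAC.
<Location "/">
    <RequireAll>
        # Everyone is allowed ...
        Require all granted
        # ... except the address seen hammering the OPAC in this thread
        Require not ip 47.76.35.19
    </RequireAll>
</Location>
```

If the site already has its own `Require` rules, this would need to be merged with them rather than pasted in as is. After editing, `sudo apachectl configtest` checks the syntax and `sudo systemctl reload apache2` applies the change.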
Thanks, that is the ultimate solution. I am also looking for any other effective solution in case I face this again in the future. Creating a robots.txt file seems easy, but finding out which crawler is hitting the server is difficult from the IP alone.

On Wed, 10 Jan, 2024, 13:16 Nirmit Krishnatray, <nirmit@edutech.com> wrote:
> Try to block the ip that is hitting on your server.
Hi!
> I found that an IP 47.76.35.19 is hitting my opac continuously, due to which CPU use is very high, and it makes the entire Koha opac and staff client very slow.
This does not look like a legitimate crawler, so you most likely can't tackle it with a robots.txt: it will probably not respect the file anyway.
> I am also not able to locate the file .htaccess in my ubuntu 18.04 with koha 20.04. Can anyone tell me how to resolve this?
`.htaccess` files do not exist by default; you would have to create one in the appropriate place, with proper permissions and ownership, using your favourite text editor. They are per-directory configuration files read by your web server, so they can hold access rules for that folder. In other words, you could either use one of those or put a rule in your Apache configs. I am no expert in either, but on one of our current (non-Koha) systems we use something like

```
# Turn bad IPs away
# Text map of addresses to reject (one entry per line)
RewriteMap hosts-deny "txt:/opt/invenio/var/tmp/hosts-deny.txt"
# Match when the client address, or the X-Forwarded-For header, is in the map
RewriteCond "${hosts-deny:%{REMOTE_ADDR}|NOT-FOUND}" "!=NOT-FOUND" [OR]
RewriteCond "${hosts-deny:%{HTTP:X-Forwarded-For}|NOT-FOUND}" "!=NOT-FOUND"
# Answer matching requests with HTTP 429 (Too Many Requests)
RewriteRule .* - [R=429,L]
```

in the Apache configs. This refers to a text file, in this case in the somewhat odd path `/opt/invenio/var/tmp/`, called `hosts-deny.txt`, that lists the IP addresses that should be turned away. You could in principle create such a file in some place your Apache can see it. This makes it a bit easier to handle unwanted "crawlers", as you just add the offending IPs there.

HTH.

--
Kind regards,
Alexander Wagner
Deutsches Elektronen-Synchrotron DESY, Library and Documentation
Building 01d Room OG1.444, Notkestr. 85, 22607 Hamburg
phone: +49-40-8998-1758
e-mail: alexander.wagner@desy.de
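To make the map-file approach concrete, here is a minimal sketch of how it might look on a Koha server. The path `/etc/apache2/hosts-deny.txt`, the value `blocked` and the second address are placeholders chosen for illustration, not part of the setup described above. mod_rewrite documents `txt:` maps as one key/value pair per line, so the sketch puts a dummy value after each address. `RewriteMap` is only allowed in server or virtual-host context, so this cannot go into an `.htaccess` file, and mod_rewrite has to be enabled (`a2enmod rewrite` on Debian/Ubuntu).

```apache
# Hypothetical map file /etc/apache2/hosts-deny.txt, one "key value" pair per line:
#   47.76.35.19    blocked
#   203.0.113.25   blocked

# Assumed placement: server config or the <VirtualHost> block serving the OPAC.
RewriteEngine On
RewriteMap hosts-deny "txt:/etc/apache2/hosts-deny.txt"

# Reject any client whose address has an entry in the map (403 Forbidden)
RewriteCond "${hosts-deny:%{REMOTE_ADDR}|NOT-FOUND}" "!=NOT-FOUND"
RewriteRule .* - [F]
```

According to the mod_rewrite documentation, lookups are cached until the map file's modification time changes, so adding an address to the file should take effect without reloading Apache.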
participants (3)
- Nirmit Krishnatray
- vinod mishra
- Wagner, Alexander