[Koha] Block web crawlers Bots

Michael Kuhn mik at adminkuhn.ch
Fri Jan 28 23:40:52 NZDT 2022


Hi

Instead of blocking web crawlers and bots that want to index your 
records (which is generally a good thing), I recommend you implement a 
sitemap:

* https://koha-community.org/manual/21.11/en/html/cron_jobs.html#sitemap

* https://lists.katipo.co.nz/public/koha/2020-November/055401.html

* https://github.com/Koha-Community/Koha/blob/master/misc/cronjobs/sitemap.pl

That way the web crawlers will just access your sitemap instead of every 
single record in your database, which will decrease memory usage immensely.
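
For example, the whole setup boils down to a cron job plus a robots.txt 
that points at the generated sitemap. This is only a sketch for a Debian 
package install with an instance called "library" -- the option names, 
paths and the sitemap index file name are from memory, so please verify 
them with "sitemap.pl --help" and the manual page linked above:

   # root crontab: regenerate the sitemap once a week into the OPAC docroot
   0 3 * * 0 koha-shell -c "/usr/share/koha/bin/cronjobs/sitemap.pl --url https://opac.example.org --dir /usr/share/koha/opac/htdocs" library

   # /usr/share/koha/opac/htdocs/robots.txt
   # keep bots off the expensive search script, let them use the sitemap
   User-agent: *
   Disallow: /cgi-bin/koha/opac-search.pl
   Sitemap: https://opac.example.org/sitemapindex.xml

(opac.example.org stands for your own OPAC URL, of course.)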

Blocking web crawlers at the firewall can get very hairy because many 
bots do not come from just one IP address.
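
If you still want to shut out a single misbehaving bot, matching on the 
User-Agent header in the OPAC's Apache virtual host avoids the 
many-IP-addresses problem completely. Again only a sketch, assuming 
Apache 2.4 and bots that send an honest User-Agent string:

   # in the OPAC virtual host; the bot names are just examples
   BrowserMatchNoCase "AhrefsBot|PetalBot" bad_bot
   <Location "/">
       <RequireAll>
           Require all granted
           Require not env bad_bot
       </RequireAll>
   </Location>

Googlebot on the other hand respects robots.txt and the sitemap, so 
there is normally no reason to lock it out this way.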

Best wishes: Michael
-- 
Managing Director · Graduate Librarian BBS, IT Specialist with Swiss Federal Certificate
Admin Kuhn GmbH · Pappelstrasse 20 · 4123 Allschwil · Switzerland
T 0041 (0)61 261 55 61 · E mik at adminkuhn.ch · W www.adminkuhn.ch



On 28.01.22 at 06:04, JITHIN N wrote:
> My server memory usage is increasing because some web search engine bots
> are accessing the OPAC. I tried putting a robots.txt in
> /usr/share/koha/opac/htdocs with:
> 
> User-agent: *
> Disallow: /
> 
> But my server is still getting loaded by bots. The Apache log file shows
> bot names like PetalBot, Googlebot, AhrefsBot etc. How can I block all
> these bots from accessing my OPAC pages?
> 
> With Regards
> 
> Jithin N
> 
> *System Information*
> 
> Koha 21.05
> 
> Debian 10
> _______________________________________________
> 
> Koha mailing list  http://koha-community.org
> Koha at lists.katipo.co.nz
> Unsubscribe: https://lists.katipo.co.nz/mailman/listinfo/koha



