[Koha] Problems with the facebook web crawler

Jason Boyer JBoyer at equinoxOLI.org
Fri Jul 26 00:55:53 NZST 2024


While they do ignore robots.txt, they do at least supply a recognizable 
user agent that you can just block:

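RewriteEngine on
# Match the crawler's user agent; "other|bots|here" are placeholders
RewriteCond %{HTTP_USER_AGENT} "facebookexternalhit|other|bots|here"
# Leave requests for the 403 error page itself alone
RewriteCond %{REQUEST_URI} "!403\.pl" [NC]
# Answer everything else from these agents with 403 Forbidden
RewriteRule "^.*" "-" [F]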

Note that the second RewriteCond is required or you'll end up with a 
redirect loop, since the request for the 403 error page (403.pl) would 
otherwise be forbidden as well. They will still be sending you requests, 
but at least they won't tie up a Plack backend doing useless work. I 
haven't tried returning 5xx errors to see if that causes them to back 
off, but I doubt they would take much notice.
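
If you want to sanity-check the user agent pattern before putting it in 
the Apache config, the decision those two conditions make can be sketched 
in a few lines of Python. This is only a rough model, not how mod_rewrite 
is implemented: the function name is made up for illustration, and the 
example paths assume the error page is served from something like 
/cgi-bin/koha/errors/403.pl.

```python
import re

# Same alternation as the first RewriteCond; everything after
# "facebookexternalhit" is the placeholder text from the example.
BLOCKED_AGENTS = re.compile(r"facebookexternalhit|other|bots|here")

# Mirrors the negated, case-insensitive "!403\.pl" [NC] condition.
ERROR_PAGE = re.compile(r"403\.pl", re.IGNORECASE)


def is_forbidden(user_agent: str, request_uri: str) -> bool:
    """Hypothetical helper: True when both conditions would match."""
    agent_blocked = bool(BLOCKED_AGENTS.search(user_agent))
    is_error_page = bool(ERROR_PAGE.search(request_uri))
    # Without the error-page exemption, the 403 page itself would be
    # forbidden again, which is the loop described above.
    return agent_blocked and not is_error_page


if __name__ == "__main__":
    samples = [
        ("facebookexternalhit/1.1", "/cgi-bin/koha/opac-search.pl"),
        ("facebookexternalhit/1.1", "/cgi-bin/koha/errors/403.pl"),
        ("Mozilla/5.0", "/cgi-bin/koha/opac-detail.pl"),
    ]
    for agent, uri in samples:
        print(agent, uri, is_forbidden(agent, uri))
```

Keep in mind the patterns are unanchored substring matches, just like 
the RewriteCond lines, so a short alternative can match more user agents 
than you intended.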

Jason
-- 
Jason Boyer
Senior System Administrator
Equinox Open Library Initiative
JBoyer at equinoxOLI.org
+1 (877) Open-ILS (673-6457)
https://equinoxOLI.org/

On Thu, Jul 25 2024 at 01:45:56 PM +0100, Nigel Titley 
<nigel at titley.com> wrote:
> Dear Michael
> 
> On 25/07/2024 13:28, Michael Kuhn wrote:
>> Hi Nigel
>> 
>> In such a case I would advise creating a sitemap. Unfortunately 
>> this Koha feature does not seem to be well documented, but the 
>> following may give you a start:
>> 
>> * <https://lists.katipo.co.nz/public/koha/2020-November/055401.html>
>> 
>> * <https://wiki.koha-community.org/wiki/Commands_provided_by_the_Debian_packages#koha-sitemap>
>> 
>> * <https://koha-community.org/manual/24.05/en/html/cron_jobs.html#sitemap>
> 
> Thanks for this. I'll give it a go and see what happens, although if 
> Facebook is ignoring the robots.txt file, I suspect it will ignore the 
> sitemap too.
> 
> There's been a great deal of annoyance about this on the Facebook 
> developer forums.
> 
> I'll let you know how it goes.
> 
> Nigel
> _______________________________________________
> 
> Koha mailing list  http://koha-community.org
> Koha at lists.katipo.co.nz
> Unsubscribe: <https://lists.katipo.co.nz/mailman/listinfo/koha>


