[Koha] Problems with the facebook web crawler
Jason Boyer
JBoyer at equinoxOLI.org
Fri Jul 26 00:55:53 NZST 2024
While they do ignore robots.txt, they at least supply a recognizable
user agent that you can block outright:
RewriteEngine on
# Block any request whose User-Agent matches one of these tokens
# ("other|bots|here" are placeholders for whatever else you want to block).
RewriteCond %{HTTP_USER_AGENT} "facebookexternalhit|other|bots|here"
# But never block the 403 error page itself, or serving the [F]
# response would match the rule again and loop.
RewriteCond %{REQUEST_URI} "!403\.pl" [NC]
RewriteRule "^.*" "-" [F]
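For reference, the first RewriteCond is just a substring alternation
against the User-Agent header. A quick sketch of the same matching logic
in Python, handy for sanity-checking the pattern before deploying
(the "other|bots|here" tokens are placeholders, not real agents):

```python
import re

# Same alternation as the RewriteCond pattern; a match anywhere in the
# User-Agent string is enough to trigger the block.
BLOCKED = re.compile(r"facebookexternalhit|other|bots|here")

def is_blocked(user_agent: str) -> bool:
    """Return True if this User-Agent would hit the [F] rule above."""
    return BLOCKED.search(user_agent) is not None

# The Facebook crawler identifies itself like this:
print(is_blocked("facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)"))  # True
# An ordinary browser is left alone:
print(is_blocked("Mozilla/5.0 (X11; Linux x86_64)"))  # False
```

Note the pattern is case-sensitive, matching the RewriteCond as written;
add [NC] to that line too if you want a case-insensitive match.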
Note that the second RewriteCond is required, or you'll end up in a
redirect loop when Apache tries to serve the 403 error page (which would
itself match the block rule). They will still send you requests, but at
least they won't tie up a Plack backend doing useless work. I haven't
tried returning 5xx errors to see whether that causes them to back off,
but I doubt they'd take much notice.
Jason
--
Jason Boyer
Senior System Administrator
Equinox Open Library Initiative
JBoyer at equinoxOLI.org
+1 (877) Open-ILS (673-6457)
https://equinoxOLI.org/
On Thu, Jul 25 2024 at 01:45:56 PM +0100, Nigel Titley
<nigel at titley.com> wrote:
> Dear Michael
>
> On 25/07/2024 13:28, Michael Kuhn wrote:
>> Hi Nigel
>>
>> In such a case I would advise creating a sitemap. Unfortunately
>> this Koha feature is not well documented, but the following may
>> give you a start:
>>
>> * <https://lists.katipo.co.nz/public/koha/2020-November/055401.html>
>>
>> *
>> <https://wiki.koha-community.org/wiki/Commands_provided_by_the_Debian_packages#koha-sitemap>
>>
>> *
>> <https://koha-community.org/manual/24.05/en/html/cron_jobs.html#sitemap>
>
> Thanks for this. I'll give it a go and see what happens, although if
> Facebook is ignoring the robots.txt file I suspect it will ignore the
> sitemap too.
>
> There's been a great deal of annoyance about this on the Facebook
> developer forums.
>
> I'll let you know how it goes
>
> Nigel