[Koha] Problems with the facebook web crawler

Fri Jul 26 02:58:35 NZST 2024

We've had a couple of recent crashes I haven't yet had time to dig into.
This would explain them :/
And as I look now, I also see a bunch of AmazonBot traffic, but I haven't
yet checked whether it at least respects robots.txt.
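
For anyone wanting to check their own logs, tallying requests per user
agent is a quick way to see which crawlers are responsible. This is only a
minimal Python sketch, assuming Apache's combined log format; the sample
lines, addresses, and user-agent strings below are made up for illustration
and are not taken from real logs.

```python
import re
from collections import Counter

# In the combined log format each line ends with: "referer" "user-agent"
UA_PATTERN = re.compile(r'"[^"]*" "([^"]*)"\s*$')


def count_user_agents(lines):
    """Return a Counter mapping each user-agent string to its request count."""
    counts = Counter()
    for line in lines:
        match = UA_PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Illustrative sample lines (hypothetical addresses and user agents).
sample = [
    '2001:db8::1 - - [25/Jul/2024:06:00:01 +0000] '
    '"GET /cgi-bin/koha/opac-search.pl?q=a HTTP/1.1" 200 512 "-" '
    '"facebookexternalhit/1.1"',
    '2001:db8::2 - - [25/Jul/2024:06:00:02 +0000] '
    '"GET /cgi-bin/koha/opac-search.pl?q=b HTTP/1.1" 200 512 "-" '
    '"facebookexternalhit/1.1"',
    '192.0.2.7 - - [25/Jul/2024:06:00:03 +0000] '
    '"GET /cgi-bin/koha/opac-search.pl?q=c HTTP/1.1" 200 512 "-" '
    '"Amazonbot/0.1"',
]

counts = count_user_agents(sample)
for agent, hits in counts.most_common():
    print(hits, agent)
```

To run it against a real log, pass an open file object instead of the
sample list, for example `count_user_agents(open(path))` with `path` set
to wherever your web server writes its access log.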

The really annoying thing about this is that the catalog is there to be
public. That's why it exists. To that end we have the OAI-PMH service
available, which would give them all the data they could reasonably expect
in a much more efficient way.
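
To illustrate why OAI-PMH is the cheaper route: a harvester issues one
ListRecords request and then follows resumption tokens, rather than firing
thousands of search queries. Below is a rough Python sketch of that loop's
building blocks. The host name is hypothetical, the path assumes Koha's
usual /cgi-bin/koha/oai.pl endpoint, and the sample response is a
hand-written stub, not output from a real server.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"

# Hypothetical host; the path assumes Koha's usual OAI-PMH endpoint.
BASE_URL = "https://catalog.example.org/cgi-bin/koha/oai.pl"


def list_records_url(base_url, token=None, metadata_prefix="oai_dc"):
    """Build a ListRecords URL; a resumption token replaces other arguments."""
    if token:
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def next_token(response_xml):
    """Return the resumption token from a response, or None on the last page."""
    root = ET.fromstring(response_xml)
    element = root.find(f".//{OAI_NS}resumptionToken")
    if element is None or not (element.text or "").strip():
        return None
    return element.text.strip()


# Hand-written stub of a partial ListRecords response (records omitted).
sample_response = """<?xml version="1.0" encoding="UTF-8"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <resumptionToken>page-2-token</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

first_url = list_records_url(BASE_URL)
token = next_token(sample_response)
second_url = list_records_url(BASE_URL, token=token)
print(first_url)
print(second_url)
```

A real harvester would fetch each URL, process the records, and repeat
until `next_token` returns None, which is far gentler on the server than
crawling the OPAC search pages.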

*Joel Coehoorn*
Director of Information Technology
*York University*
Office: 402-363-5603 | jcoehoorn at york.edu | york.edu

On Thu, Jul 25, 2024 at 6:27 AM Nigel Titley <nigel at titley.com> wrote:

> Is anyone else getting problems with the facebook web crawler hammering
> their OPAC search function?
>
> This has been happening on and off for a couple of months but set in
> with a vengeance a couple of days ago. The crawler is hitting us with
> many OPAC search queries, beyond the capacity of our system to respond.
>
> robots.txt is being ignored.
>
> I started by blocking facebook's entire IPv6 range as the queries were
> all coming in over IPv6. They responded by switching to IPv4 and because
> they have a number of blocks it wasn't practical to block each and every
> one of them.
>
> I've temporarily switched off OPAC entirely and the system has returned
> to normal and I can at least perform intranet functions but this is
> obviously non-ideal.
>
> Does anyone have any thoughts on this?
>
> I'm running 22.05.13.000 on Ubuntu.
>
> Thanks
>
> Nigel