[Koha] Problems with the facebook web crawler

Fri Jul 26 04:15:08 NZST 2024

Hi Nigel,

My solution for that is a simple two-step process:

1) Use mod_security to match the UA string of each incoming request
against a list of UAs I don't want, and return an HTTP 406 the first
time the UA matches.

2) Have fail2ban monitor the Apache access log for 406 responses and
immediately ban the offending IP (IPv4 / IPv6) for 96 hours using an
apache-badbots jail.
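
To make the logic of the two steps concrete, here is a rough Python sketch.
It is not my actual mod_security or fail2ban configuration, only an
illustration of the idea. The UA substrings, the function names and the
assumption that the access log uses the Apache "combined" format are all
placeholders chosen for the example.

```python
import re
from datetime import timedelta

# Hypothetical blocklist: example substrings only, not the real list in use.
BLOCKED_UA_SUBSTRINGS = ("facebookexternalhit", "meta-externalagent")

# Step 2 bans for 96 hours.
BAN_DURATION = timedelta(hours=96)


def status_for_user_agent(user_agent: str) -> int:
    """Step 1: answer 406 when the UA contains a blocked substring."""
    ua = user_agent.lower()
    if any(fragment in ua for fragment in BLOCKED_UA_SUBSTRINGS):
        return 406
    return 200


# Assumes the Apache "combined" log format: client address first, and the
# status code right after the quoted request line.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" (?P<status>\d{3}) '
)


def ips_to_ban(log_lines):
    """Step 2: collect every client address that received a 406."""
    banned = set()
    for line in log_lines:
        match = LOG_LINE.match(line)
        if match and match.group("status") == "406":
            banned.add(match.group("ip"))
    return banned


if __name__ == "__main__":
    # Addresses below come from the documentation ranges, not real crawlers.
    sample_log = [
        '2001:db8::1 - - [25/Jul/2024:16:57:00 +0000] '
        '"GET /cgi-bin/koha/opac-search.pl?q=a HTTP/1.1" 406 226 '
        '"-" "facebookexternalhit/1.1"',
        '192.0.2.10 - - [25/Jul/2024:16:57:01 +0000] '
        '"GET /cgi-bin/koha/opac-main.pl HTTP/1.1" 200 5120 '
        '"-" "Mozilla/5.0"',
    ]
    for ip in sorted(ips_to_ban(sample_log)):
        print(f"ban {ip} for {int(BAN_DURATION.total_seconds())} seconds")
```

The point of splitting it this way is that the 406 acts as a cheap marker:
the web server only has to refuse a matching request once, and the ban at
the firewall level keeps every later request from that address away from
the OPAC for the next 96 hours (345600 seconds).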

This strategy has so far managed to keep my servers "cool".

cheers
-idg

On Thu, Jul 25, 2024, 16:57 Nigel Titley <nigel at titley.com> wrote:

> Is anyone else getting problems with the facebook web crawler hammering
> their OPAC search function?
>
> This has been happening on and off for a couple of months but set in
> with a vengeance a couple of days ago. The crawler is hitting us with
> many OPAC search queries, beyond the capacity of our system to respond.
>
> robots.txt is being ignored.
>
> I started by blocking facebook's entire IPv6 range as the queries were
> all coming in over IPv6. They responded by switching to IPv4 and because
> they have a number of blocks it wasn't practical to block each and every
> one of them.
>
> I've temporarily switched off OPAC entirely and the system has returned
> to normal and I can at least perform intranet functions but this is
> obviously non-ideal.
>
> Does anyone have any thoughts on this?
>
> I'm running 22.05.13.000 on Ubuntu.
>
> Thanks
>
> Nigel