[Koha] Problems with the facebook web crawler
Chris Brown
chris at stayawake.co.uk
Fri Jul 26 04:11:26 NZST 2024
Hi Nigel et al,
I recently noticed that the load on our Koha server was getting ridiculously
high, and investigation showed that most of it was bot requests for
opac-search.pl (averaging about one a second!). I have managed to stop the
ones that were hurting us with robots.txt, and I am fairly confident that
Amazonbot does respect it. We haven't had any trouble from Facebook (so
far).
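
For anyone wanting to sanity-check a robots.txt rule before relying on it,
here is a minimal sketch using Python's standard urllib.robotparser. The
rule text, the /cgi-bin/koha/ path prefix and the helper name is_allowed
are assumptions for illustration, not a copy of our actual configuration,
and it only shows what a well-behaved crawler should conclude from the
file. It cannot tell you whether a given bot really honours it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep Amazonbot out of OPAC search,
# leave everything else open to all other crawlers.
ROBOTS_TXT = """\
User-agent: Amazonbot
Disallow: /cgi-bin/koha/opac-search.pl

User-agent: *
Disallow:
"""


def is_allowed(robots_txt: str, user_agent: str, path: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    search = "/cgi-bin/koha/opac-search.pl?q=test"
    for agent in ("Amazonbot", "SomeOtherBot"):
        verdict = "allowed" if is_allowed(ROBOTS_TXT, agent, search) else "blocked"
        print(f"{agent}: {verdict} for {search}")
```

Note that the parser matches on the crawler's product token (e.g.
"Amazonbot"), so pass that rather than the full User-Agent header.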
Chris Brown
On Thu, Jul 25, 2024 at 3:58 PM Coehoorn, Joel <jcoehoorn at york.edu> wrote:
> We've had a couple of recent crashes I haven't yet had time to dig into.
> This would explain it :/
> And as I look now, I also see a bunch of Amazonbot requests, but I haven't
> yet checked whether it at least respects robots.txt.
>
> The really annoying thing about this is that the catalog is there to be
> public. It's why it exists. To that end we have the OAI-PMH service
> available, which would give them all the data they could reasonably expect
> in a much more efficient way.
>
> *Joel Coehoorn*
> Director of Information Technology
> *York University*
> Office: 402-363-5603 | jcoehoorn at york.edu | york.edu
>
>
>
> On Thu, Jul 25, 2024 at 6:27 AM Nigel Titley <nigel at titley.com> wrote:
>
> > Is anyone else getting problems with the Facebook web crawler hammering
> > their OPAC search function?
> >
> > This has been happening on and off for a couple of months but set in
> > with a vengeance a couple of days ago. The crawler is hitting us with
> > many OPAC search queries, beyond the capacity of our system to respond.
> >
> > robots.txt is being ignored.
> >
> > I started by blocking Facebook's entire IPv6 range, as the queries were
> > all coming in over IPv6. They responded by switching to IPv4, and because
> > they have a number of blocks it wasn't practical to block each and every
> > one of them.
> >
> > I've temporarily switched off OPAC entirely and the system has returned
> > to normal and I can at least perform intranet functions but this is
> > obviously non-ideal.
> >
> > Does anyone have any thoughts on this?
> >
> > I'm running 22.05.13.000 on Ubuntu.
> >
> > Thanks
> >
> > Nigel
> > _______________________________________________
> >
> > Koha mailing list http://koha-community.org
> > Koha at lists.katipo.co.nz
> > Unsubscribe: https://lists.katipo.co.nz/mailman/listinfo/koha
> >