[Koha] Problems with the facebook web crawler

Nigel Titley nigel at titley.com
Fri Jul 26 10:57:29 NZST 2024



On 25/07/2024 23:31, Nigel Titley wrote:
> 
> 
> On 25/07/2024 13:55, Jason Boyer wrote:
>> While they do ignore robots.txt they do at least supply a recognizable 
>> user agent that you can just block:
>>
>> RewriteEngine on
>> RewriteCond %{HTTP_USER_AGENT} "facebookexternalhit|other|bots|here"
>> RewriteCond %{REQUEST_URI} "!403\.pl" [NC]
>> RewriteRule "^.*" "-" [F]
>>
>> Note that the second RewriteCond is required, or you'll end up with a 
>> redirect loop. They will still send you requests, but at least 
>> they won't tie up a Plack backend doing useless work. I haven't tried 
>> returning 5xx errors to see if that causes them to back off, but I 
>> doubt they would take much notice.
> 
> I'm assuming that this would be placed in
> 
> /etc/koha/apache-shared-opac.conf (I'm not using plack)

I dropped it in and it seems to have worked. The OPAC works again and 
Facebook is being blocked.
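For anyone following along, a minimal sketch of how the block might sit in 
/etc/koha/apache-shared-opac.conf, based on Jason's rules above (the 
user-agent alternation is a placeholder list — substitute the bots you 
actually want to block; the 403.pl exclusion matches Koha's OPAC error page):

```apache
# Block known crawlers by user agent before they reach the OPAC.
RewriteEngine on

# Match the Facebook crawler (and any other placeholder bot names).
RewriteCond %{HTTP_USER_AGENT} "facebookexternalhit|other|bots|here"

# Don't block the 403 error page itself, or the 403 response
# would trigger the rule again and loop.
RewriteCond %{REQUEST_URI} "!403\.pl" [NC]

# Return 403 Forbidden for everything else matching the conditions.
RewriteRule "^.*" "-" [F]
```

After editing, the change takes effect once Apache is reloaded (e.g. 
`systemctl reload apache2` on Debian-based installs).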

Many thanks

Nigel
