[Koha] Problems with the facebook web crawler
Nigel Titley
nigel at titley.com
Fri Jul 26 01:04:26 NZST 2024
On 25/07/2024 13:55, Jason Boyer wrote:
> While they do ignore robots.txt they do at least supply a recognizable
> user agent that you can just block:
>
> RewriteEngine on
> RewriteCond %{HTTP_USER_AGENT} "facebookexternalhit|other|bots|here"
> RewriteCond %{REQUEST_URI} "!403\.pl" [NC]
> RewriteRule "^.*" "-" [F]
>
> Note that the second RewriteCond is required or you'll end up in a
> redirect loop: without it, the rules would also match Apache's own
> internal redirect to the 403 error document. They will still be sending
> you requests, but at least they won't tie up a Plack backend doing
> useless work. I haven't tried returning 5xx errors to see whether that
> makes them back off, but I doubt they would take much notice.
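For context, the directives above might sit in a VirtualHost along these lines. This is a hedged sketch, not Koha's shipped configuration: the ServerName and the ErrorDocument path are assumptions, inferred only from the "403.pl" exemption in the RewriteCond.

```apache
# Sketch only: assumes mod_rewrite is enabled and a VirtualHost
# serving the Koha OPAC. ServerName and ErrorDocument path are
# hypothetical.
<VirtualHost *:80>
    ServerName opac.example.org

    # Hypothetical error document; the "!403\.pl" condition below
    # exempts it so the forbidden response can actually be served.
    ErrorDocument 403 /cgi-bin/koha/errors/403.pl

    RewriteEngine on
    # Match known crawler user agents (extend the alternation as needed).
    RewriteCond %{HTTP_USER_AGENT} "facebookexternalhit|other|bots|here"
    # Skip the error document itself, or Apache's internal redirect to
    # it would be forbidden too, causing a loop.
    RewriteCond %{REQUEST_URI} "!403\.pl" [NC]
    # Forbid every request matching the conditions above.
    RewriteRule "^.*" "-" [F]
</VirtualHost>
```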
Brilliant... that should help a lot. I'll also try Michael's approach
for comparison.
Nigel