[Koha] robots.txt

Tue Nov 3 21:24:08 NZDT 2009

2009/11/2 Sébastien Hinderer <Sebastien.Hinderer at snv.jussieu.fr>:
> That's what I don't know -- how I want them to behave.
> My guess is that everything should be disallowed because no page has a
> meaning without arguments... What I would like to know is whether this
> guess is correct or not.

I think it's not, actually. I have set up Koha for my customer #1 at
sksk.bibkat.no and without doing anything I now get 7,000+ hits in
Google when I search for site:sksk.bibkat.no:

http://www.google.com/search?q=site%3Asksk.bibkat.no

The very first hit is for a page of search results for the Norwegian
word for birds. How they figured that out, I have no idea! The strange
thing is that this catalogue is hardly linked to from anywhere, so
they must have some way to index the catalogue other than just
following links.

I notice that on the second page of search results there are several
MARC views, hardly what you want patrons to find first. So perhaps
there should be some way to tell bots to just index the "ordinary"
views, not things like MARC?
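
A rough sketch of what such rules might look like, checked with Python's
standard urllib.robotparser. The script paths (opac-MARCdetail.pl,
opac-ISBDdetail.pl, opac-detail.pl, opac-search.pl) are assumptions about
how the OPAC is laid out and should be verified against the actual
installation; the helper name allowed() is made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Candidate robots.txt: block the MARC and ISBD views, leave the rest open.
# ASSUMPTION: these script paths match your OPAC; adjust as needed.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/koha/opac-MARCdetail.pl
Disallow: /cgi-bin/koha/opac-ISBDdetail.pl
"""

def allowed(path, robots_txt=ROBOTS_TXT, agent="*"):
    """Return True if a well-behaved bot may fetch `path` under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, "http://sksk.bibkat.no" + path)

if __name__ == "__main__":
    for path in ("/cgi-bin/koha/opac-MARCdetail.pl?biblionumber=1",
                 "/cgi-bin/koha/opac-detail.pl?biblionumber=1",
                 "/cgi-bin/koha/opac-search.pl?q=birds"):
        print(path, "->", "allowed" if allowed(path) else "blocked")
```

Rules like these only affect crawlers that honour robots.txt, and pages
already in an index may take a while to drop out.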

Also, having a robots.txt just to say "index everything" sounds like a
good idea, to avoid the "robots.txt not found" messages in the error
log.
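
For reference, an "index everything" robots.txt is normally just a
wildcard User-agent line followed by an empty Disallow line, served as
/robots.txt at the site root. The sketch below checks that reading with
Python's standard urllib.robotparser; the helper name crawlable() and the
example paths are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# An empty Disallow value means "nothing is disallowed".
ALLOW_ALL = "User-agent: *\nDisallow:\n"

def crawlable(path, robots_txt=ALLOW_ALL, agent="Googlebot"):
    """Return True if `agent` may fetch `path` under the given robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, "http://sksk.bibkat.no" + path)

if __name__ == "__main__":
    # Example paths are illustrative only.
    for path in ("/", "/cgi-bin/koha/opac-search.pl?q=birds"):
        print(path, "->", "allowed" if crawlable(path) else "blocked")
```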

Regards,
Magnus
libriotech.no