Dear all,

Does anyone use a robots.txt file on a Koha web site? If so, what should / could such a file contain?

Many thanks in advance for any recommendation or advice,
Sébastien.
I use

    User-agent: *
    Disallow: /

in /usr/share/koha/opac/htdocs/robots.txt

bgk

On Mon, Nov 2, 2009 at 4:14 AM, Sébastien Hinderer <Sebastien.Hinderer@snv.jussieu.fr> wrote:
Dear all,
Does anyone use a robots.txt file on a Koha web site? If so, what should / could such a file contain? Many thanks in advance for any recommendation or advice, Sébastien.
Sébastien Hinderer <Sebastien.Hinderer@snv.jussieu.fr> wrote:
Does anyone use a robots.txt file on a Koha web site? If so, what should / could such a file contain? Many thanks in advance for any recommendation or advice,
It could contain any of the directives described at http://www.robotstxt.org/robotstxt.html. It should contain whatever will make robots behave as you want.

I don't think our libraries currently use one, but I've not checked specifically for this.

Hope that helps,

--
MJ Ray, member of www.software.coop
Experts in web and GNU/Linux
(TTLLP # in subject emails = copy to all workers unless asked.)
Turo Technology LLP, reg'd in England+Wales, number OC303457
Reg. Office: 36 Orchard Cl., Kewstoke, Somerset, GB-BS22 9XY
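For reference, the building blocks that page describes are mainly User-agent and Disallow records. A few illustrative fragments (alternatives to choose between, not one file):

    # Keep all robots out of everything:
    User-agent: *
    Disallow: /

    # Let all robots index everything (an empty Disallow allows all):
    User-agent: *
    Disallow:

    # Keep one named robot out of the CGI scripts only
    # ("SomeBot" is a placeholder name):
    User-agent: SomeBot
    Disallow: /cgi-bin/

Disallow matches URL paths by prefix, so a rule also covers any query string appended to the path.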
It could contain any of the directives described at http://www.robotstxt.org/robotstxt.html ... I don't think our libraries currently use one, but I've not checked specifically for this.
When we were hosting our own Koha installation we had to start excluding search engine bots (Googlebot in particular) because our server was getting hit too hard and it was slowing everything down. I think LibLime blocks everything by default for its customers now.

I'd certainly prefer to be able to let Google in. I'd like the contents of the OPAC to be discoverable in search engines.

--
Owen -- Web Developer
Athens County Public Libraries
http://www.myacpl.org
Owen Leonard <oleonard@myacpl.org> wrote:
When we were hosting our own Koha installation we had to start excluding search engine bots (Googlebot in particular) because our server was getting hit too hard and it was slowing everything down. I think LibLime blocks everything by default for its customers now. I'd certainly prefer to be able to let Google in. I'd like the contents of the OPAC to be discoverable in search engines.
That's pretty much what our librarians have told me when I've asked. To some of them, more eyeballs means more borrowers, means more lends, and pretty directly means more funding.

It is possible to use things like Google Webmaster Tools and even iptables to slow the search engine bots down if/when they become a problem. I don't know if that will override settings like LibLime's blocking.

In general, I feel it sucks to be doing the search engine's work for them and they should tread lightly by default, but that's the trade-off if you want the OPAC to be indexed at the moment.

Hope that helps,

--
MJ Ray (slef)  Webmaster and LMS developer at      | software
www.software.coop http://mjr.towers.org.uk         | .... co
IMO only: see http://mjr.towers.org.uk/email.html  | .... op
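As a rough sketch of the iptables approach (the port, list name and thresholds here are placeholders to tune, and note this throttles any client that opens connections too quickly, not just bots):

    # Drop a new HTTP connection if its source address has opened
    # more than 20 in the last 60 seconds...
    iptables -A INPUT -p tcp --dport 80 -m state --state NEW \
      -m recent --name HTTP --update --seconds 60 --hitcount 20 -j DROP
    # ...and record every new HTTP connection for that check.
    iptables -A INPUT -p tcp --dport 80 -m state --state NEW \
      -m recent --name HTTP --set

A gentler first step is the non-standard Crawl-delay directive in robots.txt, which some (not all) crawlers honour.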
MJ Ray (2009/11/02 15:14 +0000):
It could contain any of the directives described at http://www.robotstxt.org/robotstxt.html.
I didn't know about this link, thanks.
It should contain whatever will make robots behave as you want.
That's what I don't know -- how I want them to behave. My guess is that everything should be disallowed, because no page has a meaning without its query arguments... What I would like to know is whether this guess is correct or not.
I don't think our libraries currently use one but I've not checked specifically for this.
I was looking at the Apache error logs and noticed that some clients look for this file, so I thought I could create one so that the error log does not keep filling up with this not-too-interesting entry.

Cheers,
Sébastien.
2009/11/2 Sébastien Hinderer <Sebastien.Hinderer@snv.jussieu.fr>:
That's what I don't know -- how I want them to behave. My guess is that everything should be disallowed, because no page has a meaning without its query arguments... What I would like to know is whether this guess is correct or not.
I think it's not, actually. I have set up Koha for my customer #1 at sksk.bibkat.no, and without doing anything I now get 7,000+ hits in Google when I search for site:sksk.bibkat.no: http://www.google.com/search?q=site%3Asksk.bibkat.no

The very first hit is a page of search results for the Norwegian word for birds. How they figured that out, I have no idea! The strange thing is that this catalogue is hardly linked to from anywhere, so they must have some way to index the catalogue other than just following links.

I notice that on the second page of search results there are several MARC views - hardly what you want patrons to find first. So perhaps there should be some way to tell bots to just index the "ordinary" views, not things like MARC? Also, having a robots.txt just to say "index everything" sounds like a good idea, to avoid the "robots.txt not found" messages in the error log.

Regards,
Magnus
libriotech.no
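Something along these lines might do both at once - let the bots index the ordinary views while keeping them away from the MARC/ISBD displays and the expensive search script (a sketch only; these are the usual Koha OPAC script paths, but check them against your own install):

    User-agent: *
    # Keep bots out of the MARC and ISBD record views:
    Disallow: /cgi-bin/koha/opac-MARCdetail.pl
    Disallow: /cgi-bin/koha/opac-ISBDdetail.pl
    # Search result pages are the most expensive to generate:
    Disallow: /cgi-bin/koha/opac-search.pl

Since Disallow matches by prefix, these rules also cover the scripts with any query string, while the normal opac-detail.pl pages stay crawlable.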
If your library uses OCLC to record your holdings, bear in mind that this information is ported to Google (and a few other places, I think) and is always searchable using http://www.worldcat.org

http://newsbreaks.infotoday.com/nbreader.asp?ArticleID=16592

Thanks,
-- Ben

On Mon, Nov 2, 2009 at 2:14 AM, Sébastien Hinderer <Sebastien.Hinderer@snv.jussieu.fr> wrote:
Dear all,
Does anyone use a robots.txt file on a Koha web site? If so, what should / could such a file contain? Many thanks in advance for any recommendation or advice, Sébastien.
participants (6)
- Ben Ide
- Bernardo Gonzalez Kriegel
- Magnus Enger
- MJ Ray
- Owen Leonard
- Sébastien Hinderer