At 11:12 AM 6/2/2015 +1200, Robin Sheat wrote:
schnydszch wrote on Fri 29-05-2015 at 05:17 [-0700]:
And oh yes, once the bloated error log was deleted and the server restarted, the Koha ILS was accessible again. I'm particularly curious whether that really is the case. [snip] As for webcrawlers, they can be a real problem every once in a while. We slow down the well-behaved ones with robots.txt to prevent them from putting too much load on the server, and if we spot one that's not obeying it, we firewall it off.
We've already been obliged to firewall Baidu and others [1] from the same region. And I must admit that "google is getting my goat." Woodpeckering may be a fact of modern life, but at several million entries per day, there's got to be a limit. Best -- Paul

[1] ShenZhen Sunris   Deny 202.46.32.0-202.46.63.255
    Beijing Baidu     Deny 180.76.0.0-180.76.255.255
    NCICNET-NET       Deny 175.180.0.0-175.183.255.255
    CHINACACHE-1      Deny 69.28.48.0-69.28.63.255
    XeraCom IT AB     Deny 89.160.60.192-89.160.60.223
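Ranges like those in [1] can be dropped at the firewall with iptables' iprange match; a minimal sketch, using one example range from the list above (the chain and policy are assumptions about your setup, and the commands need root):

```shell
# Drop all traffic from one abusive crawler's range; repeat per range.
# The INPUT chain and a default-accept policy are assumed here.
iptables -A INPUT -m iprange --src-range 180.76.0.0-180.76.255.255 -j DROP
```

For more than a handful of ranges, an ipset is usually kinder to rule-matching performance, but the plain iprange form mirrors the deny list as written.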
Paul A wrote on Tue 02-06-2015 at 09:27 [-0400]:
And I must admit that "google is getting my goat." Woodpeckering may be a fact of modern life, but at several million entries per day, there's got to be a limit.
Use robots.txt to slow down its crawl rate. Some crawlers don't obey that, but google does. -- Robin Sheat Catalyst IT Ltd. ✆ +64 4 803 2204 GPG: 5FA7 4B49 1E4D CAA4 4C38 8505 77F5 B724 F871 3BDF
At 11:30 AM 6/3/2015 +1200, Robin Sheat wrote:
Paul A wrote on Tue 02-06-2015 at 09:27 [-0400]:
And I must admit that "google is getting my goat." Woodpeckering may be a fact of modern life, but at several million entries per day, there's got to be a limit.
Use robots.txt to slow down its crawl rate. Some crawlers don't obey that, but google does.
How do you "slow it down"? To the best of my knowledge it's an on/off choice. See <https://developers.google.com/webmasters/control-crawl-index/docs/robots_txt>. tnx and br -- p.
Paul A wrote on Tue 02-06-2015 at 19:48 [-0400]:
How do you "slow it down"? To the best of my knowledge it's an on/off choice. See <https://developers.google.com/webmasters/control-crawl-index/docs/robots_txt>.
https://en.wikipedia.org/wiki/Robots_exclusion_standard#Crawl-delay_directiv... PS: no need to CC me in replies, I'm on the mailing list. -- Robin Sheat Catalyst IT Ltd. ✆ +64 4 803 2204 GPG: 5FA7 4B49 1E4D CAA4 4C38 8505 77F5 B724 F871 3BDF
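The Crawl-delay directive described there is a one-line addition to robots.txt; a minimal sketch (the 10-second value is illustrative, and not every engine honours the directive):

```text
# robots.txt - ask compliant crawlers to wait between requests.
# The delay value is illustrative; support varies by search engine.
User-agent: *
Crawl-delay: 10
```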
Hi Robin & Paul! These are interesting reads. Bookmarking these links for future reference. Have a nice day, y'all! Cheers! -- View this message in context: http://koha.1045719.n5.nabble.com/Help-Software-Error-tp5842000p5842626.html Sent from the Koha-general mailing list archive at Nabble.com.
At 01:13 PM 6/3/2015 +1200, Robin Sheat wrote:
Paul A wrote on Tue 02-06-2015 at 19:48 [-0400]:
How do you "slow it down"? To the best of my knowledge it's an on/off choice. See
<https://developers.google.com/webmasters/control-crawl-index/docs/robots_txt>.
https://en.wikipedia.org/wiki/Robots_exclusion_standard#Crawl-delay_directiv...
PS: no need to CC me in replies, I'm on the mailing list.
Robin -- thanks (and apologies for the unintended Cc, end of a long day, etc). This was specifically for Google, so I had a look at the wiki article and did some more digging. It seems that at least Bing and Yahoo (neither of which appears to abuse our servers) support Crawl-delay, but Google does not; they have a different method, explained at <https://support.google.com/webmasters/answer/48620?hl=en>. I'll have to look more closely, but it may not be quite what I'm looking for, as it involves their proprietary "webmaster tools" rather than robots.txt. Best -- Paul
Paul A wrote on Wed 03-06-2015 at 09:50 [-0400]:
This was specifically for Google, so I had a look at the wiki article and did some more digging. It seems that at least Bing and Yahoo (neither of whom appear to abuse our servers) support the Crawl-Delay
It seems you're right. That's a bit annoying of them. I apparently had it in my head that all the major search engines did. -- Robin Sheat Catalyst IT Ltd. ✆ +64 4 803 2204 GPG: 5FA7 4B49 1E4D CAA4 4C38 8505 77F5 B724 F871 3BDF
Robin Sheat wrote:
It seems you're right. That's a bit annoying of them. I apparently had it in my head that all the major search engines did.
Would there be interest in a wiki page listing some of the robots.txt-ignoring bots which can give even a tuned Koha indigestion? Or do you know if some other project already keeps such a list? We've collected quite a few over the years with names like ahrefs and majestic21. Regards, -- MJ Ray (slef), member of www.software.coop, a for-more-than-profit co-op http://koha-community.org supporter, web and library systems developer. In My Opinion Only: see http://mjr.towers.org.uk/email.html Available for hire (including development) at http://www.software.coop/
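Spotting such bots in the first place usually means digging through the access log; a minimal sketch in Python, assuming an Apache combined-format log where the user-agent is the final quoted field (the bot-name substrings here are illustrative, not a curated list):

```python
# Count hits from known robots.txt-ignoring crawlers in an Apache
# combined-format access log. The bot-name substrings are illustrative;
# extend the tuple with whatever shows up in your own logs.
import re
from collections import Counter

BAD_BOTS = ("AhrefsBot", "MJ12bot", "SemrushBot")  # illustrative names

def count_bot_hits(log_lines):
    """Return a Counter of hits per bad-bot substring found in the
    user-agent field (the last quoted string on a combined log line)."""
    hits = Counter()
    ua_re = re.compile(r'"([^"]*)"\s*$')  # user-agent is the final quoted field
    for line in log_lines:
        m = ua_re.search(line)
        if not m:
            continue
        ua = m.group(1)
        for bot in BAD_BOTS:
            if bot in ua:
                hits[bot] += 1
    return hits
```

A consistently high count for one bot is a good candidate for a robots.txt Disallow first, and a firewall rule if it keeps ignoring that.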
That would be nice, MJ. That way other Koha (system) administrators will be able to optimize their Koha ILS and tune it for better performance. These robots-ignoring crawlers are an annoyance, especially on a server with very limited space.
A more interesting find on this particular Koha server: spammed purchase suggestions from here: http://hreplib.congress.gov.ph/cgi-bin/koha/opac-suggestions.pl Sample images here: <http://koha.1045719.n5.nabble.com/file/n5843576/1.jpg> and here: <http://koha.1045719.n5.nabble.com/file/n5843576/koha_opacsuggestspam2.jpg>. And looking at the database, the suggestedby column is '2', which is the Koha default user; here's the picture: <http://koha.1045719.n5.nabble.com/file/n5843576/koha_opacsuggestspam3.jpg> With regard to security, are we seeing something strange here? I will be adding opac-suggestions.pl to robots.txt and then turning off purchase suggestions on the OPAC if the same weird event persists. Thanks and cheers, Koha community!
It might be enough to turn off AnonSuggestions. Koha will use the AnonymousPatron borrowernumber for those, which might be what you are seeing there. On 09.06.2015 at 14:01, schnydszch wrote:
A more interesting find on this particular Koha server: spammed purchase suggestions from here: http://hreplib.congress.gov.ph/cgi-bin/koha/opac-suggestions.pl Sample images here: <http://koha.1045719.n5.nabble.com/file/n5843576/1.jpg> and here: <http://koha.1045719.n5.nabble.com/file/n5843576/koha_opacsuggestspam2.jpg>. And looking at the database, the suggestedby column is '2', which is the Koha default user; here's the picture: <http://koha.1045719.n5.nabble.com/file/n5843576/koha_opacsuggestspam3.jpg> With regard to security, are we seeing something strange here? I will be adding opac-suggestions.pl to robots.txt and then turning off purchase suggestions on the OPAC if the same weird event persists. Thanks and cheers, Koha community!
Thanks Katrin! I think that's the best solution for now: turning off AnonSuggestions. I'm also particularly curious whether the web crawler is related to this opac-suggestions spammer. Oh well, will investigate further. Thanks and cheers!
It seems the borrowernumber in question does not have a username or password: <http://koha.1045719.n5.nabble.com/file/n5843702/kohaopacsuggest_spam1.png> as opposed to another user that does: <http://koha.1045719.n5.nabble.com/file/n5843702/kohaopacsuggest_spam2.png> It seems the spammer exploited this and used the account with no username and password to spam the OPAC suggestions. My bad for calling borrowernumber 2 the default Koha user; that was not the case. Thanks a lot! :)
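A quick way to audit for such credential-less accounts is a query against the borrowers table. A sketch, using an in-memory SQLite stand-in (Koha itself runs on MySQL; the column names match Koha's borrowers schema, but the data here is invented):

```python
# Sketch: flag patron rows with empty credentials, the kind of account
# anonymous suggestion spam gets attributed to. SQLite stands in for
# Koha's MySQL; column names (borrowernumber, userid, password) match
# Koha's borrowers table, the rows below are made up.
import sqlite3

def find_credentialless(conn):
    """Return borrowernumbers whose userid or password is NULL or empty."""
    cur = conn.execute(
        "SELECT borrowernumber FROM borrowers "
        "WHERE userid IS NULL OR userid = '' "
        "OR password IS NULL OR password = ''"
    )
    return [row[0] for row in cur]
```

Any borrowernumber this returns (other than the one deliberately configured as AnonymousPatron) is worth a closer look.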
participants (5)
- Katrin Fischer
- MJ Ray
- Paul A
- Robin Sheat
- schnydszch