Re: [Koha] Major, multiple problems with our Koha 17.05 system.
Hi,

I just want to share our experience with 17.05 to give you some hints about where to look. We have been running it in production since it was released, and for us it has been stable enough. We are a large public library: 400k users, 19 branches, some 70+ self-checkout machines using SIP, and quite a high item turnover.

For sure, MySQL _needs_ quite a lot of RAM. We specifically allocated 16G to the InnoDB buffer pool (innodb-buffer-pool-size=16G). This is needed because Koha's DB queries have rarely been optimized.

Second, Plack, Apache and SIP should have a fair number of workers. In our setup:

    SIP_WORKERS: 125
    APACHE_MINSERVERS: 20
    APACHE_TIMEOUT: 1000
    PLACK_WORKERS: "30"
    PLACK_MAX_REQUESTS: "200"

Now, on the few troubles we have had so far:

- When employees ran troublesome SQL reports that consumed large amounts of memory, the server could become overloaded: it started swapping memory to disk and the system stalled, since there are no limits on what can be run from Reports. We resolved this by limiting the swap space (tmp) on the MySQL server (tmpfs: /tmp:rw,size=655360k). Troublesome queries that exceed that size now die. Not the perfect solution, but a simple hack.

- Patron lookups in the intranet sometimes overloaded the system when the search used MySQL-specific wildcard characters, e.g. the percent sign in '%a'. We resolved this by filtering with JavaScript. Again not perfect, but a simple hack.

Even prior to 17.05 we had complaints about slow performance and started investigating. The number of unnecessary DB calls turned out to be absurd, so a Perl expert helped us resolve this with a memoized cache we named PureFunctions. It has been discussed in https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=17941 which is a bug fixing some performance issues in CanBookBeRenewed.

In short, PureFunctions is a cache map to be used on the "immutable" functions that are called often, usually those getting info on items, patrons, etc.
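To illustrate the idea only: below is a minimal sketch of a function-scoped memoization cache, written in Python rather than Koha's Perl. It is not the PureFunctions code from the patch. The names memoize_in_scope, get_item, checkout_titles and DB_CALLS are hypothetical and exist only for this example, and get_item merely pretends to be an expensive read-only database lookup.

```python
from typing import Callable, Dict, List, Tuple

# Counts how often the pretend database is hit; for demonstration only.
DB_CALLS = {"count": 0}


def get_item(itemnumber: int) -> dict:
    """Hypothetical stand-in for an expensive, read-only DB lookup."""
    DB_CALLS["count"] += 1
    return {"itemnumber": itemnumber, "title": f"Item {itemnumber}"}


def memoize_in_scope(fn: Callable[..., dict]) -> Callable[..., dict]:
    """Return a wrapper around fn that owns a private cache map.

    The cache lives only as long as the returned wrapper does, so creating
    the wrapper inside a function gives a function-scoped cache.
    """
    cache: Dict[Tuple, dict] = {}

    def wrapper(*args):
        if args not in cache:        # first call with these arguments
            cache[args] = fn(*args)  # do the real lookup once
        return cache[args]           # later calls reuse the stored result

    return wrapper


def checkout_titles(itemnumbers: List[int]) -> List[str]:
    """Look up titles, hitting the pretend database once per distinct item."""
    cached_get_item = memoize_in_scope(get_item)  # cache is created here
    titles = [cached_get_item(n)["title"] for n in itemnumbers]
    return titles                                 # cache is discarded on return
```

With this sketch, checkout_titles([1, 1, 2, 1, 2]) performs two lookups instead of five. A later call starts again with an empty cache, so nothing cached in one call can go stale in the next; that is the trade-off of keeping the cache this short-lived.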
It is a short-lived cache; that is, it has function scope, so it does not live on after the function returns. As an example, we ran a call to /svc/checkouts for a patron with 5 checkouts, where some items had many reserves, and saw that about 15k DB calls and about 20 seconds were saved.

We have been running this in production for more than 6 months, so we are confident it is stable. Perhaps it should be added to the community version? Our patch is here: https://gitlab.deichman.no/digibib/Koha/commit/919c65f1b521d051bc0dc137efaef...

As you can see, we have added it to any Get function that can be deemed safe, and it has improved our Koha performance immensely. To be clear, this does not fix poorly optimized code; it only bypasses it.

Regards
Benjamin Rokseth
Oslo Public Library

________________________________________
From: Koha <koha-bounces@lists.katipo.co.nz> on behalf of koha-request@lists.katipo.co.nz <koha-request@lists.katipo.co.nz>
Sent: 12 October 2017 16:34
To: koha@lists.katipo.co.nz
Subject: Koha Digest, Vol 144, Issue 18

Send Koha mailing list submissions to
    koha@lists.katipo.co.nz

To subscribe or unsubscribe via the World Wide Web, visit
    https://lists.katipo.co.nz/mailman/listinfo/koha
or, via email, send a message with subject or body 'help' to
    koha-request@lists.katipo.co.nz

You can reach the person managing the list at
    koha-owner@lists.katipo.co.nz

When replying, please edit your Subject line so it is more specific than "Re: Contents of Koha digest..."

Today's Topics:

   1. Re: Major, multiple problems with our Koha 17.05 system. (Tajoli Zeno)
   2. Elasticsearch on BWS Sandboxes (BOUIS Sonia)
   3. Re: Major, multiple problems with our Koha 17.05 system. (Tomas Cohen Arazi)
   4. Re: Major, multiple problems with our Koha 17.05 system.
      (Jonathan Druart)

----------------------------------------------------------------------

Message: 1
Date: Thu, 12 Oct 2017 14:29:53 +0200
From: Tajoli Zeno <z.tajoli@cineca.it>
To: Raymund Delahunty <r.delahunty@arts.ac.uk>, "koha@lists.katipo.co.nz" <koha@lists.katipo.co.nz>
Subject: Re: [Koha] Major, multiple problems with our Koha 17.05 system.
Message-ID: <4498a2a9-1f3d-4899-caa0-daf803c2490d@cineca.it>
Content-Type: text/plain; charset=utf-8; format=flowed

Hi Raymund,

On 12/10/2017 13:29, Raymund Delahunty wrote:
VERY long sorry, but our Koha system is virtually unusable and we need help!
Generally speaking, you can try to tune your MySQL setup. I find this tool useful:
https://github.com/major/MySQLTuner-perl
Be aware that you may need more RAM on your server!

About SIP, search Bugzilla:
https://bugs.koha-community.org/bugzilla3/
and look at the open problems.

Bye
Zeno Tajoli

--
Zeno Tajoli
/SVILUPPO PRODOTTI CINECA/ - Automazione Biblioteche
Email: z.tajoli@cineca.it Fax: 051/6132198
*CINECA* Consorzio Interuniversitario - Sede operativa di Segrate (MI)

------------------------------

Message: 2
Date: Thu, 12 Oct 2017 12:29:56 +0000
From: BOUIS Sonia <sonia.bouis@univ-lyon3.fr>
To: "koha@lists.katipo.co.nz" <koha@lists.katipo.co.nz>
Subject: [Koha] Elasticsearch on BWS Sandboxes
Message-ID: <7dcf9a8b89154929861031405cc8dba7@exch2013-mb1.ad.univ-lyon3.fr>
Content-Type: text/plain; charset="utf-8"

What good news! Thank you, Nick!

Sonia

------------------------------

Message: 6
Date: Thu, 12 Oct 2017 11:26:29 +0000
From: Nick Clemens <nick@bywatersolutions.com>
To: Koha Devel <koha-devel@lists.koha-community.org>, Koha <koha@lists.katipo.co.nz>, ByWater Partners <partners@bywatersolutions.com>
Subject: [Koha] Elasticsearch on BWS Sandboxes
Message-ID: <CAA_eX3NAO1ktbRn4w00nxKausLBgn4sEGQVAywPuq75NarRhLA@mail.gmail.com>
Content-Type: text/plain; charset="UTF-8"

Hi All,

Just an update on ES testing: we have now made Elasticsearch available on the BWS sandboxes:
http://sandbox.bywatersolutions.com/cgi-bin/sandbox-dashboard.pl

You will need to set the 'SearchEngine' system preference once the sandbox is set up, but records should be indexed and ready to go.

If you have any questions or problems please let me know via email or in IRC.

-Nick (kidclamp)

------------------------------

Message: 7
Date: Thu, 12 Oct 2017 11:29:17 +0000
From: Raymund Delahunty <r.delahunty@arts.ac.uk>
To: "koha@lists.katipo.co.nz" <koha@lists.katipo.co.nz>
Subject: [Koha] Major, multiple problems with our Koha 17.05 system.
Message-ID: <HE1PR0601MB2156A4DCC397D0A6FEB95796DC4B0@HE1PR0601MB2156.eurprd06.prod.outlook.com>
Content-Type: text/plain; charset="us-ascii"

VERY long sorry, but our Koha system is virtually unusable and we need help!

We are experiencing dozens of outages daily (both OPAC and intranet, more often only OPAC) with our Koha 17.05.01.000 (Debian 3.2.89-2, perl 5.020002, mysql 14.14, apache 2.4.10). Searches can take 15+ seconds. While university staff provide first tier support, we don't have server access, and the second tier and hosting is via an external company. Of course we are working with that company to have them identify the cause, but I have decided to ask the community if any user on 17.05 has the problems we have.

Our system is quite extensively customised, with major work done to the OPAC at https://libsearch.arts.ac.uk. We have some auto-renewal functionality we funded working, while we work on more improvements (we are NOT using the template toolkit, relying on workarounds to offer reasonably useful auto-renewal features).

Our problems started in the summer vacation soon after the upgrade, when Koha returns functionality became unstable. It was taking 8 seconds to return one item. Our support company advised that it was AddReturn (and, in particular, MarkIssueReturned) which was the pinch point. It seems some code to resolve the auto_increment bug works slower in 17.05, and our support company was correcting that with a view (I understand) to adding it to Master.

Things improved a little, but with the start of term almost 2 weeks ago we are encountering so many problems...

"We were alerted to issues at 4.22pm and found that the plack server was no longer responding. We restarted plack and then apache and the service restored. We are trying to get to the bottom of what caused this issue. It appears to be completely separate from the high CPU problem that we are also seeing."
I understand more resources have been allocated to MySQL and additional plack helpers have been allocated (whatever that means).

We are heavy users of SIP (80% of transactions are via self issue). But there are so many problems with the returns units that they are almost unusable. Front line staff are fed up! The sorters and the kiosks configured for returns (as well as issue) fail repeatedly, dozens of times daily (ERR_SIP_COM_RECV). Reconnection does happen, but takes about 40 seconds or so, leading to queues. And the items have NOT come off the account and flow into the exceptions bin. At the same time the intranet reports 500 server errors. I am pretty sure most of our problems are related to returns code, but could well be wrong/oversimplifying things.

Our support company advised: "We have added extra logging and can see that some checkin requests via the SIP2 protocol are causing the sipserver process to abort. The book is returned within Koha, but as the process returns no information to the SIP2 client, the unit responds with an error. After waiting for the timeout period it will then reconnect to the server. We are adding further debug to the logs to work out why the sipserver process is aborting. It does seem to be a special subset of the checkin requests. I'll update you when we have more news."

And more problems relating to total outages (with proxy errors 502 relating to invalid responses from an upstream server):

"There appears to be a number of issues you are experiencing. Friday you had high CPU usage, which is different to today's issues, which we think are down to the apache webserver not restarting correctly. I will double check first thing tomorrow that the overnight apache restart has occurred properly. We are hoping the database connections change may address the high CPU issue. We will monitor closely for the errors we have seen in the apache error logs tomorrow."
We are a university with 20,000 students across 6 colleges, and right now inductions and training in OPAC use are going on with new students. These continual problems are deeply embarrassing, and faith in the Koha system is dropping rapidly. Has any other Koha 17.05 user (especially one heavily reliant on SIP) experienced anything like this? Has anyone out there any suggestions?

Ray Delahunty
University of the Arts London

This email and any attachments are intended solely for the addressee and may contain confidential information. If you are not the intended recipient of this email and/or its attachments you must not take any action based upon them and you must not copy or show them to anyone. Please send the email back to us and immediately and permanently delete it and its attachments. Where this email is unrelated to the business of University of the Arts London or of any of its group companies the opinions expressed in it are the opinions of the sender and do not necessarily constitute those of University of the Arts London (or the relevant group company). Where the sender's signature indicates that the email is sent on behalf of UAL Short Courses Limited the following also applies: UAL Short Courses Limited is a company registered in England and Wales under company number 02361261. Registered Office: University of the Arts London, 272 High Holborn, London WC1V 7EY

------------------------------

Message: 8
Date: Thu, 12 Oct 2017 12:25:13 +0000
From: Marcel de Rooy <M.de.Rooy@rijksmuseum.nl>
To: "koha@lists.katipo.co.nz" <koha@lists.katipo.co.nz>
Subject: Re: [Koha] Major, multiple problems with our Koha 17.05 system.
Message-ID: <VI1PR05MB3487F8E520F21EA675CA2957CE4B0@VI1PR05MB3487.eurprd05.prod.outlook.com>
Content-Type: text/plain; charset="utf-8"

Very sorry to hear about your problems, Raymund.

I can't say very much about it, since we do not yet use 17.05. But you should search Bugzilla for the reports that recently got backported to 17.05.x.
Might just include some that you need. Upgrading to the latest 17.05 would then perhaps address some issues.

Just a side note: Very long delays in the past have been the result of date calculations with very far future dates like 9999 in expiry dates etc. Setting them back temporarily has been helpful in the past. (No guarantees; may be other factors involved.)

Since your system is extensively customized, it could be helpful (for you) to know whether your system, without the custom changes and in a test environment, would still experience the same problems.

And what about this: does your server have enough CPU, memory, etc.?

Just a few remarks.

Marcel

------------------------------

Subject: Digest Footer

_______________________________________________
Koha mailing list
Koha@lists.katipo.co.nz
https://lists.katipo.co.nz/mailman/listinfo/koha

------------------------------

End of Koha Digest, Vol 144, Issue 17
*************************************

------------------------------

Message: 3
Date: Thu, 12 Oct 2017 14:16:22 +0000
From: Tomas Cohen Arazi <tomascohen@gmail.com>
To: koha <koha@lists.katipo.co.nz>
Subject: Re: [Koha] Major, multiple problems with our Koha 17.05 system.
Message-ID: <CABZfb=XgDSBVrifwYJv5767R+TyrWsd7J-TGa989jXO9BOV6ew@mail.gmail.com>
Content-Type: text/plain; charset="UTF-8"

On Thu, 12 Oct 2017 at 8:29, Raymund Delahunty (<r.delahunty@arts.ac.uk>) wrote:
VERY long sorry, but our Koha system is virtually unusable and we need help!
We are experiencing dozens of outages daily (both OPAC and intranet, more often only OPAC) with our Koha 17.05.01.000 (Debian 3.2.89-2, perl 5.020002, mysql 14.14, apache 2.4.10). Searches can take 15+ seconds. While university staff provide first tier support, we don't have server access, and the second tier and hosting is via an external company. Of course we are working with that company to have them identify the cause, but I have decided to ask the community of any user on 17.05 has the problems we have.
It seems you haven't enabled Plack on your setup. Those numbers are really wrong.

--
Tomás Cohen Arazi
Theke Solutions (https://theke.io)
✆ +54 9351 3513384
GPG: B2F3C15F

------------------------------

Message: 4
Date: Thu, 12 Oct 2017 14:34:13 +0000
From: Jonathan Druart <jonathan.druart@bugs.koha-community.org>
To: Raymund Delahunty <r.delahunty@arts.ac.uk>, "koha@lists.katipo.co.nz" <koha@lists.katipo.co.nz>
Subject: Re: [Koha] Major, multiple problems with our Koha 17.05 system.
Message-ID: <CAJzKNY4-1u6LfY_=n+H0DOs-4LN=2Q+EwAUiOdS++cgBYkVKWw@mail.gmail.com>
Content-Type: text/plain; charset="UTF-8"

I am a bit desperate because I have already said several times on this list that some versions of Koha must not be used.

See https://wiki.koha-community.org/wiki/DBMS_auto_increment_fix and you will see that 17.05.01 is a buggy version.

My recommendation is to upgrade ASAP to the latest 17.05.x. Maybe it will not change anything, but you should not use this version anyway.

Regards,
Jonathan

On Thu, 12 Oct 2017 at 08:29 Raymund Delahunty <r.delahunty@arts.ac.uk> wrote:
VERY long sorry, but our Koha system is virtually unusable and we need help!
------------------------------ Subject: Digest Footer _______________________________________________ Koha mailing list Koha@lists.katipo.co.nz https://lists.katipo.co.nz/mailman/listinfo/koha ------------------------------ End of Koha Digest, Vol 144, Issue 18 *************************************