[Koha] Major, multiple problems with our Koha 17.05 system.

Benjamin Rokseth benjamin.rokseth at kul.oslo.kommune.no
Fri Oct 13 21:22:26 NZDT 2017


Hi,

I just want to share our experience with 17.05 to give you some hints on where to look.
We have been using it in production since it was released, and for us it has been stable enough.
We are a large public library with 400k users, 19 branches, some 70+ self-checkout machines using SIP, and quite a high item turnover.
For sure, MySQL _needs_ quite a lot of RAM; we specifically allocated 16G to the InnoDB buffer pool (innodb-buffer-pool-size=16G).
This is needed because Koha's DB queries have rarely been optimized.
Second, Plack, Apache and SIP should have a fair number of workers. In our setup:
      SIP_WORKERS: 125
      APACHE_MINSERVERS: 20
      APACHE_TIMEOUT: 1000
      PLACK_WORKERS: "30"
      PLACK_MAX_REQUESTS: "200"

Now, on the few troubles we had so far:
- When employees ran troublesome SQL reports that consumed large amounts of memory, the server could become overloaded, start swapping memory to disk and stall the system,
since there are no limits on what can be run from Reports. We resolved this by limiting the temporary space (/tmp) on the MySQL server (tmpfs: /tmp:rw,size=655360k).
Troublesome queries that exceed that size now die. Not the perfect solution, but a simple hack.
- Patron lookups in the staff intranet sometimes overloaded the system when the search used MySQL-specific wildcards, e.g. the percent sign, as in '%a'. We resolved this by filtering with JavaScript.
Again not perfect, but a simple hack.
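
The wildcard filtering mentioned in the second point can be sketched roughly as follows. This is only an illustration in Python, not the JavaScript filter actually deployed; the function name `sanitize_patron_search` and the minimum length of 2 characters are assumptions made for the example.

```python
import re
from typing import Optional

# Characters that MySQL treats as wildcards in LIKE patterns.
LIKE_WILDCARDS = re.compile(r"[%_]")


def sanitize_patron_search(term: str, min_length: int = 2) -> Optional[str]:
    """Strip LIKE wildcards from a search term.

    Returns None when too little is left to search on, so the caller
    can refuse the lookup instead of sending a near-unbounded query.
    (Hypothetical helper; the threshold is an assumption.)
    """
    cleaned = LIKE_WILDCARDS.sub("", term).strip()
    if len(cleaned) < min_length:
        return None
    return cleaned
```

A search for '%a' would be rejected here, while '%smith%' would be reduced to a plain 'smith' lookup.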

We also, even prior to 17.05, had complaints about slow performance and started investigating. The number of unnecessary DB calls turned out to be absurd, so a Perl expert
helped us resolve this with a memoized cache we named PureFunctions. It has been discussed in https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=17941 which
is a bug fixing some performance issues in CanBookBeRenewed. In short, PureFunctions is a cache map to be used on the "immutable" functions that are called often, usually for getting
info on items, patrons, etc. It is a short-lived cache, that is, it has function scope, so it will not live on after the function has returned. As an example, we ran a call on /svc/checkouts for
a patron with 5 checkouts, with some items that had many reserves, and saw that about 15k DB calls and about 20 seconds were saved.

We have been running this for more than 6 months in production, so we are confident it is stable. Perhaps this should be added to the community version? Our patch is here:
https://gitlab.deichman.no/digibib/Koha/commit/919c65f1b521d051bc0dc137efaef0b62fd8d28b
As you can see, we have added it to every Get function that can be deemed safe, and it has improved our Koha performance immensely.
To be clear, this does not fix poorly optimized code; it only bypasses it.

Regards
Benjamin Rokseth
Oslo Public Library 

________________________________________
From: Koha <koha-bounces at lists.katipo.co.nz> on behalf of koha-request at lists.katipo.co.nz <koha-request at lists.katipo.co.nz>
Sent: 12 October 2017 16:34
To: koha at lists.katipo.co.nz
Subject: Koha Digest, Vol 144, Issue 18

Send Koha mailing list submissions to
        koha at lists.katipo.co.nz

To subscribe or unsubscribe via the World Wide Web, visit
        https://lists.katipo.co.nz/mailman/listinfo/koha
or, via email, send a message with subject or body 'help' to
        koha-request at lists.katipo.co.nz

You can reach the person managing the list at
        koha-owner at lists.katipo.co.nz

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Koha digest..."


Today's Topics:

   1. Re: Major, multiple problems with our Koha 17.05 system.
      (Tajoli Zeno)
   2. Elasticsearch on BWS Sandboxes (BOUIS Sonia)
   3. Re: Major, multiple problems with our Koha 17.05 system.
      (Tomas Cohen Arazi)
   4. Re: Major, multiple problems with our Koha 17.05 system.
      (Jonathan Druart)


----------------------------------------------------------------------

Message: 1
Date: Thu, 12 Oct 2017 14:29:53 +0200
From: Tajoli Zeno <z.tajoli at cineca.it>
To: Raymund Delahunty <r.delahunty at arts.ac.uk>,
        "koha at lists.katipo.co.nz" <koha at lists.katipo.co.nz>
Subject: Re: [Koha] Major, multiple problems with our Koha 17.05
        system.
Message-ID: <4498a2a9-1f3d-4899-caa0-daf803c2490d at cineca.it>
Content-Type: text/plain; charset=utf-8; format=flowed

Hi Raymund,

On 12/10/2017 13:29, Raymund Delahunty wrote:
> VERY long sorry, but our Koha system is virtually unusable and we need help!

Generally speaking, you can try to tune your MySQL setup. I find this tool
useful: https://github.com/major/MySQLTuner-perl

Be aware that you may need more RAM on your server!

About SIP, search Bugzilla:
https://bugs.koha-community.org/bugzilla3/
and look at the open problems.

Bye
Zeno Tajoli

--
Zeno Tajoli
/SVILUPPO PRODOTTI CINECA/ - Automazione Biblioteche
Email: z.tajoli at cineca.it Fax: 051/6132198
*CINECA* Consorzio Interuniversitario - Sede operativa di Segrate (MI)


------------------------------

Message: 2
Date: Thu, 12 Oct 2017 12:29:56 +0000
From: BOUIS Sonia <sonia.bouis at univ-lyon3.fr>
To: "koha at lists.katipo.co.nz" <koha at lists.katipo.co.nz>
Subject: [Koha] Elasticsearch on BWS Sandboxes
Message-ID:
        <7dcf9a8b89154929861031405cc8dba7 at exch2013-mb1.ad.univ-lyon3.fr>
Content-Type: text/plain; charset="utf-8"

What good news!
Thank you, Nick!

Sonia

------------------------------

Message: 6
Date: Thu, 12 Oct 2017 11:26:29 +0000
From: Nick Clemens <nick at bywatersolutions.com>
To: Koha Devel <koha-devel at lists.koha-community.org>, Koha
        <koha at lists.katipo.co.nz>,  ByWater Partners
        <partners at bywatersolutions.com>
Subject: [Koha] Elasticsearch on BWS Sandboxes
Message-ID:
        <CAA_eX3NAO1ktbRn4w00nxKausLBgn4sEGQVAywPuq75NarRhLA at mail.gmail.com>
Content-Type: text/plain; charset="UTF-8"

Hi All,

Just an update to ES testing, we have now made Elasticsearch available on
the BWS sandboxes:
http://sandbox.bywatersolutions.com/cgi-bin/sandbox-dashboard.pl

You will need to set the 'SearchEngine' system preference once the sandbox
is set up, but records should be indexed and ready to go.

If you have any questions or problems please let me know via email or in
IRC.

-Nick (kidclamp)


------------------------------

Message: 7
Date: Thu, 12 Oct 2017 11:29:17 +0000
From: Raymund Delahunty <r.delahunty at arts.ac.uk>
To: "koha at lists.katipo.co.nz" <koha at lists.katipo.co.nz>
Subject: [Koha] Major, multiple problems with our Koha 17.05 system.
Message-ID:
        <HE1PR0601MB2156A4DCC397D0A6FEB95796DC4B0 at HE1PR0601MB2156.eurprd06.prod.outlook.com>

Content-Type: text/plain; charset="us-ascii"

VERY long sorry, but our Koha system is virtually unusable and we need help!

We are experiencing dozens of outages daily (both OPAC and intranet, more often only OPAC) with our Koha 17.05.01.000 (Debian 3.2.89-2, perl 5.020002, mysql 14.14, apache 2.4.10). Searches can take 15+ seconds. While university staff provide first-tier support, we don't have server access, and the second tier and hosting are provided by an external company. Of course we are working with that company to have them identify the cause, but I have decided to ask the community whether any user on 17.05 has the problems we have.

Our system is quite extensively customised, with major work done to the OPAC at https://libsearch.arts.ac.uk/. We have some auto-renewal functionality we funded working, while we work on more improvements (we are NOT using the template toolkit, relying on workarounds to offer reasonably useful auto-renewal features).

Our problems started in the summer vacation soon after the upgrade, when Koha's returns functionality became unstable. It was taking 8 seconds to return one item. Our support company advised that it was AddReturn (and, in particular, MarkIssueReturned) which was the pinch point. It seems some code to resolve the auto_increment bug runs slower in 17.05, and our support company was correcting that with a view (I understand) to adding it to Master.

Things improved a little, but with the start of term almost 2 weeks ago we are encountering so many problems...

"We were alerted to issues at 4.22pm and found that the plack server was no longer responding. We restarted plack and then apache and the service was restored. We are trying to get to the bottom of what caused this issue. It appears to be completely separate from the high CPU problem that we are also seeing."

I understand more resources have been allocated to MySQL and additional plack helpers have been allocated (whatever that means).

We are heavy users of SIP (80% of transactions are via self issue). But there are so many problems with the returns units that they are almost unusable. Front line staff are fed up! The sorters and the kiosks configured for returns (as well as issue) fail dozens of times daily (ERR_SIP_COM_RECV). Reconnection does happen, but it takes about 40 seconds or so, leading to queues. And the items have NOT come off the account, so they flow into the exceptions bin. At the same time the intranet reports 500 server errors. I am pretty sure most of our problems are related to the returns code, but I could well be wrong or oversimplifying things.

Our support company advised:
"We have added extra logging and can see that some checkin requests via the SIP2 protocol are causing the sipserver process to abort. The book is returned within Koha, but as the process returns no information to the SIP2 client, the unit responds with an error. After waiting for the timeout period it will then reconnect to the server. We are adding further debug output to the logs to work out why the sipserver process is aborting. It does seem to be a special subset of the checkin requests. I'll update you when we have more news."

And more problems relating to total outages (with 502 proxy errors relating to invalid responses from an upstream server):

"There appear to be a number of issues you are experiencing. On Friday you had high CPU usage, which is different from today's issues, which we think are down to the apache webserver not restarting correctly. I will double check first thing tomorrow that the overnight apache restart has occurred properly. We are hoping the database connections change may address the high CPU issue. We will monitor closely for the errors we have seen in the apache error logs tomorrow."

We are a university with 20,000 students across 6 colleges, and right now inductions and training in OPAC use are going on with new students. These continual problems are deeply embarrassing, and faith in the Koha system is dropping rapidly. Has any other Koha 17.05 user (especially one heavily reliant on SIP) experienced anything like this? Does anyone out there have any suggestions?


Ray Delahunty
University of the Arts London


This email and any attachments are intended solely for the addressee and may contain confidential information. If you are not the intended recipient of this email and/or its attachments you must not take any action based upon them and you must not copy or show them to anyone. Please send the email back to us and immediately and permanently delete it and its attachments. Where this email is unrelated to the business of University of the Arts London or of any of its group companies the opinions expressed in it are the opinions of the sender and do not necessarily constitute those of University of the Arts London (or the relevant group company). Where the sender's signature indicates that the email is sent on behalf of UAL Short Courses Limited the following also applies: UAL Short Courses Limited is a company registered in England and Wales under company number 02361261. Registered Office: University of the Arts London, 272 High Holborn, London WC1V 7EY


------------------------------

Message: 8
Date: Thu, 12 Oct 2017 12:25:13 +0000
From: Marcel de Rooy <M.de.Rooy at rijksmuseum.nl>
To: "koha at lists.katipo.co.nz" <koha at lists.katipo.co.nz>
Subject: Re: [Koha] Major, multiple problems with our Koha 17.05
        system.
Message-ID:
        <VI1PR05MB3487F8E520F21EA675CA2957CE4B0 at VI1PR05MB3487.eurprd05.prod.outlook.com>

Content-Type: text/plain; charset="utf-8"

Very sorry to hear about your problems, Raymund.
Can't say very much about it, since we do not use 17.05 yet.
But you should search Bugzilla for the reports that were recently backported to 17.05.x.
They might include some that you need. Upgrading to the latest 17.05 would then perhaps address some issues.
Just a side note: very long delays have in the past been the result of date calculations with far-future dates such as 9999 in expiry dates etc. Setting them back temporarily has been helpful in the past. (No guarantees; other factors may be involved.)

Since your system is extensively customized, it could be helpful (for you) to know whether your system without the custom changes would still experience the same problems in a test environment.
And also: does your server have enough CPU, memory, etc.?

Just a few remarks.

Marcel


------------------------------

Subject: Digest Footer

_______________________________________________
Koha mailing list
Koha at lists.katipo.co.nz
https://lists.katipo.co.nz/mailman/listinfo/koha


------------------------------

End of Koha Digest, Vol 144, Issue 17
*************************************

------------------------------

Message: 3
Date: Thu, 12 Oct 2017 14:16:22 +0000
From: Tomas Cohen Arazi <tomascohen at gmail.com>
To: koha <koha at lists.katipo.co.nz>
Subject: Re: [Koha] Major, multiple problems with our Koha 17.05
        system.
Message-ID:
        <CABZfb=XgDSBVrifwYJv5767R+TyrWsd7J-TGa989jXO9BOV6ew at mail.gmail.com>
Content-Type: text/plain; charset="UTF-8"

On Thu, 12 Oct 2017 at 8:29, Raymund Delahunty (<r.delahunty at arts.ac.uk>)
wrote:

> VERY long sorry, but our Koha system is virtually unusable and we need
> help!
>
> We are experiencing dozens of outages daily (both OPAC and intranet, more
> often only OPAC) with our Koha 17.05.01.000 (Debian 3.2.89-2, perl
> 5.020002, mysql 14.14, apache 2.4.10). Searches can take 15+ seconds. While
> university staff provide first tier support, we don't have server access,
> and the second tier and hosting is via an external company. Of course we
> are working with that company to have them identify the cause, but I have
> decided to ask the community of any user on 17.05 has the problems we have.


It seems you haven't enabled Plack on your setup.
Those numbers are really wrong.
--
Tomás Cohen Arazi
Theke Solutions (https://theke.io)
✆ +54 9351 3513384
GPG: B2F3C15F


------------------------------

Message: 4
Date: Thu, 12 Oct 2017 14:34:13 +0000
From: Jonathan Druart <jonathan.druart at bugs.koha-community.org>
To: Raymund Delahunty <r.delahunty at arts.ac.uk>,
        "koha at lists.katipo.co.nz" <koha at lists.katipo.co.nz>
Subject: Re: [Koha] Major, multiple problems with our Koha 17.05
        system.
Message-ID:
        <CAJzKNY4-1u6LfY_=n+H0DOs-4LN=2Q+EwAUiOdS++cgBYkVKWw at mail.gmail.com>
Content-Type: text/plain; charset="UTF-8"

I am getting a bit desperate, because I have already said several times on
this list that some versions of Koha must not be used.

See https://wiki.koha-community.org/wiki/DBMS_auto_increment_fix
and you will see that 17.05.01 is a buggy version.

My recommendation is to upgrade ASAP to the latest 17.05.x.

Maybe it will not change anything, but you should not use this version
anyway.

Regards,
Jonathan



------------------------------


End of Koha Digest, Vol 144, Issue 18
*************************************

