Re: [Koha] n-Tier deployment of Koha
Hi Admire,

You'll probably have more luck asking on the dev listserv: koha-devel@lists.koha-community.org. However, I'll try to offer some insight here. I'm also CCing in Mengu who has probably the biggest horizontally scaled out Koha in the world.

Off the top of my head, technically you could have a load balancer in front of multiple Koha application servers. You'd want to make sure that the cronjobs are only running on 1 application server though. You also would probably want to use an external Elasticsearch cluster since Zebra would be a bottleneck. In theory, you could use a multi-master MySQL/MariaDB database cluster, but I don't know if anyone has done that (and I'd be concerned about the "eventual consistency" of multi-master setups). If you wanted to use a master-replica cluster, you'd need to patch Koha to allow for read-only connections (for things like the Reports module); at the moment, Koha can only use 1 read/write database connection. You'd also need to move your RabbitMQ server to an external server, which is problematic with the koha-common package at the moment (but there are ways). (Note: I may be missing another horizontal scaling issue, but these are what come to mind at the moment. You might need to use a network share for some file system resources in /var/koha/* but I can't recall off the top of my head.)

It would be much easier to scale vertically. I'd suggest you do some analysis to see where your key pain points are.

- Out of the box, Koha only uses 2 Starman/Plack workers, so if it's a case where your Koha is overwhelmed by legitimate traffic, I'd recommend adding more Starman/Plack workers via koha-conf.xml (if you have the CPU and RAM to spare).
- If your Koha is slow, you might want to look at your database's I/O performance as well. Using quality SSDs can make a huge difference for database-driven applications. (Note also that having MySQL/MariaDB on an external server is a common practice, but technically speaking it does add some latency to database transactions which you wouldn't have if you ran MySQL/MariaDB on the same server and used Unix sockets. For very large Koha databases, we've noticed that having an external DB server adds too much latency.)
- If you're getting a lot of traffic, you should check to see how much traffic is from bots. You might need to implement something like fail2ban to block bots that put too much load on your system. Typically, I find when Koha is overwhelmed, it's due to bots. (You can experiment with a robots.txt file to help with this but it's not a completely effective solution.)

There are lots of things to explore.

In the long-term I'd love to make Koha easier to scale horizontally (and ideally deploy as a load-balanced containerized application) but there's work to do and not much demand for it from libraries, so I don't think it's a priority for devs at the moment, unfortunately.

Anyway, I hope that helps!

David Cook
Senior Software Engineer
Prosentient Systems
Suite 7.03
6a Glen St
Milsons Point NSW 2061
Australia

Office: 02 9212 0899
Online: 02 8005 0595

-----Original Message-----
Message: 10
Date: Wed, 27 Jul 2022 16:28:12 +0100
From: Admire Mutsikiwa <amutsikiwa@uzlib.uz.ac.zw>
To: koha@lists.katipo.co.nz
Subject: [Koha] n-Tier deployment of Koha
Message-ID: <CAD4nctbymX-UfziZr67h-4p4O0GBPapecRCXfc=obHtmiFt0VQ@mail.gmail.com>
Content-Type: text/plain; charset="UTF-8"

My Koha implementation is sometimes overwhelmed as it is a two-tier implementation with a database on one server and a Koha application on another server. I am wondering if it is possible to have a 3 or 4 tier deployment architecture with, for instance, a load balancer, a Koha application server and a clustered MySQL/MariaDB implementation.

Kind Regards,

--
*The information transmitted by this email is intended only for the person or entity to which it is addressed. This email may contain proprietary, business-confidential, and/or privileged material. If you are not the intended recipient of this message, be aware that any use, review, retransmission, distribution, reproduction or any action taken in reliance upon this message is strictly prohibited. If you received this in error, please contact the sender and delete the material from all computers*
In my experience, you're usually MUCH better off adding more resources to your single database server. For any given server purchase, there is a cost-per-performance breaking point where it starts to make more sense to buy two (or more) smaller servers for the same performance vs one large server. It's been a couple years, but the last time I checked the breaking point was something like 64GB RAM and 2 16-core CPUs. You can serve a lot of books to a lot of patrons on a machine like that.

Joel Coehoorn
Director of Information Technology
York University of Nebraska
_______________________________________________
Koha mailing list http://koha-community.org Koha@lists.katipo.co.nz Unsubscribe: https://lists.katipo.co.nz/mailman/listinfo/koha
Hi :)

On 22-07-28 02:49, dcook@prosentient.com.au wrote:
Off the top of my head, technically you could have a load balancer in front of multiple Koha application servers. You'd want to make sure that the cronjobs are only running on 1 application server though. You also would probably want to use an external Elasticsearch cluster since Zebra would be a bottleneck. In theory, you could use a multi-master MySQL/MariaDB database cluster, but I don't know if anyone has done that (and I'd be concerned about the "eventual consistency" of multi-master setups). If you wanted to use a master-replica cluster, you'd need to patch Koha to allow for read-only connections (for things like the Reports module); at the moment, Koha can only use 1 read/write database connection.
Are there MariaDB front ends/load balancers that could route the read queries to one of multiple read-only replicas, and the write queries to the main server? That would avoid having to change the apps. So in the end the only non-horizontally-scalable thing would be DB writes? (unless multi-master eventual consistency works)
In the long-term I'd love to make Koha easier to scale horizontally (and ideally deploy as a load-balanced containerized application) but there's work to do and not much demand for it from libraries, so I don't think it's a priority for devs at the moment, unfortunately.
Ideally the largest Koha installations would have contributed back or funded scalability improvements. Maybe they did, I don't know the history. So it's a matter of prioritizing the upstreaming of changes while already having a lot to deal with. That might show the largest installations are underfunded and have no margin for upstreaming :(

Cheers,

--
Victor Grousset/tuxayo
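A minimal sketch of the read/write split raised above, assuming a statement-level router. It only illustrates the idea; it is not how any particular proxy or Koha itself behaves, and the class name `ReadWriteRouter` and the backend names are hypothetical. Writes, transactions, and statements that depend on session state (such as `LAST_INSERT_ID()`) stay on the primary, while other reads rotate across the replicas.

```python
import itertools

# Statement verbs treated as writes in this toy example (not exhaustive).
WRITE_VERBS = ("INSERT", "UPDATE", "DELETE", "REPLACE", "CREATE", "ALTER", "DROP")


class ReadWriteRouter:
    """Toy router: writes go to the primary, plain reads rotate over replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self._replicas = itertools.cycle(replicas) if replicas else None
        self._in_transaction = False

    def route(self, sql):
        """Return the name of the backend that should run this statement."""
        parts = sql.split(None, 1)
        verb = parts[0].upper() if parts else ""
        upper_sql = sql.upper()
        if verb in ("BEGIN", "START"):
            self._in_transaction = True
            return self.primary
        if verb in ("COMMIT", "ROLLBACK"):
            self._in_transaction = False
            return self.primary
        # Statements inside a transaction, locking reads, and reads that rely
        # on per-connection state must run where the write happened.
        if (
            self._in_transaction
            or "LAST_INSERT_ID" in upper_sql
            or "FOR UPDATE" in upper_sql
        ):
            return self.primary
        if verb in WRITE_VERBS or self._replicas is None:
            return self.primary
        return next(self._replicas)
```

Routing alone does not address replication lag: a read sent to a replica immediately after a write on the primary may still see stale data, so any write-then-read sequence outside a transaction would need separate handling.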
Hi Victor,

That's an interesting idea. I haven't played with those, but it looks like there is "MySQL Proxy" and "MaxScale" which can do just that. It could be interesting to run https://hub.docker.com/r/mariadb/maxscale in front of 2 MariaDB containers with koha-testing-docker.

I was a bit concerned about eventual consistency, especially with the LAST_INSERT_ID() function, but I think those proxies take that into account. Still, there might be other places where we do a write then a read, and if the read-only node hasn't been updated, problems could arise. It would take more experimenting.

--

There's some work that would go into horizontally scaling things. A person would need to manage "cron" and possibly a number of other Koha services like SIP server, z3950_responder, background_jobs_worker, indexer. Most of those Koha services should be OK to run concurrently, although the indexer probably shouldn't be. I think the Elasticsearch indexer will soon use RabbitMQ, which should remove that issue though.

That just leaves "cron". Schedulers are one thing I keep coming back to (across many projects) where you don't want multiple running. (If Koha had its own task scheduler, it would need to run as a single service as well.)

Technically, with cron, you could just disable the Koha crontabs on every app server except for 1. That would be the shortest path forward. You'd probably go the same route with the SIP server and z3950_responder, although technically using a TCP load balancer like Nginx or HAProxy could probably distribute traffic over multiple nodes for the SIP and Z39.50 protocols since sessions are based off of 1 connection.

The background_jobs_worker should be fine (theoretically ideal) to have multiple running, and it should take over from the indexer (for Elasticsearch at least; for Zebra you'd want to have an indexer running on only 1 Koha app node, although there's no reason Zebra couldn't be updated to use RabbitMQ too for managing indexing).

If you were using AWS, technically you could replace cron with CloudWatch Events.

Ultimately, it's up to the implementer on how they want to do things. I think that we could probably package Koha a bit differently to make it easier to do advanced setups, and we're moving in the right direction with the indexer process. I haven't had anyone asking for an advanced horizontally scaled Koha setup, so I haven't verified my ideas in production.

David Cook
Senior Software Engineer
Prosentient Systems
Suite 7.03
6a Glen St
Milsons Point NSW 2061
Australia

Office: 02 9212 0899
Online: 02 8005 0595
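One way to picture "cron on only 1 app server" without maintaining different crontabs per node is a small wrapper that runs a job only on a designated host. This is a sketch under assumptions, not something Koha ships: the function names, the idea of passing the designated host as the first argument, and the script name in the usage message are all hypothetical.

```python
import socket
import subprocess
import sys


def should_run(designated_host, current_host=None):
    """True only on the one app server chosen to run scheduled jobs.

    Compares short host names case-insensitively, so "koha-app1" matches
    "koha-app1.example.org".
    """
    current = current_host if current_host is not None else socket.gethostname()
    return current.split(".")[0].lower() == designated_host.split(".")[0].lower()


def main(argv):
    """argv layout (hypothetical): [script, DESIGNATED_HOST, COMMAND, ARGS...]."""
    if len(argv) < 3:
        print(
            "usage: run_on_scheduler.py DESIGNATED_HOST COMMAND [ARGS...]",
            file=sys.stderr,
        )
        return 2
    designated_host, command = argv[1], argv[2:]
    if not should_run(designated_host):
        return 0  # Quietly skip the job on every other app server.
    # Run the real job and pass its exit status back to cron.
    return subprocess.call(command)
```

A script built on this would end with `sys.exit(main(sys.argv))`, and each crontab entry on every app server would be prefixed with the wrapper and the designated host name. Simply disabling the crontabs on all but one node, as described above, achieves the same result with less machinery; the wrapper only helps when identical configuration has to be deployed to every node.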
Hi Admire

David Cook has written:
- If you're getting a lot of traffic, you should check to see how much traffic is from bots. You might need to implement something like fail2ban to block bots that put too much load on your system. Typically, I find when Koha is overwhelmed, it's due to bots. (You can experiment with a robots.txt file to help with this but it's not a completely effective solution.)
To limit performance loss caused by bots, it is also a good idea to create a Koha sitemap; see https://koha-community.org/manual/21.11/en/html/cron_jobs.html#sitemap

Script "koha-sitemap" lets you manage sitemaps for your Koha instances (to avoid annoying bots eating up your CPU). https://wiki.koha-community.org/wiki/Commands_provided_by_the_Debian_package...

Best wishes: Michael

--
Geschäftsführer · Diplombibliothekar BBS, Informatiker eidg. Fachausweis
Admin Kuhn GmbH · Pappelstrasse 20 · 4123 Allschwil · Schweiz
T 0041 (0)61 261 55 61 · E mik@adminkuhn.ch · W www.adminkuhn.ch
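To check how much traffic comes from bots, as suggested in the quoted advice, a rough first pass is to count requests per User-Agent in the web server's access log. The sketch below assumes the common "combined" log format, where the User-Agent is the last double-quoted field, and uses a small, hypothetical list of substrings to flag crawlers. Bots that disguise their User-Agent will not be caught this way.

```python
import re
from collections import Counter

# In the combined log format, the last double-quoted field is the User-Agent.
USER_AGENT_RE = re.compile(r'"([^"]*)"\s*$')

# Substrings that commonly appear in crawler User-Agents (illustrative only).
BOT_MARKERS = ("bot", "crawl", "spider", "slurp")


def user_agent(line):
    """Extract the User-Agent from one log line, or "" if none is found."""
    match = USER_AGENT_RE.search(line)
    return match.group(1) if match else ""


def is_bot(agent):
    """Heuristic: the agent string contains a known crawler marker."""
    lowered = agent.lower()
    return any(marker in lowered for marker in BOT_MARKERS)


def summarize(lines):
    """Return (total_requests, bot_requests, Counter of bot User-Agents)."""
    total = 0
    bots = Counter()
    for line in lines:
        if not line.strip():
            continue
        total += 1
        agent = user_agent(line)
        if is_bot(agent):
            bots[agent] += 1
    return total, sum(bots.values()), bots
```

Passing an open log file to `summarize` and then calling `most_common(10)` on the returned counter shows which crawlers generate the most requests, which is a reasonable starting point for deciding what to restrict with robots.txt or block with fail2ban.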
participants (4)
- Coehoorn, Joel
- dcook@prosentient.com.au
- Michael Kuhn
- Victor Grousset/tuxayo