[Koha] n-Tier deployment of Koha

dcook at prosentient.com.au
Mon Aug 1 13:11:28 NZST 2022


Hi Victor,

That's an interesting idea. I haven't played with them, but it looks like "MySQL Proxy" and "MaxScale" can do just that. It could be interesting to run https://hub.docker.com/r/mariadb/maxscale in front of 2 MariaDB containers with koha-testing-docker. I was a bit concerned about eventual consistency, especially with the LAST_INSERT_ID() function, but I think those proxies take that into account. Still, there might be other places where we do a write and then a read, and if the read-only node hasn't been updated yet, problems could arise. It would take more experimenting.
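To make the read/write split and the read-after-write concern concrete, here is a minimal sketch of the routing decision such a proxy has to make. It is not how MaxScale or MySQL Proxy are implemented; the class name, server names and the "pin a session to the primary after its first write" policy are all assumptions chosen for illustration. Real proxies use more refined policies and a real SQL parser rather than the naive keyword match below.

```python
import re
from dataclasses import dataclass, field
from itertools import cycle
from typing import List, Set

# Naive classifier: statements starting with these keywords count as reads.
# Leading SQL comments, CTEs, stored procedures etc. are NOT handled.
_READ_RE = re.compile(r"^\s*(SELECT|SHOW|DESCRIBE|EXPLAIN)\b", re.IGNORECASE)
# Locking reads must see the latest data, so they are treated as writes.
_LOCKING_RE = re.compile(r"\bFOR\s+UPDATE\b|\bLOCK\s+IN\s+SHARE\s+MODE\b", re.IGNORECASE)


@dataclass
class ReadWriteRouter:
    """Hypothetical router: writes to the primary, reads round-robin to replicas."""

    primary: str
    replicas: List[str]
    # Sessions that have written at least once; they stay on the primary.
    pinned: Set[str] = field(default_factory=set)

    def __post_init__(self) -> None:
        # Round-robin iterator over replicas; None means "no replicas, use primary".
        self._replica_cycle = cycle(self.replicas) if self.replicas else None

    def is_read(self, sql: str) -> bool:
        return bool(_READ_RE.match(sql)) and not _LOCKING_RE.search(sql)

    def route(self, session: str, sql: str) -> str:
        """Return the name of the server that should run `sql` for `session`."""
        if not self.is_read(sql):
            # Writes go to the primary, and the session is pinned there so a
            # follow-up read (e.g. SELECT LAST_INSERT_ID()) sees its own write
            # instead of hitting a replica that may not have caught up yet.
            self.pinned.add(session)
            return self.primary
        if session in self.pinned or self._replica_cycle is None:
            return self.primary
        return next(self._replica_cycle)
```

For example, with `ReadWriteRouter("db-primary", ["db-replica-1", "db-replica-2"])`, a plain `SELECT` from a fresh session goes to a replica, but once that session has run an `INSERT`, its `SELECT LAST_INSERT_ID()` is sent to `db-primary`. Pinning for the whole session is the bluntest way to avoid stale reads; it trades away some read scaling for correctness, which is exactly the "write then read" risk described above.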

--

Horizontally scaling Koha would take some work. You would need to manage "cron" and possibly a number of other Koha services such as the SIP server, z3950_responder, background_jobs_worker, and the indexer. Most of those services should be OK to run concurrently, although the indexer probably shouldn't be. I think the Elasticsearch indexer will soon use RabbitMQ, which should remove that issue. That just leaves "cron". Schedulers are something I keep coming back to (across many projects): you don't want more than one running. (If Koha had its own task scheduler, it would also need to run as a single service.)
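The reason a message queue makes concurrent indexing workers safe is the "competing consumers" pattern: several workers pull from one shared queue, and each job is handed to only one of them. The sketch below illustrates that with Python's standard `queue.Queue` standing in for a RabbitMQ queue; the function name and job IDs are hypothetical, and this is not Koha's background_jobs_worker code. Note that RabbitMQ can redeliver a message if a consumer dies before acknowledging it, and this sketch does nothing about the ordering of several updates to the same record.

```python
import queue
import threading
from typing import Dict, Iterable, List


def run_workers(job_ids: Iterable[int], n_workers: int = 3) -> Dict[int, List[str]]:
    """Drain one shared job queue with several concurrent workers.

    Returns {job_id: [names of the workers that processed it]}.
    """
    jobs = queue.Queue()
    for job_id in job_ids:
        jobs.put(job_id)

    processed: Dict[int, List[str]] = {}
    lock = threading.Lock()

    def worker(name: str) -> None:
        while True:
            try:
                # get_nowait() hands each queued item to exactly one caller.
                job_id = jobs.get_nowait()
            except queue.Empty:
                return  # nothing left: this worker exits
            # Placeholder for the real work (e.g. reindexing one record).
            with lock:
                processed.setdefault(job_id, []).append(name)

    threads = [
        threading.Thread(target=worker, args=(f"worker-{i}",))
        for i in range(n_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return processed
```

Running `run_workers(range(200), n_workers=4)` processes every job exactly once, no matter how the threads interleave. That property is what lets you add worker processes on more app nodes, whereas a scheduler like cron has no shared queue to arbitrate between copies and so must run on one node only.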

Technically, with cron, you could just disable the Koha crontabs on every app server except one. That would be the shortest path forward. You'd probably go the same route with the SIP server and z3950_responder, although a TCP load balancer like Nginx or HAProxy could probably distribute traffic over multiple nodes for the SIP and Z39.50 protocols, since their sessions are tied to a single connection. It should be fine (theoretically ideal) to have multiple background_jobs_worker processes running, and the worker should take over from the indexer. That applies to Elasticsearch at least; for Zebra you'd want the indexer running on only one Koha app node, although there's no reason Zebra couldn't be updated to use RabbitMQ for managing indexing too.
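Because a SIP or Z39.50 session lives entirely within one TCP connection, a balancer only needs to pick a backend when a connection opens and then keep every byte of that connection on the same node. The sketch below models just that assignment decision with round-robin selection. It is a hypothetical illustration, not HAProxy or Nginx code: it does not proxy any traffic, and the backend names are placeholders. If a protocol kept session state outside the connection (e.g. across reconnects), this per-connection approach would not be enough.

```python
from itertools import cycle
from typing import Dict, List, Set


class ConnectionBalancer:
    """Hypothetical round-robin assignment of TCP connections to backends.

    Every message on a connection goes to the backend chosen when that
    connection was opened, so per-connection protocol state stays on one node.
    """

    def __init__(self, backends: List[str]) -> None:
        if not backends:
            raise ValueError("need at least one backend")
        self._next = cycle(backends)
        self._by_conn: Dict[str, str] = {}

    def open(self, conn_id: str) -> str:
        # Only new connections advance the round-robin pointer.
        backend = next(self._next)
        self._by_conn[conn_id] = backend
        return backend

    def backend_for(self, conn_id: str) -> str:
        # Subsequent traffic on an open connection reuses its backend.
        return self._by_conn[conn_id]

    def close(self, conn_id: str) -> None:
        self._by_conn.pop(conn_id, None)

    def active(self) -> Set[str]:
        return set(self._by_conn)
```

With two backends, connections `c1`, `c2` and `c3` are assigned to `koha-app-1`, `koha-app-2` and `koha-app-1` respectively, and any later message on `c2` still goes to `koha-app-2`. Round-robin is the simplest policy; production balancers also offer alternatives such as least-connections.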

If you were using AWS, technically you could replace cron with CloudWatch Events. 

Ultimately, it's up to the implementer how they want to do things. I think we could probably package Koha a bit differently to make advanced setups easier, and we're moving in the right direction with the indexer process. Nobody has asked me for an advanced horizontally scaled Koha setup, so I haven't verified these ideas in production.

David Cook
Senior Software Engineer
Prosentient Systems
Suite 7.03
6a Glen St
Milsons Point NSW 2061
Australia

Office: 02 9212 0899
Online: 02 8005 0595

-----Original Message-----
From: Victor Grousset/tuxayo <victor at tuxayo.net> 
Sent: Saturday, 30 July 2022 2:13 AM
To: dcook at prosentient.com.au; koha at lists.katipo.co.nz
Cc: 'Admire Mutsikiwa' <amutsikiwa at uzlib.uz.ac.zw>
Subject: Re: [Koha] n-Tier deployment of Koha

Hi :)

On 22-07-28 02:49, dcook at prosentient.com.au wrote:
> Off the top of my head, technically you could have a load balancer in front of multiple Koha application servers. You'd want to make sure that the cronjobs are only running on 1 application server though. You also would probably want to use an external Elasticsearch cluster since Zebra would be a bottleneck. In theory, you could use a multi-master MySQL/MariaDB database cluster, but I don't know if anyone has done that (and I'd be concerned about the "eventual consistency" of multi-master setups). If you wanted a master-replica cluster, you'd need to patch Koha to allow for read-only connections (for things like the Reports module); at the moment, Koha can only use 1 read/write database connection.

Are there MariaDB front ends/load balancers that could route the read queries to one of multiple read-only replicas, and the write queries to the main server? That would avoid having to change applications.

So in the end the only non-horizontally-scalable thing would be DB writes? (unless multi-master eventual consistency works)

> In the long-term I'd love to make Koha easier to scale horizontally (and ideally deploy as a load-balanced containerized application) but there's work to do and not much demand for it from libraries, so I don't think it's a priority for devs at the moment, unfortunately.

Ideally the largest Koha installations would have contributed back or funded scalability improvements. Maybe they did; I don't know the history. So it's a matter of prioritizing upstreaming changes while already having a lot to deal with. That might show that the largest installations are underfunded and have no margin for upstreaming :(

Cheers,

--
Victor Grousset/tuxayo
