[Koha] OAI-PMH harvester
BOUIS Sonia
sonia.bouis at univ-lyon3.fr
Wed Nov 23 03:12:44 NZDT 2022
Hi,
Thanks to David, Tomas, Michal and Michael for your replies.
So we have decided to evaluate several external OAI-PMH clients that could be used by Koha, and to choose one at the end of January.
There is a lot to do after that. We discussed background jobs, and cronjobs seem to be appropriate. We thought that the settings in the Koha intranet should only define URLs, sets, or XSLT stylesheets (for example, to transform DC XML into MARCXML).
We are only at the beginning of the process.
Kind regards,
Sonia
------------------------------
Message: 2
Date: Wed, 26 Oct 2022 10:37:49 +1100
From: "David Cook" <dcook at prosentient.com.au>
To: "'Tomas Cohen Arazi'" <tomascohen at gmail.com>, "'BOUIS Sonia'"
<sonia.bouis at univ-lyon3.fr>
Cc: "'koha'" <koha at lists.katipo.co.nz>, "'koha-devel'"
<koha-devel at lists.koha-community.org>
Subject: Re: [Koha-devel] [Koha] OAI-PMH harvester
Message-ID: <07af01d8e8ca$dfbddef0$9f399cd0$@prosentient.com.au>
Content-Type: text/plain; charset="utf-8"
Hi Sonia,
I'm excited to hear that KohaLA would like to finance an OAI-PMH client in Koha! This functionality has been brewing in the back of my mind ever since I first raised 10662 back in 2013.
As Tomas says, I think that the background jobs are a key component for processing incoming OAI-PMH records.
However, the ***missing component right now is the scheduling of the OAI-PMH harvesting tasks***, and I think this is where opinions get divided. Below, I'll provide some history and opinions on Koha OAI-PMH.
--
With 10662, the sponsored goal was for Koha library staff to schedule OAI-PMH harvests through the Web UI. However, Fridolin from BibLibre raised a point with me at Kohacon18 about how letting library staff control the timing of harvesting tasks could be a problem for support vendors. If too many libraries using the same public IP address tried to harvest from the same OAI-PMH repository, they could be rate limited or blocked. There could also be server load concerns. So there probably needs to be a balance between user configuration and system configuration. If I recall correctly, this is how DSpace's OAI-PMH harvester works. Users set up targets and can start/stop harvests, but things like frequency and concurrency are handled by the system configuration.
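To illustrate the split between user configuration and system configuration described above, here is a minimal Python sketch. It is not code from 10662, Koha, or DSpace; the class names, field names, and limits are all hypothetical. It only shows one way staff-defined targets could be constrained by sysadmin-owned limits on frequency and concurrency.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SystemLimits:
    """Limits owned by the sysadmin (hypothetical names, not Koha settings)."""
    min_interval_seconds: int      # shortest allowed gap between harvests of one target
    max_concurrent_harvests: int   # how many harvests may run at the same time


@dataclass(frozen=True)
class HarvestTarget:
    """What library staff would configure: a target they can enable or disable."""
    name: str
    base_url: str
    requested_interval_seconds: int
    enabled: bool = True


def effective_interval(target, limits):
    """Staff may request any interval, but never one shorter than the system minimum."""
    return max(target.requested_interval_seconds, limits.min_interval_seconds)


def select_runnable(targets, running_count, limits):
    """Pick enabled targets to start now without exceeding the concurrency cap."""
    free_slots = max(limits.max_concurrent_harvests - running_count, 0)
    enabled = [t for t in targets if t.enabled]
    return enabled[:free_slots]
```

With a system minimum of 300 seconds, a target that asks for a 3-second interval would be harvested every 300 seconds, which is the kind of protection against rate limiting that a support vendor might want.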
Based on my experience working on OAI-PMH on and off for nearly 10 years and as a Koha support vendor, I think my preference would be for sysadmins to handle most of the OAI-PMH harvesting details.
The sponsorship for 10662 had certain requirements that many other libraries might not have, which is what made me think that it might be better to have an external client that connects to Koha. I thought maybe I could get the ordinary requirements pushed into Koha, and then handle extraordinary requirements externally. However, an external harvester won't perform as fast as an internal harvester. (The compromise would be to write the harvester in such a way that people could provide different OAI-PMH harvester Perl modules that all stage records using the same core Koha modules.)
Even then... the scheduling would depend on a library's needs. Back in 2013, I had a Koha OAI-PMH harvester which worked as a cronjob. It would run each night. However, some libraries want to run OAI-PMH harvests as frequently as every 3 seconds. A cronjob's shortest interval is 60 seconds, so that wouldn't work for that requirement.
If a cronjob isn't suitable, then I think you'd need a daemon created by a new command like "koha-oai --start <instance_name>". It could read a configuration file and handle scheduling accordingly. With 10662, I used the POE module, because I knew it well and it has some timer tools for scheduling tasks. If I were to work on it again, I'd probably use Mojo::IOLoop instead these days, since Mojolicious is already part of Koha while POE is not. (That said, using modules like Mojo and POE is tricky, because they're difficult to test using automation. That was one of the stumbling blocks with 10662. While the 10662 harvester worked very well, it was difficult to unit test. In hindsight, I should've written it in a way that was easier to unit test, but it had a lot of event-driven code which made things more difficult.)
Another option would be to create a generic daemon for task scheduling in general (e.g. "koha-schedule"). Koha could use this for many things, but it's a project in itself.
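One way to make sub-minute scheduling easier to unit test is to keep the "what is due now?" decision in plain code and leave the event loop as a thin wrapper around it. The following is a minimal sketch in Python rather than Perl, and it uses neither POE nor Mojo::IOLoop; the class and method names are hypothetical. The clock is passed in, so a test can move time forward without sleeping.

```python
import heapq
import time


class IntervalScheduler:
    """Decides which named tasks are due; it never sleeps or performs I/O."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock   # injectable so tests can control time
        self._queue = []      # heap of (next_due, name, interval_seconds)

    def add(self, name, interval_seconds):
        """Register a task that should run every interval_seconds."""
        if interval_seconds <= 0:
            raise ValueError("interval_seconds must be positive")
        first_due = self._clock() + interval_seconds
        heapq.heappush(self._queue, (first_due, name, interval_seconds))

    def pop_due(self):
        """Return the names of tasks due now and reschedule each of them."""
        now = self._clock()
        due = []
        while self._queue and self._queue[0][0] <= now:
            _, name, interval = heapq.heappop(self._queue)
            due.append(name)
            heapq.heappush(self._queue, (now + interval, name, interval))
        return due
```

A daemon loop would call pop_due() on a short timer and hand each due name to whatever actually runs the harvest. A unit test can pass a fake clock, advance it by 3 seconds, and assert that the 3-second task is reported as due.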
--
Downloading OAI-PMH records and importing MARCXML into Koha is actually fairly straightforward. The difficulty is the task scheduling and management of tasks (and unit testing).
I don't know the answer that will make everyone happy. There are lots of different ways of managing and scheduling the tasks. Based on my experience, I'd suggest targeting the simplest approach first, because complexity will make it less likely for the project to succeed.
On that note, I'd be happy to test/QA any OAI-PMH harvester put forward. When I was writing OAI-PMH harvester patches, I found it really hard to get QA, so I'm happy to be that resource for someone else. I've spent a lot of time thinking about this topic, so happy to provide advice, warnings, emotional support.
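For reference, the download side can be sketched as a ListRecords loop that follows resumption tokens, as defined by the OAI-PMH 2.0 protocol. This is a minimal Python sketch, not code from any Koha patch; the function names are hypothetical, and the HTTP fetch is passed in as a parameter so the paging logic can be tested without a network. Importing the harvested records into Koha is not shown.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def list_records_url(base_url, metadata_prefix=None, set_spec=None, resumption_token=None):
    """Build a ListRecords request URL for an OAI-PMH repository."""
    if resumption_token:
        # In OAI-PMH, resumptionToken is an exclusive argument: send it with the verb only.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (record elements, next resumption token or None) for one response page."""
    root = ET.fromstring(xml_text)
    records = root.findall(f"{OAI_NS}ListRecords/{OAI_NS}record")
    token_el = root.find(f"{OAI_NS}ListRecords/{OAI_NS}resumptionToken")
    token = None
    if token_el is not None:
        # An empty resumptionToken element marks the last page.
        token = (token_el.text or "").strip() or None
    return records, token


def harvest(base_url, metadata_prefix, fetch, set_spec=None):
    """Yield every record, calling fetch(url) for each page until no token remains."""
    token = None
    while True:
        url = list_records_url(base_url, metadata_prefix, set_spec, token)
        records, token = parse_list_records(fetch(url))
        yield from records
        if token is None:
            break
```

In real use, fetch could be something like `lambda url: urllib.request.urlopen(url).read()`. Which metadataPrefix values a repository supports (for example a MARCXML prefix or oai_dc) depends on the repository.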
David Cook
Senior Software Engineer
Prosentient Systems
Suite 7.03
6a Glen St
Milsons Point NSW 2061
Australia
Office: 02 9212 0899
Online: 02 8005 0595
From: Koha-devel <koha-devel-bounces at lists.koha-community.org> On Behalf Of Tomas Cohen Arazi
Sent: Wednesday, 26 October 2022 3:46 AM
To: BOUIS Sonia <sonia.bouis at univ-lyon3.fr>
Cc: koha <koha at lists.katipo.co.nz>; koha-devel <koha-devel at lists.koha-community.org>
Subject: Re: [Koha-devel] [Koha] OAI-PMH harvester
I think with background jobs we have most of the framework that is needed to deal with this within Koha.
Best regards
On Tue, 25 Oct 2022 at 7:08, BOUIS Sonia <sonia.bouis at univ-lyon3.fr <mailto:sonia.bouis at univ-lyon3.fr> > wrote:
Hi,
KohaLA would like to finance an OAI-PMH client in Koha, but we have questions that we want to raise with the community.
There have already been attempts to propose an OAI-PMH client:
- https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=10662 : it's an old project that doesn't seem compatible with the current version of Koha
- https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=25905 : the scope is more to use an external OAI-PMH client and to connect it to Koha
Our main question is about the way to handle this. Do you think it's a better idea to use external software or a Perl routine and to find a way to connect it to Koha? Or would it be better to write a new module in Koha from scratch, so that Koha has its own OAI-PMH client?
Please let us hear your thoughts about this project.
Kind regards
Sonia
Sonia BOUIS
------------------------------------------------------
Head of the Library IT Service
Département d'Appui à la Recherche et aux Projets (DARP)
Bibliothèques universitaires
Université Jean Moulin Lyon 3
Street address > Manufacture des Tabacs | 6 cours Albert Thomas | LYON 8e
Postal address > Bibliothèque de la Manufacture | 1C avenue des Frères Lumière | CS 78242 - 69372 LYON CEDEX 08
Direct line: 33 (0)4 78 78 79 03
http://bu.univ-lyon3.fr<http://bu.univ-lyon3.fr/> | Follow us > Facebook<https://www.facebook.com/bulyon3/> | Twitter<https://twitter.com/bulyon3> | Instagram<https://www.instagram.com/bu.lyon3/?hl=fr>
_______________________________________________
Koha mailing list http://koha-community.org Koha at lists.katipo.co.nz <mailto:Koha at lists.katipo.co.nz>
Unsubscribe: https://lists.katipo.co.nz/mailman/listinfo/koha
------------------------------
Subject: Digest Footer
_______________________________________________
Koha-devel mailing list
Koha-devel at lists.koha-community.org
https://lists.koha-community.org/cgi-bin/mailman/listinfo/koha-devel
website : https://www.koha-community.org/ git : https://git.koha-community.org/ bugs : https://bugs.koha-community.org/
------------------------------
End of Koha-devel Digest, Vol 203, Issue 15
*******************************************