[Koha] Koha and OAI-PMH [and digital materials]

Coehoorn, Joel jcoehoorn at york.edu
Tue Feb 2 04:53:53 NZDT 2016


I'm re-posting your message paragraph by paragraph, with my responses
interspersed.

> I originally started working on an OAI-PMH harvester to create an automatic
> import of records from DSpace into Koha, so that Koha could show electronic
> resources in the catalogue. While having good item types, frameworks, and
> templates is good, I found the main problem to be the metadata quality.
> Dublin Core doesn’t translate very well to MARC. Although now that I’m
> re-reading your email… are you talking about making MARC records in Koha
> and then using DSpace’s OAI-PMH harvester to import metadata from Koha?
> Then attaching digital materials in DSpace while using Koha to maintain the
> metadata? That could be interesting…

At that point I was mainly trying to understand the original message and
working through various possibilities, but yes, that could be interesting.
I think librarians are generally more familiar with circulation software
and formats like Koha/MARC than with DSpace, so this approach could help
with data quality: all metadata, whether for physical or digital items, is
entered in Koha, and therefore you only need to learn one system.
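
To make the metadata-quality problem concrete, here is a minimal sketch
(Python, standard library only) that parses an OAI-PMH ListRecords response
in the oai_dc format and applies a simplified crosswalk to MARC tags. The
sample record, the tag choices, and the function names are hypothetical
illustrations, not the mapping used by Koha, DSpace, or any published
crosswalk.

```python
import xml.etree.ElementTree as ET

# Namespaces defined by the OAI-PMH 2.0 and oai_dc specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical one-record ListRecords response from a repository.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
 <ListRecords><record>
  <header><identifier>oai:repo.example.edu:123/45</identifier></header>
  <metadata>
   <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
              xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title>Campus Water Study</dc:title>
    <dc:creator>Smith, Jane</dc:creator>
    <dc:creator>Lee, Sam</dc:creator>
    <dc:date>2015-05</dc:date>
    <dc:subject>Water quality</dc:subject>
    <dc:identifier>http://repo.example.edu/handle/123/45</dc:identifier>
   </oai_dc:dc>
  </metadata>
 </record></ListRecords>
</OAI-PMH>"""

def text_of(dc, name):
    """Return the non-empty text values of every dc:<name> child element."""
    values = [(e.text or "").strip() for e in dc.findall(f"dc:{name}", NS)]
    return [v for v in values if v]

def dc_to_marc(dc):
    """Hypothetical, simplified DC-to-MARC crosswalk: (tag, subfield, value)."""
    fields = []
    # DC has no "main entry": assuming the first creator belongs in 100 is a guess.
    for i, name in enumerate(text_of(dc, "creator")):
        fields.append(("100" if i == 0 else "700", "a", name))
    fields += [("245", "a", t) for t in text_of(dc, "title")]
    # dc:date is free-form text, so it is copied into 260$c unchanged.
    fields += [("260", "c", d) for d in text_of(dc, "date")]
    # DC does not name the subject vocabulary, so use 653 (uncontrolled term).
    fields += [("653", "a", s) for s in text_of(dc, "subject")]
    # Only URL-like identifiers become 856$u links to the digital object.
    fields += [("856", "u", u) for u in text_of(dc, "identifier")
               if u.startswith(("http://", "https://"))]
    return sorted(fields)

def parse_records(xml_text):
    """Crosswalk every record; deleted records carry no metadata and are skipped."""
    out = []
    for rec in ET.fromstring(xml_text).iterfind(".//oai:record", NS):
        dc = rec.find("oai:metadata/oai_dc:dc", NS)
        if dc is not None:
            out.append(dc_to_marc(dc))
    return out

for record in parse_records(SAMPLE):
    for tag, code, value in record:
        print(f"{tag} ${code} {value}")
```

Even for this tiny record the crosswalk has to guess which creator is the
main entry, whether the date is a publication date, and which subject
vocabulary was used; entering MARC in Koha directly avoids those guesses.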

> I agree that Koha needs to evolve to have better support for digital
> materials. However, I’m not sure what you mean by “a first-class framework
> for URL-only item types”. Could you elaborate on that?

At this point I need to mention that I'm the college tech guy. Koha is only
one of several systems I support, and I have had little involvement with it
since our initial migration, so some of this is based on limited exposure
to the software.

What I mean by "first-class framework" is one that is on the same level as
the framework for traditional books. My understanding is that, technically,
this is already the case: from Koha's perspective, a biblio framework is a
biblio framework is a biblio framework, and none is more important than any
other. What I've seen so far in practice is that librarians will sometimes
try to use the biblio framework created for books as a one-size-fits-all
solution for their other item types. Koha could do more to make it easier
to really use multiple kinds of biblio frameworks, and specifically push a
default framework for digital content to the fore in certain circumstances.
Even just a warning message when a record contains MARC fields indicating a
digital work, but uses a non-digital biblio framework, would be a step in
the right direction. Eventually, the warning could also offer a short menu
of one-click fixes.
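
Such a warning could start as a simple check of the record against its
framework. The sketch below is a hypothetical illustration in Python, not
Koha code: records are a simplified dataclass rather than Koha's MARC
objects, and the framework codes "BKS" (books) and "CF" (digital) are
assumed site conventions. The MARC21 clues it checks are Leader/06 = m
(computer file), 007/00 = c (electronic resource), and an 856 $u link.

```python
from dataclasses import dataclass, field

# Hypothetical framework codes a site might reserve for digital content.
DIGITAL_FRAMEWORKS = {"CF"}

@dataclass
class BibRecord:
    """Simplified stand-in for a MARC21 bibliographic record (not Koha's model)."""
    leader: str                                   # 24-character MARC21 leader
    control: dict = field(default_factory=dict)   # control fields, e.g. {"007": "cr ..."}
    urls: list = field(default_factory=list)      # values of 856 $u
    framework: str = "BKS"                        # hypothetical "books" framework code

def digital_signals(rec):
    """List the MARC21 clues suggesting the record describes a digital work."""
    signals = []
    if rec.leader[6:7] == "m":
        signals.append("Leader/06 = m (computer file)")
    if rec.control.get("007", "")[:1] == "c":
        signals.append("007/00 = c (electronic resource)")
    if rec.urls:
        # Weak on its own: print records often link to tables of contents via 856.
        signals.append("856 $u present")
    return signals

def framework_warnings(rec):
    """Warn when digital clues appear in a record using a non-digital framework."""
    if rec.framework in DIGITAL_FRAMEWORKS:
        return []
    return [f"Framework '{rec.framework}' may not suit this record: {s}"
            for s in digital_signals(rec)]

ebook = BibRecord(leader="00000nam a2200000 a 4500",
                  control={"007": "cr |||||||||||"},
                  urls=["http://repo.example.edu/handle/123/45"])
for warning in framework_warnings(ebook):
    print(warning)
```

Returning a list of reasons, rather than a yes/no flag, leaves room to
attach a suggested fix to each warning later.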

> Better integration into systems like DSpace may be achieved using their
> evolving REST APIs, although again I’m not sure what you mean by “a user
> might end up checking-out some digital content stored there [DSpace? Koha?]
> (a record is created in the issues table) that doesn’t need to check back
> in”. Why would you need to check out digital content in DSpace? In my
> experience, content is either open to the public or limited to logged-in
> users (as part of licensing/rights restrictions). Are you thinking of
> something more like hosting your own ebooks in DSpace and complying with
> concurrent user restrictions? That could be interesting as well, although I
> think the majority of that work would be done in DSpace with Koha just
> hooking into a DSpace API to determine if there are any “available” copies
> for the material that they found in the Koha catalogue. I admit I’m not
> super familiar with ebook DRM either, so I’m not certain how much
> intervention would be required for expiring “on loan” DRM-protected library
> ebooks.

The idea behind checking out content from DSpace via Koha is reporting: use
of content hosted in DSpace or a similar system would then show up in the
existing circulation reports run from Koha. Eventually, this would also
need to support DRM, so that checking out a protected digital item from
Koha would also send the appropriate information back to DSpace. The goal
is that end users could check out and retrieve digital content stored
elsewhere without ever leaving the Koha OPAC.
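
The "checkout that never needs checking back in" idea can be sketched as a
concurrent-use counter with self-expiring loans and a checkout log that
reports could read. Everything here (class and field names, the 14-day loan
period, the in-memory log) is a hypothetical illustration; it does not model
Koha's issues table, any DSpace API, or any DRM system.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

@dataclass
class DigitalTitle:
    """Hypothetical licensed e-resource with a concurrent-user limit."""
    licensed_copies: int                              # copies the license allows at once
    loan_period: timedelta = timedelta(days=14)       # assumed loan length
    loans: list = field(default_factory=list)         # (patron, expires_at) pairs
    checkout_log: list = field(default_factory=list)  # (patron, checked_out_at) for reports

    def available(self, now):
        """Copies free at `now`; expired loans release their copy with no check-in."""
        active = [expires for _, expires in self.loans if expires > now]
        return self.licensed_copies - len(active)

    def checkout(self, patron, now):
        """Record a loan and return its expiry, or None if every copy is in use."""
        if self.available(now) <= 0:
            return None
        # Drop expired loans so the list does not grow without bound.
        self.loans = [(p, e) for p, e in self.loans if e > now]
        expires = now + self.loan_period
        self.loans.append((patron, expires))
        self.checkout_log.append((patron, now))
        return expires

title = DigitalTitle(licensed_copies=1)
start = datetime(2016, 2, 1)
print(title.checkout("alice", start))                    # 2016-02-15 00:00:00
print(title.checkout("bob", start + timedelta(days=1)))  # None: the only copy is on loan
print(title.checkout("bob", start + timedelta(days=15))) # 2016-03-01 00:00:00
print(len(title.checkout_log))                           # 2
```

Because availability is computed from expiry dates, nothing has to run at
the end of a loan; the checkout log is what circulation reporting would use.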

> Personally, I find most of our Koha client libraries are interested in
> allowing their users to find “only electronic resources” and to provide
> seamless access to third-party sites like EBSCO, ProQuest, and other
> publishers that host the actual content. EBSCO have been putting work into
> that with the Koha-EDS plugin. I’d like to see more done in that area,
> although that’s actually more a matter of digital material publishers
> providing authentication and API options. If they provide the hooks, we can
> provide the services.

We also use EBSCO/Academic Search Premier. It would be nice to search that
content from within Koha. An even more compelling solution, I think, would
be to give EBSCO a Z39.50 connection to our Koha catalog and push our
patrons to use their search engine. EBSCO has interfaces into several other
catalogs to which we also subscribe (PsycARTICLES, OED, Britannica, JSTOR,
etc.), and they have more resources for a fast cross-catalog search. It's
the difference between discovery and federated search: EBSCO is better
positioned to index and maintain all those records than we are. Koha would
then be used only in the office and at the circulation desk, and the
discovery service would become the real front end for our patrons.
Unfortunately, so far the discovery products have proved too expensive.

> I’m curious to hear your thoughts. If you have ideas for things you’d like
> to see, feel free to share them. If you have a budget for sponsoring
> development, feel free to mention that as well. It’s actually quite an
> exciting time to be working on Koha, as large organizations like EBSCO and
> the National Library of Sweden are sponsoring developments in areas like
> OAI-PMH harvesting (which I’m working on now), moving to Elasticsearch for
> search and retrieval, a new REST API, etc. Even if you don’t have the money
> yourself, you might propose ideas on the koha-devel listserv and see if
> anyone is interested in sponsoring that work.

Unfortunately, I don't have any development budget. I do see an opportunity
for a group like the Koha community to bust up the existing discovery
services, which frankly grossly overcharge their customers. Koha could
build its own cloud-hosted index of EBSCO content and similar
public/semi-public resources, using participating libraries to do the
harvesting and charging a modest fee for access. This would greatly improve
search and discovery at many libraries at much lower cost than the
alternatives, while also creating a revenue stream for the foundation. It's
not a trivial undertaking, though.
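
If participating libraries did the harvesting, OAI-PMH already defines the
paging they would need. A minimal sketch, assuming a standard OAI-PMH 2.0
endpoint: the first ListRecords request carries metadataPrefix (plus an
optional "from" date for incremental harvests), and each later request
carries only the resumptionToken from the previous page until the token
comes back empty. The endpoint URL and canned pages are placeholders;
`fetch` is passed in so the example runs offline, and a real harvester
could use `urllib.request.urlopen(url).read()` instead.

```python
import urllib.parse
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"

def list_records(base_url, fetch, metadata_prefix="oai_dc", since=None):
    """Yield <record> elements from every page of a ListRecords harvest."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if since:
        params["from"] = since  # incremental harvest: records changed since this date
    while True:
        root = ET.fromstring(fetch(base_url + "?" + urllib.parse.urlencode(params)))
        yield from root.iter(OAI + "record")
        token = root.find(f".//{OAI}resumptionToken")
        if token is None or not (token.text or "").strip():
            break  # a missing or empty token marks the last page
        # resumptionToken is exclusive: it replaces every argument except verb.
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}

# Canned responses keyed by query string (hypothetical two-page harvest).
PAGES = {
    "verb=ListRecords&metadataPrefix=oai_dc&from=2016-01-01":
        '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListRecords>'
        '<record/><record/><resumptionToken>page2</resumptionToken>'
        '</ListRecords></OAI-PMH>',
    "verb=ListRecords&resumptionToken=page2":
        '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListRecords>'
        '<record/><resumptionToken/></ListRecords></OAI-PMH>',
}

def fake_fetch(url):
    return PAGES[url.split("?", 1)[1]]

records = list(list_records("http://repo.example.edu/oai/request",
                            fake_fetch, since="2016-01-01"))
print(len(records))  # 3
```

Storing the date of each successful harvest and passing it back as "from"
keeps repeat harvests small, which matters when many libraries share the
work.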




Joel Coehoorn
Director of Information Technology
402.363.5603
jcoehoorn at york.edu

The mission of York College is to transform lives through
Christ-centered education and to equip students for lifelong service to
God, family, and society

On Sun, Jan 31, 2016 at 8:35 PM, David Cook <dcook at prosentient.com.au>
wrote:

> Hi Joel,
>
>
>
> Thanks for your email. I think I’ve sorted things out with Flo, but I
> wanted to come back to your email, which I’m going to cross-post to the
> developer listserv as well.
>
>
>
> Cheers,
>
>
>
> David Cook
>
> Systems Librarian
>
> Prosentient Systems
>
> 72/330 Wattle St, Ultimo, NSW 2007
>
>
>
> *From:* Coehoorn, Joel [mailto:jcoehoorn at york.edu]
> *Sent:* Wednesday, 27 January 2016 10:25 AM
> *To:* David Cook <dcook at prosentient.com.au>
> *Cc:* koha at lists.katipo.co.nz; flo.pouyenne at gmail.com
> *Subject:* Re: [Koha] Koha and OAI-PMH
>
>
>
> It sounds like she's talking about using a program that acts kind of like
> a web search crawler, similar to how Googlebot operates, to push
> information into Koha.
>
>
>
> If that's the case, I don't think Koha would be a good solution for this
> right now, though you might get something working via a Z39.50 connection.
>
>
>
> It may be that she's asking if Koha can store and serve/circulate
> harvested digital content directly. Again, I don't believe Koha in its
> current form would be good at this. Koha does not currently provide any
> place to store binary files for circulation.
>
>
>
> It may also be that she's asking if she can use Koha as a front end into
> something like DSpace <http://www.dspace.org/> or Greenstone
> <http://www.greenstone.org/>.
>
>
>
> If that's the case, I believe Koha can do this right now, but the setup is
> not yet easy or straightforward. In a nutshell, you would want to enter
> MARC records for each item in the repository, just like you do for current
> materials, but you would want to use item types and a custom framework so
> that the material is presented well.
>
>
>
> I would welcome improvements in Koha to support any of those scenarios.
> Ultimately, I think Koha needs to evolve to have better support for digital
> materials. At a minimum, this means a first-class framework for URL-only
> item types along with first-class integration into systems like DSpace,
> where a user might end up checking-out some digital content stored there (a
> record is created in the issues table) that doesn't need to check back in.
> As distasteful as it may be, this probably needs to also include some kind
> of copyright enforcement mechanism, to allow libraries to easily comply
> with external licensing requirements.
>
> On Tue, Jan 26, 2016 at 5:04 PM, David Cook <dcook at prosentient.com.au>
> wrote:
>
> Hi Florence,
>
> I'm afraid that I don't understand your question.
>
> What do you mean by "Can Koha be used to manage a digital library"?
>
> Are you asking if Koha can show metadata records from a digital library?
> Or are you asking if Koha can add/update/delete records in a digital
> library? Or both? Or something else?
>
> I'm currently working on an OAI-PMH harvester for Koha, but the sole
> intent at the moment is to harvest metadata records from other systems and
> import them into Koha.
>
>
> > -----Original Message-----
> > Date: Tue, 26 Jan 2016 15:08:31 +0000 (UTC)
> > From: Florence <flo.pouyenne at gmail.com>
> > To: koha at lists.katipo.co.nz
> > Subject: [Koha] Koha and OAI-PMH
> > Message-ID: <loom.20160126T160510-456 at post.gmane.org>
> > Content-Type: text/plain; charset=us-ascii
> >
> > Hello,
> >
> > Can Koha be used to manage a digital library? With an OAI-PMH harvester?
> >
> >
> >
> > ------------------------------
>
>
>
> _______________________________________________
> Koha mailing list  http://koha-community.org
> Koha at lists.katipo.co.nz
> https://lists.katipo.co.nz/mailman/listinfo/koha
>
>
>

