[Koha] How to remove unwanted characters when importing MARC data?

Jonathan Druart jonathan.druart at bugs.koha-community.org
Thu Jun 22 05:35:48 NZST 2017


Take a look at C4::Charset::nsb_clean.
I guess you can add more substitutions there.
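
A minimal sketch of what such a substitution table could look like. It is written in Python purely for illustration: C4::Charset::nsb_clean itself is Perl, and this is not its code. It assumes the two unwanted characters are U+0098 and U+009C, which is what the octal bytes in the od output quoted below decode to. The names UNWANTED and clean are hypothetical.

```python
# Characters to delete or replace, in the spirit of "add more substitutions".
# Assumption: the unwanted characters are the MARC 21 non-sort markers
# U+0098 and U+009C; add further entries as needed.
UNWANTED = {
    "\u0098": "",  # START OF STRING, used as Non-Sort Begin
    "\u009c": "",  # STRING TERMINATOR, used as Non-Sort End
}

# str.maketrans turns the dict into a translation table; mapping a
# character to "" deletes it.
_TABLE = str.maketrans(UNWANTED)


def clean(text: str) -> str:
    """Return text with every character listed in UNWANTED substituted."""
    return text.translate(_TABLE)


# The "od -c" output 302 230 ... 302 234 is octal: 0xC2 0x98 and 0xC2 0x9C,
# i.e. the UTF-8 encodings of U+0098 and U+009C.
raw = bytes([0o302, 0o230]) + b"The" + bytes([0o302, 0o234]) + b" obsession"
title = raw.decode("utf-8")
print(repr(title))   # '\x98The\x9c obsession'
print(clean(title))  # The obsession
```

Using a table rather than hard-coded replace calls keeps "adding more substitutions" to a one-line change.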

On Wed, 21 Jun 2017 at 13:55 Michael Kuhn <mik at adminkuhn.ch> wrote:

> Hi
>
> Our library receives MARC data from EKZ (a German cataloging data
> provider) which includes two unwanted characters:
>
> * a beginning "non-sorting character"
> * an ending "non-sorting character"
>
> These characters are not visible in the OPAC or in the staff client's
> result list, but they do appear in the cataloging editor (framework) and
> in the browser's title bar. Here is an example of a file containing such
> characters: http://adminkuhn.ch/download/kuhn0000000
>
> When the original .mrc file is opened in vi, these characters show as:
>
> <98>The<9c> obsession
>
> With "od -c" they show as:
>
> 302 230   T   h   e 302 234       o   b   s   e   s   s   i   o   n
>
> Of course these characters could be removed, e.g. with sed, but this
> leaves a wrong record length in MARC leader positions 0-4 (and wrong
> field lengths and starting positions in the directory). It also has to
> be done separately on the shell, outside and before the regular import
> process. Another option would be software like MarcEdit.
>
> Now the question is whether there is an EASY way to delete these
> unwanted characters within Koha, for example with the MARC modification
> templates, which are used anyway when loading such data?
>
> Best wishes: Michael
> --
> Managing Director · Certified Librarian BBS, IT Specialist with Swiss Federal Certificate
> Admin Kuhn GmbH · Pappelstrasse 20 · 4123 Allschwil · Switzerland
> T 0041 (0)61 261 55 61 · E mik at adminkuhn.ch ·
> W www.adminkuhn.ch
> _______________________________________________
> Koha mailing list  http://koha-community.org
> Koha at lists.katipo.co.nz
> https://lists.katipo.co.nz/mailman/listinfo/koha
>
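
On the sed problem mentioned above: stripping bytes from a field only breaks the record if the lengths are not recomputed afterwards. Below is a rough sketch, in Python and independent of Koha, that removes the two markers from ISO 2709 (.mrc) records and then rebuilds leader positions 0-4 and 12-16 plus every directory entry. It assumes well-formed UTF-8 records (leader position 9 = 'a'); the function names are hypothetical.

```python
FIELD_TERMINATOR = b"\x1e"
RECORD_TERMINATOR = b"\x1d"
# UTF-8 bytes of U+0098 and U+009C (octal 302 230 and 302 234 in od -c).
NON_SORT_MARKERS = (b"\xc2\x98", b"\xc2\x9c")


def parse_fields(record: bytes):
    """Return (tag, value) pairs; each value keeps its field terminator."""
    base = int(record[12:17])            # leader 12-16: base address of data
    directory = record[24:base - 1]      # 12-byte entries, minus the final 0x1E
    data = record[base:]
    fields = []
    for i in range(0, len(directory), 12):
        tag = directory[i:i + 3]
        length = int(directory[i + 3:i + 7])
        start = int(directory[i + 7:i + 12])
        fields.append((tag, data[start:start + length]))
    return fields


def build_record(leader: bytes, fields) -> bytes:
    """Assemble a record, recomputing leader 0-4, leader 12-16 and the directory."""
    directory, data = b"", b""
    for tag, value in fields:
        directory += tag + b"%04d" % len(value) + b"%05d" % len(data)
        data += value
    directory += FIELD_TERMINATOR
    base = 24 + len(directory)
    total = base + len(data) + 1         # +1 for the record terminator
    new_leader = b"%05d" % total + leader[5:12] + b"%05d" % base + leader[17:24]
    return new_leader + directory + data + RECORD_TERMINATOR


def strip_non_sort(record: bytes) -> bytes:
    """Remove the non-sort markers from one record and fix all lengths."""
    fields = []
    for tag, value in parse_fields(record):
        for marker in NON_SORT_MARKERS:
            value = value.replace(marker, b"")
        fields.append((tag, value))
    return build_record(record[:24], fields)


def clean_file(raw: bytes) -> bytes:
    """Apply strip_non_sort to every record in the contents of a .mrc file."""
    records = [r.lstrip() + RECORD_TERMINATOR
               for r in raw.split(RECORD_TERMINATOR) if r.strip()]
    return b"".join(strip_non_sort(r) for r in records)
```

Usage would be along the lines of reading the .mrc file in binary mode, passing its bytes to clean_file, and writing the result to a new file before staging it for import. Because C2 is a UTF-8 lead byte, the byte pairs C2 98 and C2 9C can only occur as U+0098 and U+009C, so a plain byte replace is safe for UTF-8 data.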

