[Koha] How to remove unwanted characters when importing MARC data?

Michael Kuhn mik at adminkuhn.ch
Thu Jun 22 04:55:38 NZST 2017


Our library receives MARC data from EKZ (a German cataloging data 
provider) that contains two unwanted characters:

* a beginning "non-sorting character"
* an ending "non-sorting character"

These characters are not visible in the OPAC or in the hit list of the 
staff client, but they do appear in the framework and also in the top 
line of the web browser. Here is an example of a file containing such 
characters: http://adminkuhn.ch/download/kuhn0000000

When the original .mrc file is opened in vi, these characters show as:

<98>The<9c> obsession

With "od -c" they show as:

302 230   T   h   e 302 234       o   b   s   e   s   s   i   o   n
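
As a cross-check of the dump above, here is a minimal Python sketch that 
decodes the same octal byte values as UTF-8 and lists the non-printable 
code points. It assumes the record is UTF-8 encoded; the function name 
control_codepoints is only illustrative. The two results, U+0098 and 
U+009C, are the code points MARC 21 uses in Unicode records for 
"non-sort begin" and "non-sort end".

```python
def control_codepoints(raw: bytes) -> list[str]:
    """Return the non-printable code points found in UTF-8 encoded bytes."""
    text = raw.decode("utf-8")
    return [f"U+{ord(ch):04X}" for ch in text if not ch.isprintable()]


# The bytes from the "od -c" output, written with the same octal values.
sample = bytes([0o302, 0o230]) + b"The" + bytes([0o302, 0o234]) + b" obsession"
print(control_codepoints(sample))
```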

Of course these characters could be removed with, e.g., sed, but that 
leaves a wrong record length in MARC leader positions 0-4, and it has 
to be done separately on the shell, outside and before the regular 
import process. Another option would be software like MarcEdit.
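
To illustrate why a plain sed is not enough, here is a rough Python 
sketch of such an external step that strips the two characters and then 
rebuilds the length values. It is not a Koha feature and not a tested 
tool; it assumes a single well-formed, UTF-8 encoded ISO 2709 record, 
and the name strip_nonsort is made up for this example. Besides leader 
positions 0-4, the field lengths and start offsets in the directory 
change as well, so the sketch recomputes those too.

```python
NSB = "\u0098".encode("utf-8")  # non-sort begin, bytes C2 98
NSE = "\u009c".encode("utf-8")  # non-sort end, bytes C2 9C
FIELD_END = b"\x1e"             # ISO 2709 field terminator
RECORD_END = b"\x1d"            # ISO 2709 record terminator


def strip_nonsort(record: bytes) -> bytes:
    """Remove NSB/NSE from one UTF-8 MARC record and rebuild its lengths."""
    leader = record[:24]
    base = int(leader[12:17])            # base address of data
    directory = record[24:base - 1]      # 12-byte entries, terminator excluded

    entries, fields, offset = [], [], 0
    for i in range(0, len(directory), 12):
        tag = directory[i:i + 3]
        length = int(directory[i + 3:i + 7])
        start = int(directory[i + 7:i + 12])
        field = record[base + start:base + start + length]
        field = field.replace(NSB, b"").replace(NSE, b"")
        # New directory entry: tag, field length, start offset.
        entries.append(tag + b"%04d%05d" % (len(field), offset))
        fields.append(field)
        offset += len(field)

    body = b"".join(entries) + FIELD_END + b"".join(fields) + RECORD_END
    new_base = 24 + 12 * len(entries) + 1
    new_length = 24 + len(body)
    # Leader 0-4 holds the record length, 12-16 the base address of data.
    return b"%05d" % new_length + leader[5:12] + b"%05d" % new_base + leader[17:] + body
```

A file with several records would have to be split at the record 
terminator first and each record passed through the function separately.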

Now the question is whether there is an EASY way to delete these 
unwanted characters within Koha, for example by using the MARC 
modification templates, which are used anyway when loading such data?

Best wishes: Michael
Managing Director · Qualified Librarian BBS, IT Specialist with Swiss Federal Certificate
Admin Kuhn GmbH · Pappelstrasse 20 · 4123 Allschwil · Switzerland
T 0041 (0)61 261 55 61 · E mik at adminkuhn.ch · W www.adminkuhn.ch
