[Koha] How to remove unwanted characters when importing MARC data?

Thu Jun 22 07:24:09 NZST 2017

Hi Michael

Did you try MarcEdit?
http://marcedit.reeset.net/

Kind regards
Marc Véron

On 21.06.2017 at 18:55, Michael Kuhn wrote:
> Hi
>
> Our library receives MARC data from EKZ (a German cataloging data 
> provider) which includes two unwanted characters:
>
> * a beginning "non-sorting character"
> * an ending "non-sorting character"
>
> These characters can't be seen in the OPAC or in the hitlist of the 
> staff client, but they do appear in the framework and also in the 
> title bar of the web browser. Here is an example of a file containing 
> such characters: http://adminkuhn.ch/download/kuhn0000000
>
> When opening the original .mrc file with vi these characters show as:
>
> <98>The<9c> obsession
>
> With "od -c" they show as:
>
> 302 230   T   h   e 302 234       o   b   s   e   s   s   i   o   n
>
> Of course these characters could be removed, e.g. with sed, but this 
> would leave a wrong record length in MARC leader positions 0-4, and it 
> would have to be done separately on the shell, outside and before the 
> regular import process. The same goes for software like MarcEdit.
>
> Now the question is whether there is an EASY way to delete these 
> unwanted characters within Koha, for example by using the MARC 
> modification templates, which are used anyway when loading such data?
>
> Best wishes: Michael
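
The length problem mentioned for sed can be avoided by rebuilding the
leader and the directory after the bytes are removed. What follows is only
a minimal sketch in Python, not a Koha feature, and it would still run
outside the import process. It assumes the file holds UTF-8 encoded
ISO 2709 (MARC) records and that the two unwanted characters are exactly
the byte pairs shown by od (octal 302 230 and 302 234, i.e. hex C2 98 and
C2 9C). The function name clean_record is made up for illustration.

```python
# Sketch: strip the two non-sorting control characters from one MARC
# (ISO 2709) record and rebuild leader and directory so that all stored
# lengths match the shortened data. Assumes UTF-8 encoded records.

FT = b"\x1e"  # field terminator
RT = b"\x1d"  # record terminator
UNWANTED = (b"\xc2\x98", b"\xc2\x9c")  # byte pairs seen in the od output


def clean_record(record: bytes) -> bytes:
    leader = record[:24]
    base = int(leader[12:17])        # leader 12-16: base address of data
    directory = record[24:base - 1]  # 12-byte entries, terminator excluded
    entries, body = [], b""
    for i in range(0, len(directory), 12):
        tag = directory[i:i + 3]
        length = int(directory[i + 3:i + 7])
        start = int(directory[i + 7:i + 12])
        field = record[base + start:base + start + length]
        for seq in UNWANTED:
            field = field.replace(seq, b"")
        # New directory entry: tag, new field length, new start offset.
        entries.append(tag + b"%04d%05d" % (len(field), len(body)))
        body += field
    new_dir = b"".join(entries) + FT
    new_base = 24 + len(new_dir)
    total = new_base + len(body) + 1  # +1 for the record terminator
    new_leader = (b"%05d" % total + leader[5:12]
                  + b"%05d" % new_base + leader[17:])
    return new_leader + new_dir + body + RT
```

To process a whole .mrc file, one could split its contents on the record
terminator (hex 1D), put the terminator back on each non-empty chunk, and
pass each record through clean_record before importing.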