[Koha] Dealing with bad MARC records

Kevin O'Rourke lists at caboose.org.uk
Tue Jun 19 20:50:10 NZST 2007


Hi Steven,

> I've actually encountered the problem you're having, too.
>
> What I did as a quick fix was to convert the file into tagged MARC -- MARCEdit allowed me to do this; if it doesn't work for you let me know -- and then opened the tagged file in MS Word. Using MS Word however, I could ask it to find the weird non-numeric field designator and change it to something distinctively numeric.
>   
The trouble with any manual process like that is that our librarians 
will skip it.  I'm already struggling to get them to follow the minimal 
workflow I introduced; they have a tendency to take shortcuts that end 
up with bad or incomplete records in our catalogue.  It's difficult to 
persuade them that it's not quantity but quality of records that's 
important.

In this case, when my program didn't work, they decided to just import 
the records straight into Koha (without asking me).  Koha couldn't read 
the records, so they decided to enter them manually but couldn't be 
bothered typing anything more than a few words from the title, 
misspelled and with incorrect capitalisation.

> I had to do that because MARCEdit's global search and replace functions also wouldn't work for non-numeric fields either (at least not for me).
>   
I managed to use MarcEdit's Script Maker to build a VBScript that would 
delete all 5|| fields in a file for me.  I'm worried that there might be 
other bad fields in the records, not just the 5||.

> Also note: in this case, you wouldn't want to change the 5|| into 500 because you might have 500 (General Note) fields you'd want, but maybe something like 582 (doesn't exist formally) or 59x (whatever of the 590s you haven't designated/planned any use for otherwise).
>   
My temporary solution has been to modify the marc4j library so that it 
doesn't just die when it encounters one of these fields.  This means 
that I get the whole record, including the bad fields, in my program.  
When I'm saving the user's selected and modified records back out I 
strip out some fields anyway, so I've now modified that step to also 
strip out any fields with non-numeric tags.
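
A minimal sketch of that stripping step, in Python rather than the Java 
that marc4j uses.  The record layout (a list of (tag, value) pairs) and 
the function names are assumptions made for illustration; they are not 
part of marc4j or of my program:

```python
import re

# Assumption: a valid tag is exactly three ASCII digits, e.g. "245" or "500".
_VALID_TAG = re.compile(r"[0-9]{3}")


def is_valid_tag(tag):
    """Return True only for tags made of exactly three ASCII digits."""
    return _VALID_TAG.fullmatch(tag) is not None


def strip_bad_fields(fields):
    """Return a new list without the fields whose tags are not numeric."""
    return [(tag, value) for tag, value in fields if is_valid_tag(tag)]


# Hypothetical record: a list of (tag, value) pairs.
record = [
    ("245", "An example title"),
    ("500", "A general note"),
    ("5||", "A note stored under a malformed tag"),
]

cleaned = strip_bad_fields(record)
# cleaned keeps the 245 and 500 fields and drops the 5|| field
```

The check uses an explicit [0-9] pattern rather than str.isdigit(), 
because isdigit() also accepts non-ASCII digit characters.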

Of course, this means I'm discarding the information in those fields.  I 
could add a lookup table mapping bad fields to a valid field (for 
example, 5|| -> 582).
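
Such a lookup table might look roughly like the sketch below (Python, 
for illustration only).  The (tag, value) record layout and the names 
TAG_FIXES and remap_bad_fields are hypothetical; the only mapping taken 
from this thread is 5|| -> 582:

```python
import re

# Known bad tags and the valid tag each one should be moved to.
TAG_FIXES = {"5||": "582"}

# Assumption: a valid tag is exactly three ASCII digits.
_NUMERIC_TAG = re.compile(r"[0-9]{3}")


def remap_bad_fields(fields, fixes=TAG_FIXES):
    """Split fields into (kept, dropped).

    Numeric tags are kept unchanged, bad tags listed in `fixes` are kept
    under their replacement tag, and any other bad tag is dropped.
    """
    kept, dropped = [], []
    for tag, value in fields:
        if _NUMERIC_TAG.fullmatch(tag):
            kept.append((tag, value))
        elif tag in fixes:
            kept.append((fixes[tag], value))
        else:
            dropped.append((tag, value))
    return kept, dropped


# Hypothetical record containing one mappable and one unknown bad tag.
record = [
    ("245", "An example title"),
    ("5||", "A note stored under a malformed tag"),
    ("6#x", "Something unrecognised"),
]

kept, dropped = remap_bad_fields(record)
```

Returning the dropped fields separately means they can be logged, so 
that bad tags not yet in the table are noticed instead of silently lost.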

> P.S. Sorry that LAC's records are fouling things up for you. I've encountered those 5|| fields, too. I think they were meant to be normative 500's and something 'exotic' happened in someone's cataloguing editor.
>   If you don't mind taking the time when you have your solution worked out, you can report those kinds of boo-boo's to LAC (using the AMICUS no. for reference) and they will try to correct them. That way, everything improves for everyone.
I've emailed them with the details.

Kevin

-- 
Kevin O'Rourke
ICT Coordinator, National Teachers' Institute, Kaduna, Nigeria
062 316972
