Re: [Koha] Dealing with bad MARC records
Monday, June 18, 2007 17:30 CDT

Hi, Kevin,

Well, it's good to be able to wake up after a whole day sick and be able to do something useful for someone ... I've actually encountered the problem you're having, too.

What I did as a quick fix was to convert the file into tagged MARC -- MarcEdit allowed me to do this; if it doesn't work for you, let me know -- and then opened the tagged file in MS Word. Using MS Word, however, I could ask it to find the weird non-numeric field designator and change it to something distinctively numeric. I had to do that because MarcEdit's global search-and-replace functions wouldn't work for non-numeric fields either (at least not for me).

Also note: in this case, you wouldn't want to change the 5|| into 500, because you might have 500 (General Note) fields you'd want to keep. Use something like 582 (which doesn't formally exist) or 59x (whichever of the 590s you haven't designated or planned any use for otherwise) instead.

If you don't have access to MS Word or you run into other problems, give me a shout back off the listserv. Despite my health, I should have a few free hours this week, and as long as the programs work to do things automatically, I could convert the files for you easily enough.

Do let the listserv know how things worked out, Kevin. In the meantime, hope this helps (at least as a work-around).

Cheers,

Steven F. Baljkas
library tech at large
Koha neophyte volunteer cataloguer
Winnipeg, MB, Canada

P.S. Sorry that LAC's records are fouling things up for you. I've encountered those 5|| fields, too. I think they were meant to be normal 500s and something 'exotic' happened in someone's cataloguing editor. If you don't mind taking the time when you have your solution worked out, you can report those kinds of boo-boos to LAC (using the AMICUS no. for reference) and they will try to correct them. That way, everything improves for everyone.
--
SFB

============================================================
From: Kevin O'Rourke <lists@caboose.org.uk>
Date: 2007/06/18 Mon AM 03:58:20 CDT
To: Koha Mailing List <koha@lists.katipo.co.nz>
Subject: [Koha] Dealing with bad MARC records

This question is not directly related to Koha itself, but to preparing MARC records for import.

Some of the records we've been downloading from Library and Archives Canada contain a "5||" (two pipe characters) tag with information about translation. As far as I can tell this is not valid MARC.

I developed a little Java program to pre-process records before importing them into Koha, using the marc4j library to read MARC. This library refuses to read records containing non-numeric tags, which causes us problems.

Can anyone recommend any tools for ensuring that a MARC record contains only valid MARC? I could use MarcEdit to 'break' the files, edit out the bad tags and then re-'make' them, but this is a bit complicated, time-consuming and error-prone.

--
Kevin O'Rourke
ICT Coordinator, National Teachers' Institute, Kaduna, Nigeria
062 316972

_______________________________________________
Koha mailing list
Koha@lists.katipo.co.nz
http://lists.katipo.co.nz/mailman/listinfo/koha
============================================================
Hi Steven,
> I've actually encountered the problem you're having, too.
> What I did as a quick fix was to convert the file into tagged MARC -- MarcEdit allowed me to do this; if it doesn't work for you, let me know -- and then opened the tagged file in MS Word. Using MS Word, however, I could ask it to find the weird non-numeric field designator and change it to something distinctively numeric.
The trouble with any manual process like that is that our librarians will skip it. I'm already struggling to get them to follow the minimal workflow I introduced; they have a tendency to take short cuts that end up putting bad or incomplete records in our catalogue. It's difficult to persuade them that it's the quality of records that matters, not the quantity. In this case, when my program didn't work, they decided to import the records straight into Koha (without asking me). Koha couldn't read the records, so they decided to enter them manually, but couldn't be bothered typing anything more than a few words from the title, misspelled and with incorrect capitalisation.
> I had to do that because MarcEdit's global search-and-replace functions wouldn't work for non-numeric fields either (at least not for me).
I managed to use MarcEdit's Script Maker to build a VBScript that deletes all 5|| fields in a file for me. I'm worried, though, that there might be other bad fields in the records, not just the 5||.
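For anyone wondering what "bad" means here: the constraint marc4j is effectively enforcing is that a MARC 21 variable-field tag is three digits, so a quick scan for anything that breaks that rule will flag other problem fields, not just 5||. A minimal standalone sketch (not using marc4j; the class and method names are my own, for illustration only):

```java
import java.util.Arrays;
import java.util.List;

public class TagCheck {
    // A structurally valid MARC 21 variable-field tag is exactly three ASCII digits.
    static boolean isValidTag(String tag) {
        return tag != null && tag.matches("\\d{3}");
    }

    public static void main(String[] args) {
        // Tags as they might appear in a downloaded record, good and bad mixed.
        List<String> tags = Arrays.asList("245", "5||", "500", "59x");
        for (String tag : tags) {
            if (!isValidTag(tag)) {
                System.out.println("bad tag: " + tag);
            }
        }
    }
}
```

Run over the tag list of every record in a file, this would tell you up front whether 5|| is the only oddity or just the first one you noticed.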
> Also note: in this case, you wouldn't want to change the 5|| into 500 because you might have 500 (General Note) fields you'd want, but maybe something like 582 (doesn't exist formally) or 59x (whatever of the 590s you haven't designated/planned any use for otherwise).
My temporary solution has been to modify the marc4j library so that it doesn't simply die when it encounters one of these fields. This means that I get the whole record, including the bad fields, into my program. When I'm saving the user's selected and modified records back out I strip out some fields anyway, so I've now modified it to strip out any non-numeric fields at this stage. Of course, this means I'm discarding the information in those fields. I could add a lookup table mapping bad fields to a valid field (for example, 5|| -> 582).
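The lookup-table idea could look something like this -- a standalone sketch, independent of marc4j's API, with illustrative names; the 5|| -> 582 entry just follows Steven's suggestion:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TagCleaner {
    // Known bad tags and the valid tag to rewrite them to.
    static final Map<String, String> REMAP = new HashMap<>();
    static {
        REMAP.put("5||", "582"); // per Steven: a tag with no formal MARC meaning
    }

    // Returns the cleaned tag, or null if the field should be discarded on save.
    static String cleanTag(String tag) {
        String mapped = REMAP.getOrDefault(tag, tag);
        return mapped.matches("\\d{3}") ? mapped : null;
    }

    public static void main(String[] args) {
        List<String> tags = List.of("245", "5||", "9zz");
        List<String> kept = new ArrayList<>();
        for (String tag : tags) {
            String cleaned = cleanTag(tag);
            if (cleaned != null) {
                kept.add(cleaned);
            }
        }
        System.out.println(kept); // prints [245, 582]
    }
}
```

Anything in the table gets rewritten and kept; anything else that is still non-numeric is dropped, which matches the strip-on-save behaviour described above.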
> P.S. Sorry that LAC's records are fouling things up for you. I've encountered those 5|| fields, too. I think they were meant to be normative 500's and something 'exotic' happened in someone's cataloguing editor. If you don't mind taking the time when you have your solution worked out, you can report those kinds of boo-boo's to LAC (using the AMICUS no. for reference) and they will try to correct them. That way, everything improves for everyone.

I've emailed them with the details.
Kevin

--
Kevin O'Rourke
ICT Coordinator, National Teachers' Institute, Kaduna, Nigeria
062 316972
> P.S. Sorry that LAC's records are fouling things up for you. I've encountered those 5|| fields, too. I think they were meant to be normative 500's and something 'exotic' happened in someone's cataloguing editor. If you don't mind taking the time when you have your solution worked out, you can report those kinds of boo-boo's to LAC (using the AMICUS no. for reference) and they will try to correct them. That way, everything improves for everyone.
I got a really nice reply from LAC in which they explained that their records come from a wide variety of systems, some beyond their control. To quote:

"The fill character is allowable in MARC coding and was widely used in the early years of MARC, but most libraries try not to use it, and your system, quite rightly, is rejecting it. ... The record in question was created in 1981. The field in question should be a 500 general note field."

I've modified my program; it now:

- loads records with non-numeric MARC fields
- maps some fields to others (for example, 5|| to 500)
- strips out any remaining non-numeric fields on saving

This should keep everything nice and simple for our librarians.

Kevin

--
Kevin O'Rourke
ICT Coordinator, National Teachers' Institute, Kaduna, Nigeria
062 316972
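The three steps above can be condensed into a single pass over a record's fields. This is only a standalone sketch with illustrative names (the real version works on marc4j field objects, and a Map keyed by tag is a simplification, since real MARC records can repeat tags):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RecordFixer {
    // One pass: remap known bad tags (5|| -> 500, per LAC's advice),
    // then discard any field whose tag is still not three digits.
    static Map<String, String> fixTags(Map<String, String> fields) {
        Map<String, String> fixed = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : fields.entrySet()) {
            String tag = e.getKey();
            if (tag.equals("5||")) {
                tag = "500"; // LAC confirmed these should be general note fields
            }
            if (tag.matches("\\d{3}")) { // strip anything still non-numeric
                fixed.put(tag, e.getValue());
            }
        }
        return fixed;
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("245", "Some title");
        fields.put("5||", "Translation note");
        fields.put("59x", "junk");
        System.out.println(fixTags(fields).keySet()); // prints [245, 500]
    }
}
```

The point of doing it this way is that the librarians never see the bad data: every record loads, the known problem is silently corrected, and only unrecognised junk is dropped.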