[Koha] rebuild_nozebra.pl failures: wide characters and missing end tags in MARC XML
Jeffrey LePage
jeffrey_lepage at yahoo.com
Sat Feb 21 09:00:50 NZDT 2009
Joe,
Thanks for your quick response. As you suggest, I will fix the MARC file and re-import.
However, two of the records seem to have come from an attempted import from a Z39.50 search. It was the same book, attempted twice. In this case there was a single wide character in one of the MARC subfields. The result was:
1) an error message (which we didn't save, sorry)
2) entries inserted into database tables biblio and biblioitems
3) no entries in the items table.
We are running Koha using MARC21. As far as I can tell, this seems to be a reasonable choice for an English-language library. Is it possible that the Z39.50 server that returned the offending record is running UNIMARC?
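One thing that can be checked on our side is the character encoding a record declares. In MARC21, leader position 09 is "a" for UCS/Unicode and blank for MARC-8, so a record whose declared encoding does not match its actual bytes is a plausible source of a stray wide character. A minimal sketch in Python, assuming the raw ISO 2709 record is available as bytes; the function name and the sample leaders are made up for illustration and are not part of Koha:

```python
def declared_encoding(raw_record: bytes) -> str:
    """Return the character coding scheme declared in a MARC21 leader."""
    if len(raw_record) < 24:
        raise ValueError("a MARC leader is 24 bytes long")
    flag = chr(raw_record[9])  # leader position 09, counting from 0
    if flag == "a":
        return "UTF-8"
    if flag == " ":
        return "MARC-8"
    return "unknown"


# Hypothetical 24-byte leaders, not taken from our catalogue.
print(declared_encoding(b"00714cam a2200205 a 4500"))
print(declared_encoding(b"00714cam  2200205 a 4500"))
```

This only reports what the record claims about itself. It applies to MARC21 leaders only, so it would not by itself tell me whether a server is returning UNIMARC.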
Finding a set of Z39.50 servers that consistently yield good data seems to be one of our biggest problems. Does anyone have any experience with this problem?
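On the missing end tags from my first message: before repairing and re-importing, it may help to list every record whose marcxml is not well-formed. A minimal sketch in Python, assuming the (biblioitemnumber, marcxml) pairs have already been fetched from the biblioitems table; the function name and the sample rows are invented for illustration:

```python
import xml.etree.ElementTree as ET


def find_malformed(records):
    """Return (biblioitemnumber, parser message) for each record that fails to parse."""
    bad = []
    for number, marcxml in records:
        try:
            ET.fromstring(marcxml)
        except ET.ParseError as err:
            bad.append((number, str(err)))
    return bad


# Hypothetical rows standing in for (biblioitemnumber, marcxml) pairs.
rows = [
    (1, '<record><datafield tag="952" ind1=" " ind2=" ">'
        '<subfield code="p">5595</subfield></datafield></record>'),
    (2, '<record><datafield tag="952" ind1=" " ind2=" ">'
        '<subfield code="6">SPA_635_900000000000000_S'),
]

for number, message in find_malformed(rows):
    print(number, message)
```

The second row is cut off the same way as the 952 field I quoted, so it is the only one reported. A check like this finds truncated XML only; it says nothing about whether the characters inside a well-formed record are correctly encoded.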
--
Please avoid sending me Word or PowerPoint attachments.
See http://www.gnu.org/philosophy/no-word-attachments.html
--- On Fri, 2/20/09, Joe Atzberger <ohiocore at gmail.com> wrote:
> From: Joe Atzberger <ohiocore at gmail.com>
> Subject: Re: [Koha] rebuild_nozebra.pl failures: wide characters and missing end tags in MARC XML
> To: "Jeffrey LePage" <jeffrey_lepage at yahoo.com>
> Cc: koha at lists.katipo.co.nz
> Date: Friday, February 20, 2009, 11:48 AM
> You should look at the files you imported with MarcEdit or a
> similar tool.
> It may be necessary to reimport the corrected files after
> repairing them.
>
> --Joe
>
> On Fri, Feb 20, 2009 at 12:07 PM, Jeffrey LePage
> <jeffrey_lepage at yahoo.com> wrote:
>
> > Greetings,
> >
> > We have a small library, and for the sake of simplicity, we run without
> > zebra.
> >
> > When I run rebuild_nozebra.pl interactively I'm getting errors on 15 of our
> > 8000+ records. Judging by the biblioitem numbers, I think all or most of
> > these records got into the system when we imported MARC records from
> > Sagebrush Athena.
> >
> > The errors fall into 2 categories:
> >
> > Error 1
> > ----------------
> > Cannot decode string with wide characters at /usr/lib/perl/5.8/Encode.pm
> > line 166
> >
> > I can see the wide characters when I query biblioitems.marcxml. For
> > example, there's an accent mark over an illustrator's name in one of the
> > Harry Potter books.
> >
> > How do I fix these records? I note that the biblioitems table contains:
> > a) marcxml longtext
> > b) marc longblob
> >
> > Error 2
> > ---------------
> > No close tag marker
> >
> > The MARC record is indeed missing close tags. In each case, the datafield
> > tag 952, subfield 6 is not closed. The MARC record ends like this:
> >
> > <datafield tag="952" ind1=" " ind2=" ">
> > <subfield code="7">0</subfield>
> > <subfield code="p">5595</subfield>
> > <subfield code="4">0</subfield>
> > <subfield code="0">0</subfield>
> > <subfield code="6">SPA_635_900000000000000_S
> >
> >
> > *******************
> > How should I fix these errors and how should I prevent them in the future?
> > If I manually repair biblioitems.marcxml do I also need to repair
> > biblioitems.marc (which is a blob)?
> >
> >
> > Thanks,
> > Jeff LePage