character encoding & bulkmarcimport.pl

7 Jun 2007

Hi,
I'm wondering whether bulkmarcimport.pl simply checks the value of
leader byte 9 when your framework is set to MARC21/USMARC records and
then trusts that value. If the byte is blank, the script assumes MARC-8
encoding and converts the record to UTF-8 using MARC::Charset; if ldr9
is 'a', it assumes UTF-8 and leaves the record as is. Is there
something else going on here?
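
To make the question concrete, here is a minimal sketch of the check as I understand it. It is written in Python only for illustration and is not taken from bulkmarcimport.pl; the function name declared_encoding and the returned labels are made up for this example.

```python
LEADER_LENGTH = 24  # a MARC21 leader is always 24 bytes long


def declared_encoding(record: bytes) -> str:
    """Return the encoding that leader byte 9 claims for a raw MARC21 record.

    Hypothetical helper for illustration: it only reads the flag and
    does not look at the record data at all.
    """
    if len(record) < LEADER_LENGTH:
        raise ValueError("record is shorter than a 24-byte MARC leader")
    flag = record[9:10]  # leader position 09, counting from zero
    if flag == b"a":
        return "utf-8"   # record claims UCS/Unicode
    if flag == b" ":
        return "marc-8"  # blank means MARC-8
    return "unknown"     # any other value is not defined by MARC21


# Two sample leaders that differ only in position 09.
utf8_leader = b"00000nam " + b"a" + b"2200000 a 4500"
marc8_leader = b"00000nam " + b" " + b"2200000 a 4500"
print(declared_encoding(utf8_leader))
print(declared_encoding(marc8_leader))
```

The point is that nothing here inspects the actual field data, so the result is only as reliable as the flag itself.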

Is this a relatively safe approach with MARC21/USMARC records as a
whole? Should bulkmarcimport.pl only be used on records that are known
to be only MARC-8 and/or UTF-8?

I'm wondering whether there are other, non-standard character
encodings in use for MARC21 records. For instance, the Wellcome
Library [1] says it provides records in MARC21 and then says they are
in the ISO-8859-1 (Latin1) character set. I can imagine there are
others out there. I don't know whether Latin1 would be a problem, but
it seems other character encodings might be if MARC-8 is assumed when
the record is actually in something else.
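
As a rough illustration of the kind of sanity check I have in mind, the sketch below compares the leader flag against the bytes of a field. It is Python for illustration only, not something Koha does as far as I know; check_declared_encoding and its return strings are hypothetical.

```python
ESCAPE = 0x1B  # MARC-8 uses escape sequences to switch character sets


def check_declared_encoding(leader_flag: str, field_data: bytes) -> str:
    """Compare the leader byte 9 flag with what the bytes look like.

    Hypothetical helper: it cannot prove which encoding was used, it
    only flags records whose bytes contradict or go beyond the flag.
    """
    if leader_flag == "a":
        try:
            field_data.decode("utf-8")
        except UnicodeDecodeError:
            return "mislabeled: not valid UTF-8"
        return "ok"
    if leader_flag == " ":
        if all(b < 0x80 and b != ESCAPE for b in field_data):
            # Plain ASCII with no escapes reads the same in MARC-8,
            # Latin1 and UTF-8, so nothing can go wrong here.
            return "ok"
        return "review: could be MARC-8 or another 8-bit encoding"
    return "unknown flag"


latin1_bytes = "café".encode("latin-1")  # b"caf\xe9"
print(check_declared_encoding("a", latin1_bytes))
print(check_declared_encoding(" ", latin1_bytes))
```

A Latin1 record flagged as UTF-8 is easy to catch because the bytes fail to decode. A Latin1 record with a blank flag is harder, since the same high bytes are also meaningful in MARC-8, which is why the sketch can only mark it for review.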

Does the built-in Z39.50 search do character set conversion to UTF-8 as well?

Thanks for any help you can provide in understanding how Koha handles
character encodings.

--Jason

[1] http://library.wellcome.ac.uk/node58.html#P24_1668

Jason Ronallo

Joshua M. Ferraro
