Bulkmarcimport and special characters
Can anyone help me with two questions about Koha? First question: how can we use the "OR" logical operator in the OPAC? Second question: The Portuguese Language uses some special characters like "ã". We have used bulkmarcimport for importing our database. This program changes "ão" to "ô", "ãe" to "ê" and "áu" to "ù". Have you any idea how to solve this problem? Best regards. -- José Lagoas, Laboratório Nacional de Engenharia Civil
From: "José Lagoas" <jlagoas@lnec.pt> Second Question: The Portuguese Language uses some special characters like "ã". We have used bulkmarcimport for importing our database. This program changes "ão" to "ô", "ãe" to "ê " and "áu" to "ù". Have you any idea how to solve this problem?
Sounds like a character encoding problem. If bulkmarcimport is using UTF-8 at last, try running:
    iconv -f iso-8859-15 -t utf-8 -o outfile.marc infile.marc
and then feed outfile.marc to bulkmarcimport. Hope that helps, -- MJ Ray - see/vidu http://mjr.towers.org.uk/email.html Somerset, England. Work/Laborejo: http://www.ttllp.co.uk/ IRC/Jabber/SIP: on request/peteble
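For anyone without iconv to hand, the byte-level conversion suggested above can be sketched in Python. This is only an illustration of what that iconv call does to the text, assuming the input really is ISO-8859-15; the function name is made up for this example, and it is shown on a short string, not on a whole MARC file.

```python
def latin9_to_utf8(raw: bytes) -> bytes:
    """Re-encode ISO-8859-15 bytes as UTF-8, as `iconv -f iso-8859-15 -t utf-8` would."""
    return raw.decode("iso8859_15").encode("utf-8")


sample = b"S\xe3o Jo\xe3o"  # "São João" as ISO-8859-15 bytes
converted = latin9_to_utf8(sample)

print(converted.decode("utf-8"))
# Each accented letter grows from one byte to two, so the byte count changes.
print(len(sample), "->", len(converted))
```

Note the change in byte count. An ISO 2709 record stores its own length and the offsets of its fields in bytes, so re-encoding a whole file this way leaves those numbers stale, which is worth keeping in mind before feeding the result to bulkmarcimport.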
On Wed, Nov 08, 2006 at 06:53:12PM +0000, MJ Ray wrote:
Sounds like a character encoding problem. If bulkmarcimport is using UTF-8 at last, try running: iconv -f iso-8859-15 -t utf-8 -o outfile.marc infile.marc and then feed outfile.marc to bulkmarcimport.
Actually, this may work for UNIMARC, but it will completely corrupt a MARC21 file because MARC21 uses MARC8 encoding (and I doubt iconv understands MARC8). Also, iconv will not update the leader to specify the encoding. To properly convert from MARC8 to UTF-8, you'll need to use a MARC editor (I think MARCEdit can do it), or you'll need to write a script to do the conversion using one of the MARC toolkits out there. Hope that helps, -- Joshua Ferraro, President, Technology, LibLime, jmf@liblime.com
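To see what the leader currently says about encoding, a small check can help. In MARC21, position 09 of the 24-byte leader is the character coding scheme: a blank means MARC-8 and "a" means UCS/Unicode. The sketch below reads that byte for each raw record; it uses only the standard library, the function names are hypothetical, and the two demo leaders are made up rather than taken from a real file.

```python
RECORD_TERMINATOR = b"\x1d"  # ISO 2709 end-of-record byte


def declared_encoding(record: bytes) -> str:
    """Return what MARC21 Leader/09 declares for one raw record."""
    if len(record) < 24:
        return "too short to hold a leader"
    flag = record[9:10]
    if flag == b"a":
        return "UTF-8"
    if flag == b" ":
        return "MARC-8"
    return "unknown (%r)" % flag


def split_records(data: bytes):
    """Split the contents of a MARC file into raw records."""
    return [chunk + RECORD_TERMINATOR
            for chunk in data.split(RECORD_TERMINATOR) if chunk.strip()]


# Two made-up 24-byte leaders standing in for real records.
demo = (b"00026nam  2200025   4500" + RECORD_TERMINATOR
        + b"00026nam a2200025   4500" + RECORD_TERMINATOR)

for number, record in enumerate(split_records(demo), start=1):
    print(number, declared_encoding(record))
```

This only reports what the record claims, not what the bytes actually are, and it applies to MARC21; as far as I know UNIMARC records declare their character set elsewhere, so the check would not be meaningful there.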
Joshua Ferraro wrote:
To properly convert from MARC8 to UTF-8, you'll need to use a MARC editor (I think MARCEdit can do it), or you'll need to write a script to do the conversion using one of the MARC toolkits out there.
I've been using marc2xml & xml2marc to convert from MARC8 to XML to (presumably) UTF-8. Example: marc2xml mymarc8file.mrc > newxmlfile.xml then, when it's finished: xml2marc newxmlfile.xml > newmarcfile.mrc At least, this is useful for finding which records it's choking on. For example, if you get errors from xml2marc and it stops prematurely, use 'tail --bytes=500 newmarcfile.mrc' to find the last record written before the one that caused the problem, then find that record in newxmlfile.xml and examine the record after it to find the encoding problem. Of course, I'm no UTF-8 expert, so take this with a grain of salt! hth, c.
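The manual hunt for the offending record could also be scripted. The sketch below is not what marc2xml or xml2marc do internally; it simply walks the raw records of a file that is supposed to be UTF-8 and reports each one whose bytes do not decode. It assumes records are separated by the ISO 2709 terminator byte 0x1D, the function name is hypothetical, and the demo data is made up.

```python
RECORD_TERMINATOR = b"\x1d"  # ISO 2709 end-of-record byte


def find_bad_records(data: bytes):
    """List (record number, offset within record, offending bytes) for
    every record that is not valid UTF-8."""
    problems = []
    for number, chunk in enumerate(data.split(RECORD_TERMINATOR), start=1):
        try:
            chunk.decode("utf-8")
        except UnicodeDecodeError as err:
            # err.start is the index of the first byte that failed to decode
            problems.append((number, err.start, chunk[err.start:err.start + 8]))
    return problems


demo = RECORD_TERMINATOR.join([
    b"record one: plain ASCII",
    "record two: S\u00e3o Paulo".encode("utf-8"),
    b"record three: S\xe3o Paulo",  # stray ISO-8859-1 byte
]) + RECORD_TERMINATOR

for number, offset, context in find_bad_records(demo):
    print("record", number, "offset", offset, "bytes", context)
```

A record that passes this check is only known to be valid UTF-8; it says nothing about whether the leader or the directory lengths are right.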
Joshua Ferraro noted:
iconv -f iso-8859-15 -t utf-8 -o outfile.marc infile.marc Actually, this may work for UNIMARC, but it will completely corrupt a MARC21 file because MARC21 uses MARC8 encoding (and I doubt iconv understands MARC8).
I was assuming the file had been corrupted already, from the description of the problem. I could be wrong. It's quite right that iconv doesn't know MARC8, and a quick look didn't suggest how to add it. Cindy Murdock's marc2xml|xml2marc solution sounds like a good idea. Hope that explains, -- MJ Ray
Hi all, At 13.46 09/11/2006, Joshua Ferraro wrote:
To properly convert from MARC8 to UTF-8, you'll need to use a MARC editor (I think MARCEdit can do it), or you'll need to write a script to do the conversion using one of the MARC toolkits out there.
If José has characters encoded in MARC8, the best tool to use is MARC::Charset, http://search.cpan.org/~esummers/MARC-Charset-0.95/. If you have data in iso-8859-x you can use iconv. To find out which one you have, ask the software vendor. All IMHO. Bye, Zeno Tajoli, CILEA - Segrate (MI), tajoliAT_SPAM_no_prendiATcilea.it (address masked against spam; replace everything between the two ATs with @)
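The symptom itself may also hint at which encoding the file is in. In MARC-8 a diacritic is a separate byte placed before the letter it modifies, and the MARC-8 bytes for the combining grave and circumflex (0xE1 and 0xE3) are the same bytes ISO-8859-1 uses for "á" and "ã". The sketch below is one possible reading of the problem, not something confirmed in this thread: it reads ISO-8859-1 bytes as if they were MARC-8 and reproduces the three substitutions José reported. The table covers only four diacritics and the function name is made up for illustration.

```python
import unicodedata

# A few MARC-8 (ANSEL) combining diacritics: byte -> Unicode combining mark.
ANSEL_COMBINING = {
    0xE1: "\u0300",  # grave
    0xE2: "\u0301",  # acute
    0xE3: "\u0302",  # circumflex
    0xE4: "\u0303",  # tilde
}


def read_latin1_as_marc8(raw: bytes) -> str:
    """Interpret bytes the way a MARC-8 reader would: a diacritic byte
    attaches to the letter that follows it."""
    out = []
    pending = []
    for byte in raw:
        if byte in ANSEL_COMBINING:
            pending.append(ANSEL_COMBINING[byte])
        elif byte < 0x80:
            out.append(chr(byte) + "".join(pending))
            pending = []
        else:
            out.append("\ufffd")  # byte outside this small table
            pending = []
    return unicodedata.normalize("NFC", "".join(out))


for word in ("\u00e3o", "\u00e3e", "\u00e1u"):  # "ão", "ãe", "áu"
    print(word, "->", read_latin1_as_marc8(word.encode("latin-1")))
```

If that reading holds, the source file would be ISO-8859-1 or ISO-8859-15 rather than MARC-8, which is the case where iconv-style conversion is the relevant route; asking the vendor is still the way to be sure.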
participants (5)
- Cindy Murdock
- Joshua Ferraro
- José Lagoas
- MJ Ray
- Zeno Tajoli