[Koha] how to fix encoding of biblioitems.kohaxml (moving to latest Koha)
Giuseppe Angilella
Giuseppe.Angilella at ct.infn.it
Sun Feb 2 19:11:48 NZDT 2014
Hi,
I'm upgrading to the latest Koha release (3.14.01.000) from a quite old
one (2.x).
I've imported the old database, let the web installer perform the
automatic steps to upgrade the SQL structure, then ran the
convert_to_utf8.pl tool in .../migration_tools/22_to_30/
My stuff (items, patrons, several configuration parameters) seems to be
there. However, koha_rebuild_zebra fails because of "wide characters"
present in the database. The specific error is:
Wide character in subroutine entry at
/usr/share/perl5/MARC/Charset/Table.pm line 96
This leaves me with a (hopefully) consistent, but unsearchable, database.
A closer inspection of the biblioitems table reveals that several (140 out
of some 2800) items contain "accented characters" in various fields. These
may have crept in at various stages, e.g. when inserting new records using
a Mac keyboard, or on the occasion of not so careful upgrades.
Although I had set the default character set to UTF8 everywhere (the
locale, apache and mysql, not to mention Koha itself), when I insert a
new record (containing accented letters in the title field, say) from the
web interface, the title looks ok when accessing the newly created record
(cgi-bin/koha/catalogue/detail.pl?biblionumber=...), whereas the
"MARC Preview" shows corrupted accented letters, as if they weren't
encoded in UTF8.
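
To illustrate what I mean by "as if they weren't encoded in UTF8", here is
a minimal Python sketch of the kind of corruption I suspect: UTF8 bytes
being read back as Latin-1. This is only an assumption about the symptom,
not something I have verified inside Koha; the function names are made up
for the example.

```python
def misread_as_latin1(text: str) -> str:
    """Encode text as UTF-8, then decode those bytes as Latin-1 (the wrong charset)."""
    return text.encode("utf-8").decode("latin-1")


def undo_misread(garbled: str) -> str:
    """Reverse the mistake: recover the original bytes and decode them as UTF-8."""
    return garbled.encode("latin-1").decode("utf-8")


if __name__ == "__main__":
    title = "Universit\u00e0"           # sample title ending in a-grave
    garbled = misread_as_latin1(title)
    print(garbled)                       # the accented letter becomes two characters
    print(undo_misread(garbled) == title)
```

If the display problem were of this kind, the stored bytes would be fine
and only the charset used when reading them would be wrong.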
An even closer inspection suggests that the encoding might indeed be UTF8,
but that some characters follow the NFD, rather than NFC, convention
(accented characters are represented as two separate characters).
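
The NFD/NFC difference can be sketched in a few lines of Python using the
standard unicodedata module. The helper names below are hypothetical and
this says nothing about how Koha itself handles normalization; it only
shows that the same visible letter can be stored as one or two code points.

```python
import unicodedata


def to_nfc(text: str) -> str:
    """Return text in NFC (precomposed) form."""
    return unicodedata.normalize("NFC", text)


def needs_nfc(text: str) -> bool:
    """True if text changes under NFC normalization, i.e. it is not already NFC."""
    return text != unicodedata.normalize("NFC", text)


if __name__ == "__main__":
    decomposed = "e\u0300"      # NFD: 'e' followed by a combining grave accent
    composed = to_nfc(decomposed)
    # Same letter on screen, different code points and different UTF-8 bytes.
    print(len(decomposed), decomposed.encode("utf-8"))
    print(len(composed), composed.encode("utf-8"))
    print(needs_nfc(decomposed), needs_nfc(composed))
```

A check like needs_nfc could be used to flag which of the stored strings
are in decomposed form before deciding how to fix them.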
Is there a way to fix my biblioitems table? Is there a way to have new
records entered correctly, at least?
(I've even tried ALTER'ing the table to binary, then back to text with the
correct encoding, field by field.)
I could in principle go and fix each of the 140 corrupted MARC records via
the web interface, if that's the only way, but the point is that,
seemingly, even newly entered accented characters produce garbled output
in the marcxml field.
(The same may also apply to the marc field, but that's a binary sql
LONGBLOB, and therefore possibly uneditable; I assume it should be
harmless to koha_rebuild_zebra.)
Many thanks for all your suggestions and help.
Regards,
Giuseppe.