[Koha] Unicode

Thomas D koha at alinto.com
Tue Dec 20 01:44:02 NZDT 2005


MULTITUDE OF CHARACTER SETS AND ENCODINGS

Unicode has several encodings, and UTF-8 is only one of them.  Others
include UTF-16, UTF-32, and UCS-2.  Conversion applications can convert
between the different encodings.
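
As a rough illustration (a sketch using only Python's built-in codecs, not
anything that Koha itself runs), the same character comes out as different
bytes under each encoding, and converting between encodings is a decode
followed by an encode.  The sample character is an arbitrary choice.

```python
# One Chinese character, U+4E2D, chosen arbitrarily for illustration.
text = "\u4e2d"

utf8 = text.encode("utf-8")
utf16 = text.encode("utf-16-be")   # big-endian, no byte order mark
utf32 = text.encode("utf-32-be")   # big-endian, no byte order mark

print(utf8.hex())   # e4b8ad
print(utf16.hex())  # 4e2d
print(utf32.hex())  # 00004e2d

# Converting between encodings: decode to text, then re-encode.
converted = utf16.decode("utf-16-be").encode("utf-8")
print(converted == utf8)
```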

MS Windows can use UTF-16 directly for keyboard output but not UTF-8.  To my
knowledge there are no keyboard generation applications that work around
this problem directly.

Unix can use UTF-8 directly for keyboard output so encoding conversion
issues are less problematic.

MARC records have more usually used older character sets that cover much
the same characters as Unicode.  These library character set standards
were developed before Unicode existed.  One such standard that is prevalent
in MARC-21 records is the MARC-8 character set.  MARC-8 should not be
confused with UTF-8.  They are not compatible, but character set conversion
applications can convert between them.
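
To make the incompatibility concrete, here is a small sketch in Python.  It
assumes the ANSEL-derived part of MARC-8, where the combining acute accent
is the byte 0xE2 and is stored before the letter that it modifies.  The
MARC-8 bytes are written out by hand for illustration; no MARC library is
involved, and a real conversion would need a proper MARC-8 converter.

```python
import unicodedata

# "é" as it would appear in MARC-8 (assumption: ANSEL combining acute
# is 0xE2 and precedes the base letter "e").
marc8_bytes = b"\xe2e"

# The same character in UTF-8, precomposed (NFC) and decomposed (NFD).
utf8_nfc = unicodedata.normalize("NFC", "\u00e9").encode("utf-8")
utf8_nfd = unicodedata.normalize("NFD", "\u00e9").encode("utf-8")

print(marc8_bytes.hex())  # e265
print(utf8_nfc.hex())     # c3a9
print(utf8_nfd.hex())     # 65cc81

# Reading the MARC-8 bytes as if they were UTF-8 does not work.
try:
    marc8_bytes.decode("utf-8")
    readable_as_utf8 = True
except UnicodeDecodeError:
    readable_as_utf8 = False
print(readable_as_utf8)
```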


CHARACTER SETS AND ENCODINGS IN KOHA

Koha 3.0 should convert between MARC-8 and UTF-8 for at least the major
Western European languages.  Chinese may have to wait for Koha 3.0.X,
especially as I do not know how to identify which Chinese glyphs are which.
At least with Western European languages, I know how to read the alphabets
even when I do not know how to read the language.

Previously you changed the Koha SQL columns from ISO 8859 to UTF-8 where
necessary, and likewise the charset headers for the web pages that Koha
sends to the webserver.  The web browser then seems to have done a certain
degree of conversion work automatically, not only for characters that were
merely displayed but also for characters that you typed, which I had not
expected.  However, this seems to have worked for you so far on MS Windows.
I would presume that the web browser itself is converting between UTF-16
from MS Windows and UTF-8 before posting back to Koha.
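
A minimal sketch of why the stored bytes and the declared charset have to
agree, using Python's built-in codecs.  It assumes ISO 8859-1 as the
specific ISO 8859 variant, which is my guess rather than something I know
about your installation, and the sample string is arbitrary.

```python
text = "caf\u00e9"  # "café", an arbitrary sample string

stored = text.encode("utf-8")  # bytes as they would sit in a UTF-8 column

# Declared charset matches the stored bytes: the text round-trips.
print(stored.decode("utf-8") == text)

# Declared charset is ISO 8859-1 but the bytes are UTF-8: the accented
# character turns into two unrelated characters.
garbled = stored.decode("iso-8859-1")
print(len(text), len(garbled))  # 4 5
```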

If the issue that you have now affects only one or two characters after a
conversion, it looks as though the converting application partially failed.
I would suggest that the conversion inside MARCedit was successful but that
the conversion inside your web browser for Koha was less successful.  What
web browser and version are you using with Koha?
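
One way to narrow down which characters were damaged, assuming you can get
at the raw bytes of an imported record, is to decode them as UTF-8 and look
for the Unicode replacement character.  The function name and the sample
data below are hypothetical, and this only catches bytes that are not valid
UTF-8; it would not catch a character that was converted to a different
but valid character.

```python
def find_bad_positions(raw):
    """Return positions in the decoded text where invalid UTF-8 was replaced."""
    decoded = raw.decode("utf-8", errors="replace")
    return [i for i, ch in enumerate(decoded) if ch == "\ufffd"]

# Two Chinese characters, chosen arbitrarily, encoded as UTF-8.
good = "\u4e2d\u6587".encode("utf-8")

# Simulated damage: a byte that is never valid in UTF-8, inserted
# between the two characters.
damaged = good[:3] + b"\xff" + good[3:]

print(find_bad_positions(good))     # []
print(find_bad_positions(damaged))  # [1]
```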

I am assuming that almost all of the characters in your problematic
records are in Chinese.  I am also assuming that you have used
bulkmarcimport.pl to import these records.  Please let me know if either is
not the case.


KOHA WINDOWS LIST CHANGE

Do you have any responses from the Savannah list any longer?  The address
for the Koha Windows list is now koha-win32 at nongnu.org .  This change is
part of a move of the Koha project from Sourceforge to Savannah. The
Sourceforge site had become much too unresponsive given the volume of users
relative to the provision of servers.  You should have a better response
about MS Windows issues on the MS Windows list.  Unfortunately, mailing
lists on Savannah do seem to suffer from delays in the mail queue.


Thomas D


Quoting Carol Ku <carolcool01 at yahoo.com> :
> ---------------- Beginning of the original message ------------------
> 
> I imported two book records with Chinese characters.  However, there are
> about one or two characters that show up wacky.  I used MARCEdit to
> convert the text file into MARC UTF file.  When I open the file using
> MARCedit, all the characters look fine.
>
> I was told that MARCEdit uses Arial Unicode MS, is it the same code as
> UTF8?  If not, how can I overcome this problem?
> 
> ------------------- End of the original message ---------------------



