MULTITUDE OF CHARACTER SETS AND ENCODINGS

There are several encodings of Unicode, and UTF-8 is only one of them. Others include UTF-16, UTF-32, and UCS-2. Conversion applications can convert between these encodings. MS Windows can use UTF-16 directly for keyboard output, but not UTF-8. To my knowledge, no keyboard generation applications work around this problem directly. Unix can use UTF-8 directly for keyboard output, so encoding conversion issues are less of a problem there.

MARC records have more commonly used older character sets that represent much the same characters as Unicode. These library character set standards were developed before Unicode existed. One such standard, prevalent in MARC 21 records, is the MARC-8 character set. MARC-8 should not be confused with UTF-8. The two are not compatible, but character set conversion applications can convert between them.

CHARACTER SETS AND ENCODINGS IN KOHA

Koha 3.0 should convert between MARC-8 and UTF-8 for at least the major Western European languages. Chinese may have to wait for Koha 3.0.x, especially as I do not know how to identify which Chinese glyphs are which. With Western European languages, at least, I can read the alphabets even when I cannot read the language.

Previously you changed the Koha SQL columns from ISO 8859 to UTF-8 where necessary, and changed the charset headers of the web pages that Koha sends to the web server from ISO 8859 to UTF-8. The web browser then seems to have done some conversion work automatically. I had not expected this to work as well for characters that you typed as for characters that were merely displayed in the browser, but it seems to have worked for you so far on MS Windows. I presume that the web browser converts between UTF-16 from MS Windows and UTF-8 internally before posting back to Koha.
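To make the difference between these encodings concrete, here is a small Python sketch. The choice of Python and all function names are illustrative assumptions; none of this is part of Koha or MARCEdit. It prints the bytes of one Chinese character in UTF-8, UTF-16 and UTF-32 (UCS-2 has no codec in Python's standard library, but for characters in the Basic Multilingual Plane, such as this one, its bytes are the same as UTF-16). It also shows the garbled text that results when UTF-8 bytes are read as ISO 8859-1, the kind of mismatch a wrong SQL column type or charset header can cause, and it lists each character's code point and Unicode name, which may help identify which Chinese glyph is which. It does not handle MARC-8, which needs a dedicated converter.

```python
import unicodedata


def encoding_table(ch):
    """Return the hex bytes of `ch` in several Unicode encodings."""
    return {
        "UTF-8": ch.encode("utf-8").hex(" "),
        # Big-endian variants are used so that no byte order mark is added.
        "UTF-16 (big-endian)": ch.encode("utf-16-be").hex(" "),
        "UTF-32 (big-endian)": ch.encode("utf-32-be").hex(" "),
    }


def misread_as_latin1(text):
    """Simulate UTF-8 bytes being wrongly decoded as ISO 8859-1."""
    return text.encode("utf-8").decode("iso-8859-1")


def repair_latin1_misread(garbled):
    """Undo misread_as_latin1: recover the original bytes, decode as UTF-8."""
    return garbled.encode("iso-8859-1").decode("utf-8")


def describe(text):
    """List (code point, Unicode name) for every character in `text`."""
    return [(f"U+{ord(c):04X}", unicodedata.name(c, "<unnamed>")) for c in text]


if __name__ == "__main__":
    sample = "中文"  # hypothetical sample text, not taken from the records
    for label, hexbytes in encoding_table("中").items():
        print(f"{label:22} {hexbytes}")
    garbled = misread_as_latin1(sample)
    print("Garbled:", describe(garbled))
    print("Repaired:", describe(repair_latin1_misread(garbled)))
```

The repair only works because ISO 8859-1 maps every byte to a character, so no information is lost in the bad decode. If a conversion step has replaced characters with "?" or U+FFFD, the originals cannot be recovered this way.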
If the problem now affects only one or two characters after a conversion, the converting application seems to have partially failed. I suspect that the conversion inside MARCEdit was successful but that the conversion inside your web browser for Koha was less successful. What web browser and version are you using with Koha? I am assuming that most of the characters in your problematic records are Chinese. I am also assuming that you used bulkmarcimport.pl to import these records. Please let me know if either assumption is wrong.

KOHA WINDOWS LIST CHANGE

Are you still receiving responses from the Savannah list? The address for the Koha Windows list is now koha-win32@nongnu.org. This change is part of a move of the Koha project from SourceForge to Savannah. The SourceForge site had become much too unresponsive given the volume of users relative to the servers provided. You should get a better response about MS Windows issues on the MS Windows list. Unfortunately, mailing lists on Savannah do seem to suffer from delays in the mail queue.

Thomas D

Quoting Carol Ku <carolcool01@yahoo.com>:
---------------- Beginning of the original message ------------------
I imported two book records with Chinese characters. However, one or two characters display incorrectly. I used MARCEdit to convert the text file into a MARC UTF file. When I open the file in MARCEdit, all the characters look fine.
I was told that MARCEdit uses Arial Unicode MS. Is that the same as UTF-8? If not, how can I overcome this problem?
_______________________________________________ Koha mailing list Koha@lists.katipo.co.nz http://lists.katipo.co.nz/mailman/listinfo/koha
------------------- End of the original message ---------------------