[Koha] Has NFC vs. NFD encoding changed for Unicode in koha?

Katrin Fischer katrin.fischer.83 at web.de
Sun Sep 10 01:34:09 NZST 2023


Hi Jesse,

First: of course you can join koha-devel, and you are welcome there :)

I am not aware of any conscious or intentional change in Koha. As far as
I know, we have always imported and exported records with precomposed
diacritics (NFC) to avoid display and editing issues (German umlauts,
etc.).

When you edit the records in Koha, do they present as NFC or NFD? (You
can tell by deleting an accented character: does it take one step or two?)
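The one-step-or-two test works because the two forms really do differ in length: in NFC an accented letter such as "ó" is a single precomposed code point (U+00F3), while in NFD it is the base letter followed by a combining accent (U+006F U+0301). A minimal sketch using Python's standard `unicodedata` module shows the difference; the title and the helper name `describe` are hypothetical, chosen only for illustration.

```python
import unicodedata

def describe(s: str) -> list[str]:
    """Return the code points of s as U+XXXX strings (hypothetical helper)."""
    return [f"U+{ord(ch):04X}" for ch in s]

title = "Canci\u00f3n"  # hypothetical Spanish title, written with a precomposed "ó"
nfc = unicodedata.normalize("NFC", title)
nfd = unicodedata.normalize("NFD", title)

print(len(nfc), len(nfd))   # 7 8: NFD needs one extra code point for the accent
print(describe(nfc[-3:]))   # ['U+0069', 'U+00F3', 'U+006E']
print(describe(nfd[-4:]))   # ['U+0069', 'U+006F', 'U+0301', 'U+006E']
print(nfc == nfd)           # False: same text on screen, different code points
print(unicodedata.normalize("NFC", nfd) == nfc)  # True
```

Normalizing the decomposed string back to NFC recombines each base letter with its accent, which is why the final comparison is True.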

How were the records added to Koha?

Can you replicate the behaviour on another installation for example on a
sandbox?

Hope this helps,

Katrin

On 03.09.23 05:11, Jesse Savage wrote:
> This is probably a question for a developer, but I'm not one, so don't
> really want to join the koha-devel list (and they probably wouldn't want me
> to), and I know a lot of developers frequent this list. My apologies to
> anyone who might not be interested in this question.
>
> I recently got a new Windows laptop on which I installed (I think) a newer
> version of WSL/Ubuntu, and I see that records exported with the "MARC
> (Unicode/UTF-8)" option (as *utf8 files, which look to be basically *mrc)
> apparently use NFD (or "decomposed") encodings for Unicode characters (in
> my case, mainly Spanish and French titles) and thus don't display properly
> in *less*, *grep*, or *cat* (the diacritic follows the standard Latin
> character rather than being integrated with it), unlike files encoded with
> NFC characters. The characters also can't be *grep*'d with searches like
> "grep $'\u16A0'." The program I use to update records for uploading also
> outputs NFD, even if the records it takes for input contain NFC. (I'm
> waiting on the answer to the present questions to see whether that program
> has an option hidden somewhere to output NFC instead, which I'd prefer.)
>
> So, is this a systematic change in koha? *Should I ensure that *.mrc files
> for batch uploading ALWAYS include only NFD characters, or do the
> underlying processes standardize NFC vs. NFD?*  The system on which the
> catalog lives (but which I don't administer) currently has koha
> 23.05.00.000 and runs on "SMP Debian 5.10.179-1 (2023-05-12) x86_64".
> Please feel free to respond directly if you feel the answer won't interest
> anybody else.
>
> Thanks very much in advance!
> Jesse
> ---------------------
> Jesse Savage
> (pronouns he, him, his)
> jessava at gmail.com

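The grep symptom described in the quoted message is what happens with tools that compare code points without normalizing them: a search typed with precomposed (NFC) characters does not match the same text stored decomposed (NFD). The Python sketch below normalizes both sides before comparing; the title and the helper name `contains_normalized` are hypothetical, for illustration only. It also shows why converting a file is not a plain text substitution: the NFC and NFD forms of the same word have different UTF-8 byte lengths, and a binary MARC (ISO 2709) record stores its total length in the leader and its field lengths and offsets in the directory. A .mrc file should therefore be converted with a MARC-aware tool that re-serializes each record, not by normalizing the raw bytes.

```python
import unicodedata

def contains_normalized(haystack: str, needle: str, form: str = "NFC") -> bool:
    """Substring test after bringing both strings to the same normalization form."""
    return unicodedata.normalize(form, needle) in unicodedata.normalize(form, haystack)

# Hypothetical French title, stored decomposed (NFD) as in the exported file.
decomposed_title = unicodedata.normalize("NFD", "L'\u00e9t\u00e9 indien")
# Search term written with precomposed characters.
needle = "\u00e9t\u00e9"  # "été"

print(needle in decomposed_title)                     # False: the plain search misses
print(contains_normalized(decomposed_title, needle))  # True once both sides are NFC

# Normalization changes the UTF-8 byte length of the same text.
for form in ("NFC", "NFD"):
    encoded = unicodedata.normalize(form, needle).encode("utf-8")
    print(form, len(encoded))  # NFC 5, then NFD 7
```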
