[Koha] Encode 'ç' to import authority .marcxml file with authority

Tue Jul 20 04:48:11 NZST 2021

Hi Javier,

you must be careful when working with utf8.

When the input file and your Python script are both encoded in UTF-8, you
need, in my view, no encode call at all.
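
To illustrate (a minimal sketch only, not your script: the helper name
`clean` and the single-subfield layout are assumptions I made up for the
example), a Python 3 string can go into the XML unchanged, and the UTF-8
encoding happens once, when the output is written:

```python
import io
import xml.etree.ElementTree as ET


def clean(value: str) -> str:
    """Hypothetical helper: strip whitespace, keep a normal Python str."""
    return value.strip()


name = clean("  França ")

# The two variants from this thread, for comparison.
# ASCII cannot represent 'ç', so it becomes a numeric character reference.
as_ascii_refs = name.encode("ascii", "xmlcharrefreplace").decode("ascii")
# UTF-8 can represent every character, so this round trip changes nothing.
as_utf8_roundtrip = name.encode("utf8", "xmlcharrefreplace").decode("utf8")

# Build one subfield and let the serializer do the encoding.
subfield = ET.Element("subfield", {"code": "a"})
subfield.text = name

buffer = io.BytesIO()
ET.ElementTree(subfield).write(buffer, encoding="utf-8", xml_declaration=True)
xml_bytes = buffer.getvalue()

print(as_ascii_refs)
print(as_utf8_roundtrip == name)
print(xml_bytes.decode("utf-8"))
```

If you write the file yourself instead of using a serializer, opening it
with open(path, "w", encoding="utf-8") should have the same effect, as long
as the declaration in the first line also says UTF-8.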

I did not understand your Python script. Does it read something and then
write out a modified XML file?

There was no attachment in your last mail.

Regards, Harald

On 19.07.21 at 18:12, Javi Legido wrote:
> Hi Harald.
>
> Many thanks for your quick reply.
>
> Changing encoding:
>
> -        return string.strip().encode("ascii", "xmlcharrefreplace").decode("ascii")
> +        return string.strip().encode("utf8", "xmlcharrefreplace").decode("utf8")
>
> This produces a MARCXML file for which the import reports "0 records in
> file", so I can't import it. The string was:
>
> França
>
> Attached are the MARCXML records for authorities and bibliographic data
> that work (meaning they can be imported); only the authority record ends
> up with the wrong encoding.
>
> Thanks.
>
> Javier
>
> On Mon, 19 Jul 2021 at 17:26, Harald Schaefer <fechsaer at gmail.com> wrote:
>
>> Hi,
>>
>> you should use the UTF-8 encoding when creating a Python file.
>>
>> The MARCXML file should declare encoding='UTF-8' in its first line.
>>
>> In Python you should use encode('utf8').
>>
>> Regards, Harald
>>
>> On 19.07.21 at 16:10, Javi Legido wrote:
>>> Hi there.
>>>
>>> I'm trying to import an authority type 'GEOGR_NAME' with 'ç' in its name
>>> (field '151 a'):
>>>
>>> França
>>>
>>> So far:
>>>
>>> 1. If I add it manually from the GUI (although I want to import it from
>>> a .marcxml file), it works when typing the 'ç' character. If I save the
>>> record as MARCXML I get the encoding below:
>>>
>>>       <subfield code="a">Fran&#xE7;a</subfield>
>>>
>>> 2. If I use python to encode it:
>>>
>>>           return string.strip().encode("ascii", "xmlcharrefreplace").decode("ascii")
>>>
>>> The generated MARCXML line looks like:
>>>
>>>       <subfield code="a">Fran&#231;a</subfield>
>>>
>>> In the GUI it looks like 'Franȧ', and if I save it as MARCXML it looks like:
>>>
>>>       <subfield code="a">Fran&#x227;</subfield>
>>>
>>> It is worth mentioning that the bibliographic bit referencing this
>>> authority looks perfect, and it was created in exactly the same way as
>>> the authority, so the only problem is with the authority.
>>>
>>> Has anybody faced a similar problem before? In other words, I need to
>>> programmatically generate a MARCXML file to later import into Koha
>>> (21.x), and some of the records (authorities) contain 'ç' and are not
>>> being encoded correctly.
>>> _______________________________________________
>>>
>>> Koha mailing list  http://koha-community.org
>>> Koha at lists.katipo.co.nz
>>> Unsubscribe: https://lists.katipo.co.nz/mailman/listinfo/koha