[Koha] Encode 'ç' to import authority .marcxml file with authority

Harald Schaefer fechsaer at gmail.com
Tue Jul 20 08:19:40 NZST 2021


Hi Javier,

it seems that your CSV text is ISO-8859 encoded,

so you need something in your Python code like

   utf8str = iso8859str.decode('iso-8859-1').encode('utf8')
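
The line above is Python 2 style. In Python 3 only bytes objects have a
decode method, so the same idea might look roughly like this. It is only a
sketch: it assumes the raw value really arrives as ISO-8859-1 bytes, and the
function name to_utf8 is made up for illustration.

```python
def to_utf8(raw: bytes, source_encoding: str = "iso-8859-1") -> bytes:
    """Re-encode bytes from source_encoding to UTF-8."""
    # decode: bytes -> str (Unicode), encode: str -> UTF-8 bytes
    return raw.decode(source_encoding).encode("utf-8")


# 'ç' is the single byte 0xE7 in ISO-8859-1 and the two bytes 0xC3 0xA7 in UTF-8.
iso8859_bytes = b"Fran\xe7a"
utf8_bytes = to_utf8(iso8859_bytes)
print(utf8_bytes)
```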

You may search the internet for

   python read iso8859 strings and convert them to utf8
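
If the CSV is read from a file, it is usually simpler to declare the source
encoding once when reading and to encode to UTF-8 once when writing. A small
sketch of that approach follows. The two-column layout (tag, value) and the
helper names read_rows and subfield are assumptions of mine, not taken from
your script.

```python
import csv
import io
from xml.sax.saxutils import escape


def read_rows(binary_stream, source_encoding="iso-8859-1"):
    """Decode a binary CSV stream to text and return its rows."""
    text = io.TextIOWrapper(binary_stream, encoding=source_encoding, newline="")
    return list(csv.reader(text))


def subfield(code, value):
    """Build one MARCXML subfield, escaping only XML special characters."""
    return '<subfield code="{}">{}</subfield>'.format(code, escape(value.strip()))


# Stand-in for the exported CSV file: one row of ISO-8859-1 bytes.
sample = io.BytesIO(b"151,Fran\xe7a\r\n")
rows = read_rows(sample)
xml_line = subfield("a", rows[0][1])

# Encode once, when writing; this matches encoding="UTF-8" in the XML header.
utf8_bytes = xml_line.encode("utf-8")
print(utf8_bytes)
```

With a real file, open(path, encoding="iso-8859-1", newline="") for reading
and open(path, "w", encoding="utf-8") for writing should do the same job
without any manual encode or decode calls.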

Best regards, Harald

On 19.07.21 at 19:23, Javi Legido wrote:
> Hi there.
>
> We are parsing a CSV file (DBText software) and converting it to MARCXML.
>
> Everything works fine, except for this super little detail.
>
> I will paste the files inline, since it looks like the mailing list does not
> allow attachments.
>
> Thanks again for your time.
>
> Javier
>
> authorities.1.marcxml
>
> <?xml version="1.0" encoding="UTF-8"?>
> <record
>      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>      xsi:schemaLocation="http://www.loc.gov/MARC21/slim
> http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd"
>      xmlns="http://www.loc.gov/MARC21/slim">
>
>    <controlfield tag="001">337</controlfield>
>    <controlfield tag="003">OSt</controlfield>
>    <!-- WARNING: hardcoded -->
>    <datafield tag="040" ind1=" " ind2=" ">
>      <subfield code="a">OSt</subfield>
>    </datafield>
>    <datafield tag="100" ind1=" " ind2=" ">
>      <subfield code="a">VIENNEY,  Claude</subfield>
>    </datafield>
>    <datafield tag="942" ind1=" " ind2=" ">
>      <subfield code="a">PERSO_NAME</subfield>
>    </datafield>
> </record>
> <record
>      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>      xsi:schemaLocation="http://www.loc.gov/MARC21/slim
> http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd"
>      xmlns="http://www.loc.gov/MARC21/slim">
>
>    <controlfield tag="001">338</controlfield>
>    <controlfield tag="003">OSt</controlfield>
>    <!-- WARNING: hardcoded -->
>    <datafield tag="040" ind1=" " ind2=" ">
>      <subfield code="a">OSt</subfield>
>    </datafield>
>    <datafield tag="150" ind1=" " ind2=" ">
>      <subfield code="a">Formació cooperativa</subfield>
>    </datafield>
>    <datafield tag="942" ind1=" " ind2=" ">
>      <subfield code="a">TOPIC_TERM</subfield>
>    </datafield>
> </record>
> <record
>      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>      xsi:schemaLocation="http://www.loc.gov/MARC21/slim
> http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd"
>      xmlns="http://www.loc.gov/MARC21/slim">
>
>    <controlfield tag="001">338</controlfield>
>    <controlfield tag="003">OSt</controlfield>
>    <!-- WARNING: hardcoded -->
>    <datafield tag="040" ind1=" " ind2=" ">
>      <subfield code="a">OSt</subfield>
>    </datafield>
>    <datafield tag="151" ind1=" " ind2=" ">
>      <subfield code="a">França</subfield>
>    </datafield>
>    <datafield tag="942" ind1=" " ind2=" ">
>      <subfield code="a">GEOGR_NAME</subfield>
>    </datafield>
> </record>
>
> =====
>
> bibliographic.marcxml
>
> <?xml version="1.0" encoding="UTF-8"?>
> <record
>      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>      xsi:schemaLocation="http://www.loc.gov/MARC21/slim
> http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd"
>      xmlns="http://www.loc.gov/MARC21/slim">
>
>    <controlfield tag="003">OSt</controlfield>
>
>    <datafield tag="020" ind1=" " ind2=" ">
>      <subfield code="a">2.903819.03.3</subfield>
>    </datafield>
>    <datafield tag="024" ind1=" " ind2=" ">
>      <subfield code="q">199900562</subfield>
>    </datafield>
>    <datafield tag="040" ind1=" " ind2=" ">
>      <!-- HARDCODED. Looks like some kind of library name -->
>      <subfield code="c">esbafrg</subfield>
>    </datafield>
>
>    <datafield tag="041" ind1="1" ind2=" ">
>      <subfield code="a">Francesa</subfield>
>      </datafield>
>    <datafield tag="080" ind1=" " ind2=" ">
>      <subfield code="a">c.0.9.3.4 |  c.1.1</subfield>
>    </datafield>
>    <datafield tag="082" ind1=" " ind2=" ">
>      <subfield code="b">199900562</subfield>
>    </datafield>
>    <datafield tag="100" ind1=" " ind2=" ">
>    <subfield code="9">337</subfield>
>    <subfield code="a">VIENNEY,  Claude</subfield>
>    </datafield>
>    <datafield tag="245" ind1=" " ind2=" ">
>      <subfield code="a">Socio-economie des organisations
> cooperatives</subfield>
>    </datafield>
>
>    <datafield tag="260" ind1=" " ind2=" ">
>    <subfield code="a">París</subfield>
>    <subfield code="b">CIEM</subfield>
>    <subfield code="c">1982</subfield>
>    </datafield>
>    <datafield tag="300" ind1=" " ind2=" ">
>    <subfield code="a">333 p.24 cm.</subfield>
>    <subfield code="b">Tomo II. Analyse comparèe des cooperatives
> fonctionnant dans des sy</subfield>
>    </datafield>
>    <datafield tag="650" ind1="1" ind2=" ">
>      <subfield code="a">Formació cooperativa</subfield>
>    </datafield>
>    <datafield tag="651" ind1="1" ind2=" ">
>      <subfield code="a">França</subfield>
>    </datafield>
>    <datafield tag="942" ind1=" " ind2=" ">
>      <subfield code="2">ddc</subfield>
>      <!-- WARNING: Should be manually created first:
>      https://admin.example.com/cgi-bin/koha/admin/itemtypes.pl -->
>      <subfield code="c"></subfield>
>    </datafield>
>
>
>    <datafield tag="952" ind1=" " ind2=" ">
>      <!-- HARDCODED. Librarian card number, created during initial koha
> install -->
>      <subfield code="a">1234567890</subfield>
>      <!-- HARDCODED. Librarian card number, created during initial koha
> install -->
>      <subfield code="b">1234567890</subfield>
>      <subfield code="o">AB-125</subfield>
>    <!-- WARNING: Should be manually created first:
>      https://admin.example.com/cgi-bin/koha/admin/itemtypes.pl -->
>      <subfield code="y"></subfield>
>    </datafield>
>
>
> </record>
>
> On Mon, 19 Jul 2021 at 18:48, Harald Schaefer <fechsaer at gmail.com> wrote:
>
>> Hi Javier,
>>
>> you must be careful when working with utf8.
>>
>> When the input file and your Python script are both encoded in UTF-8, you
>> should not need any encode call, in my view.
>>
>> I did not understand your Python script. Does it read something and then
>> write a modified XML file again?
>>
>> There was no attachment in the last mail
>>
>> Regards, Harald
>>
>> On 19.07.21 at 18:12, Javi Legido wrote:
>>> Hi Harold.
>>>
>>> Many thanks for your quick reply.
>>>
>>> Changing encoding:
>>>
>>> -        return string.strip().encode("ascii", "xmlcharrefreplace").decode("ascii")
>>> +        return string.strip().encode("utf8", "xmlcharrefreplace").decode("utf8")
>>>
>>> Produces a MARCXML file which produces "0 records in file", so I can't
>>> import it. The string was:
>>>
>>> França
>>>
>>> Attached are the MARCXML records for authorities and bibliographic data.
>>> Both work (meaning they can be imported), but only the authorities end up
>>> with the wrong encoding.
>>>
>>> Thanks.
>>>
>>> Javier
>>>
>>> On Mon, 19 Jul 2021 at 17:26, Harald Schaefer <fechsaer at gmail.com> wrote:
>>>> Hi,
>>>>
>>>> you should use the utf8 encoding, when creating a python file.
>>>>
>>>> The marcxml file should have in the first line encoding='UTF-8'
>>>>
>>>> In python you should use encode('utf8')
>>>>
>>>> Regards, Harald
>>>>
>>>> On 19.07.21 at 16:10, Javi Legido wrote:
>>>>> Hi there.
>>>>>
>>>>> I'm trying to import an authority of type 'GEOGR_NAME' with 'ç' in its
>>>>> name (field '151 a'):
>>>>>
>>>>> França
>>>>>
>>>>> So far:
>>>>>
>>>>> 1. If I add it manually from the GUI (I want to import it from a .marcxml
>>>>> file), typing the 'ç' character works. If I save the record as MARCXML I
>>>>> get the encoding below:
>>>>>
>>>>>        <subfield code="a">Fran&#xE7;a</subfield>
>>>>>
>>>>> 2. If I use python to encode it:
>>>>>
>>>>>            return string.strip().encode("ascii", "xmlcharrefreplace").decode("ascii")
>>>>>
>>>>> The generated MARCXML line looks like:
>>>>>
>>>>>        <subfield code="a">França</subfield>
>>>>>
>>>>> In the GUI it looks like 'Franȧ', and if I save it as MARCXML it looks like:
>>>>>
>>>>>        <subfield code="a">Fran&#x227;</subfield>
>>>>>
>>>>> It is worth mentioning that the bibliographic record referencing this
>>>>> authority looks perfect, and it was created in exactly the same way as the
>>>>> authority, so the only problem is with the authority.
>>>>>
>>>>> Has anybody faced a similar problem before? In other words, I need to
>>>>> programmatically generate a MARCXML file to later import into Koha
>>>>> (21.x), and some of the records (authorities) contain 'ç' and are not
>>>>> being encoded correctly.
>>>>> _______________________________________________
>>>>>
>>>>> Koha mailing list  http://koha-community.org
>>>>> Koha at lists.katipo.co.nz
>>>>> Unsubscribe: https://lists.katipo.co.nz/mailman/listinfo/koha

