[Koha] Encode 'ç' to import authority .marcxml file with authority
Harald Schaefer
fechsaer at gmail.com
Tue Jul 20 21:56:42 NZST 2021
Hi Javier,
the MARC 21 format and its leader field are described here:
https://www.loc.gov/marc/bibliographic/bdleader.html
Leader position 09 (the tenth character) is the character coding scheme;
the value 'a' means UCS/Unicode, and a blank means MARC-8.
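
As a minimal sketch of that check, the Python below reads leader position 09 from a leader string. The function name leader_declares_unicode is made up for illustration; it is not part of Koha or of any MARC library.

```python
def leader_declares_unicode(leader: str) -> bool:
    """Return True if a MARC 21 leader declares UCS/Unicode in position 09."""
    # A MARC 21 leader is always exactly 24 characters long.
    if len(leader) != 24:
        raise ValueError(f"leader must be 24 characters, got {len(leader)}")
    # Positions are zero-based, so position 09 is the tenth character.
    return leader[9] == "a"


if __name__ == "__main__":
    # Bibliographic leader quoted later in this thread.
    print(leader_declares_unicode("00276nam a22001097a 4500"))
```

The length check is there because a leader that lost a blank in copy and paste shifts every later position, and position 09 then no longer holds the coding scheme.
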
Regards, Harald
Am 20.07.21 um 08:51 schrieb Javi Legido:
> Hi there.
>
> I can confirm that re-adding '<leader>' fixed the issue.
>
> Snippet from my templates:
>
> a) Bibliographic
>
> <!-- WARNING: hardcoded. Without below key encoding of non-ASCII
> characters will fail -->
> <leader>00276nam a22001097a 4500</leader>
>
> b) Authorities
>
> <!-- WARNING: hardcoded. Without below key encoding of non-ASCII
> characters will fail -->
> {% if authority_942_a == "PERSO_NAME" -%}
> <leader>00249nz  a2200109n  4500</leader>
> {% elif authority_942_a == "GEOGR_NAME" -%}
> <leader>00237nz  a2200109n  4500</leader>
> {% elif authority_942_a == "TOPIC_TERM" -%}
> <leader>00221nz  a2200109n  4500</leader>
> {% endif -%}
>
> On the other hand, although this is a little outside the scope of this
> thread, I also made the changes below in the Python code:
>
> 1. The output .marcxml file now is 'UTF-8' encoded:
>
> with open(marcxml_filename, "w", encoding='utf8') as f:
>     f.write(self.marcxml_content)
>
> Test:
>
> file -i /tmp/authorities.1.marcxml
> /tmp/authorities.1.marcxml: text/xml; charset=utf-8
>
> 2. Because of 1), I no longer need the 'xmlcharrefreplace' hocus pocus:
>
> return string.strip()
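
The two changes above can be sketched together in Python. This is only an illustration under assumptions: clean_field and write_marcxml are hypothetical names standing in for the code discussed in this thread, and the file is written to a temporary directory.

```python
import os
import tempfile


def clean_field(string: str) -> str:
    """Strip whitespace only; UTF-8 output needs no character references."""
    return string.strip()


def write_marcxml(marcxml_filename: str, marcxml_content: str) -> None:
    """Write the MARCXML text to disk as UTF-8."""
    with open(marcxml_filename, "w", encoding="utf8") as f:
        f.write(marcxml_content)


if __name__ == "__main__":
    content = '<subfield code="a">' + clean_field(" França ") + "</subfield>\n"
    path = os.path.join(tempfile.mkdtemp(), "authorities.1.marcxml")
    write_marcxml(path, content)
    with open(path, "rb") as f:
        raw = f.read()
    # In UTF-8, 'ç' (U+00E7) is stored as the two bytes C3 A7.
    print(b"\xc3\xa7" in raw)
```

Reading the file back in binary mode checks the bytes on disk rather than what the terminal chooses to display.
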
>
> I still need to figure out the '<leader>' format, since for now it is
> hardcoded in my templates.
>
> Cheers.
>
> Javier
>
>
>
>
>
> On Tue, 20 Jul 2021 at 07:47, Javi Legido <javi at legido.com> wrote:
>
>> Hi there.
>>
>> Thanks again for your time, I really appreciate it.
>>
>> I realized that omitting '<leader>' from .marcxml file is causing the
>> issue with encoding, also in bibliographic.
>>
>> I did a little bit of reverse engineering to try to keep my bibliographic
>> and authority templates as simple as possible.
>>
>> I will restore that XML key to my templates and keep testing.
>>
>> I will post my results in this thread for the record.
>>
>> Cheers.
>>
>> Javier
>>
>> On Mon, 19 Jul 2021 at 22:20, Harald Schaefer <fechsaer at gmail.com> wrote:
>>
>>> Hi Javier,
>>>
>>> it seems that your CSV text is ISO-8859-1 encoded,
>>>
>>> so you need something in your Python code like
>>>
>>> utf8bytes = iso8859bytes.decode('iso-8859-1').encode('utf8')
>>>
>>> You may search the internet for
>>>
>>> python read iso8859 strings and convert them to utf8
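
In Python 3 the conversion suggested above could look like the sketch below. It assumes the CSV really is ISO-8859-1 encoded; the helper name read_latin1_csv and the sample row are invented for illustration.

```python
import csv
import io


def read_latin1_csv(raw: bytes) -> list:
    """Decode ISO-8859-1 bytes to str and parse them as CSV rows."""
    text = raw.decode("iso-8859-1")
    return list(csv.reader(io.StringIO(text)))


if __name__ == "__main__":
    # 'ç' is the single byte 0xE7 in ISO-8859-1.
    raw = b"151,Fran\xe7a\n"
    rows = read_latin1_csv(raw)
    print(rows)
    # Encoding the decoded text as UTF-8 turns 'ç' into the bytes C3 A7.
    print(rows[0][1].encode("utf8"))
```

When the data comes from a file, opening it with open(path, encoding="iso-8859-1", newline="") does the same decoding while reading.
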
>>>
>>> Best regards, Harald
>>>
>>> Am 19.07.21 um 19:23 schrieb Javi Legido:
>>>> Hi there.
>>>>
>>>> We are parsing a CSV file (DBText software) and converting it to MARCXML.
>>>>
>>>> Everything works fine, except for this super little detail.
>>>>
>>>> I will paste the files inline, since it looks like the mailing list does
>>>> not allow attachments.
>>>>
>>>> Thanks again for your time.
>>>>
>>>> Javier
>>>>
>>>> authorities.1.marcxml
>>>>
>>>> <?xml version="1.0" encoding="UTF-8"?>
>>>> <record
>>>> xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>>>> xsi:schemaLocation="http://www.loc.gov/MARC21/slim
>>>> http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd"
>>>> xmlns="http://www.loc.gov/MARC21/slim">
>>>>
>>>> <controlfield tag="001">337</controlfield>
>>>> <controlfield tag="003">OSt</controlfield>
>>>> <!-- WARNING: hardcoded -->
>>>> <datafield tag="040" ind1=" " ind2=" ">
>>>> <subfield code="a">OSt</subfield>
>>>> </datafield>
>>>> <datafield tag="100" ind1=" " ind2=" ">
>>>> <subfield code="a">VIENNEY, Claude</subfield>
>>>> </datafield>
>>>> <datafield tag="942" ind1=" " ind2=" ">
>>>> <subfield code="a">PERSO_NAME</subfield>
>>>> </datafield>
>>>> </record>
>>>> <record
>>>> xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>>>> xsi:schemaLocation="http://www.loc.gov/MARC21/slim
>>>> http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd"
>>>> xmlns="http://www.loc.gov/MARC21/slim">
>>>>
>>>> <controlfield tag="001">338</controlfield>
>>>> <controlfield tag="003">OSt</controlfield>
>>>> <!-- WARNING: hardcoded -->
>>>> <datafield tag="040" ind1=" " ind2=" ">
>>>> <subfield code="a">OSt</subfield>
>>>> </datafield>
>>>> <datafield tag="150" ind1=" " ind2=" ">
>>>> <subfield code="a">Formació cooperativa</subfield>
>>>> </datafield>
>>>> <datafield tag="942" ind1=" " ind2=" ">
>>>> <subfield code="a">TOPIC_TERM</subfield>
>>>> </datafield>
>>>> </record>
>>>> <record
>>>> xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>>>> xsi:schemaLocation="http://www.loc.gov/MARC21/slim
>>>> http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd"
>>>> xmlns="http://www.loc.gov/MARC21/slim">
>>>>
>>>> <controlfield tag="001">338</controlfield>
>>>> <controlfield tag="003">OSt</controlfield>
>>>> <!-- WARNING: hardcoded -->
>>>> <datafield tag="040" ind1=" " ind2=" ">
>>>> <subfield code="a">OSt</subfield>
>>>> </datafield>
>>>> <datafield tag="151" ind1=" " ind2=" ">
>>>> <subfield code="a">França</subfield>
>>>> </datafield>
>>>> <datafield tag="942" ind1=" " ind2=" ">
>>>> <subfield code="a">GEOGR_NAME</subfield>
>>>> </datafield>
>>>> </record>
>>>>
>>>> =====
>>>>
>>>> bibliographic.marcxml
>>>>
>>>> <?xml version="1.0" encoding="UTF-8"?>
>>>> <record
>>>> xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>>>> xsi:schemaLocation="http://www.loc.gov/MARC21/slim
>>>> http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd"
>>>> xmlns="http://www.loc.gov/MARC21/slim">
>>>>
>>>> <controlfield tag="003">OSt</controlfield>
>>>>
>>>> <datafield tag="020" ind1=" " ind2=" ">
>>>> <subfield code="a">2.903819.03.3</subfield>
>>>> </datafield>
>>>> <datafield tag="024" ind1=" " ind2=" ">
>>>> <subfield code="q">199900562</subfield>
>>>> </datafield>
>>>> <datafield tag="040" ind1=" " ind2=" ">
>>>> <!-- HARDCODED. Looks like some kind of library name -->
>>>> <subfield code="c">esbafrg</subfield>
>>>> </datafield>
>>>>
>>>> <datafield tag="041" ind1="1" ind2=" ">
>>>> <subfield code="a">Francesa</subfield>
>>>> </datafield>
>>>> <datafield tag="080" ind1=" " ind2=" ">
>>>> <subfield code="a">c.0.9.3.4 | c.1.1</subfield>
>>>> </datafield>
>>>> <datafield tag="082" ind1=" " ind2=" ">
>>>> <subfield code="b">199900562</subfield>
>>>> </datafield>
>>>> <datafield tag="100" ind1=" " ind2=" ">
>>>> <subfield code="9">337</subfield>
>>>> <subfield code="a">VIENNEY, Claude</subfield>
>>>> </datafield>
>>>> <datafield tag="245" ind1=" " ind2=" ">
>>>> <subfield code="a">Socio-economie des organisations
>>>> cooperatives</subfield>
>>>> </datafield>
>>>>
>>>> <datafield tag="260" ind1=" " ind2=" ">
>>>> <subfield code="a">París</subfield>
>>>> <subfield code="b">CIEM</subfield>
>>>> <subfield code="c">1982</subfield>
>>>> </datafield>
>>>> <datafield tag="300" ind1=" " ind2=" ">
>>>> <subfield code="a">333 p.24 cm.</subfield>
>>>> <subfield code="b">Tomo II. Analyse comparèe des cooperatives
>>>> fonctionnant dans des sy</subfield>
>>>> </datafield>
>>>> <datafield tag="650" ind1="1" ind2=" ">
>>>> <subfield code="a">Formació cooperativa</subfield>
>>>> </datafield>
>>>> <datafield tag="651" ind1="1" ind2=" ">
>>>> <subfield code="a">França</subfield>
>>>> </datafield>
>>>> <datafield tag="942" ind1=" " ind2=" ">
>>>> <subfield code="2">ddc</subfield>
>>>> <!-- WARNING: Should be manually created first:
>>>> https://admin.example.com/cgi-bin/koha/admin/itemtypes.pl -->
>>>> <subfield code="c"></subfield>
>>>> </datafield>
>>>>
>>>>
>>>> <datafield tag="952" ind1=" " ind2=" ">
>>>> <!-- HARDCODED. Librarian card number, created during initial koha
>>>> install -->
>>>> <subfield code="a">1234567890</subfield>
>>>> <!-- HARDCODED. Librarian card number, created during initial koha
>>>> install -->
>>>> <subfield code="b">1234567890</subfield>
>>>> <subfield code="o">AB-125</subfield>
>>>> <!-- WARNING: Should be manually created first:
>>>> https://admin.example.com/cgi-bin/koha/admin/itemtypes.pl -->
>>>> <subfield code="y"></subfield>
>>>> </datafield>
>>>>
>>>>
>>>> </record>
>>>>
>>>> On Mon, 19 Jul 2021 at 18:48, Harald Schaefer <fechsaer at gmail.com> wrote:
>>>>> Hi Javier,
>>>>>
>>>>> you must be careful when working with utf8.
>>>>>
>>>>> When the input file and your Python script are both encoded in UTF-8, you
>>>>> do not, in my view, need an encode call.
>>>>>
>>>>> I did not understand your Python script. Does it read something and then
>>>>> write a modified XML file again?
>>>>>
>>>>> There was no attachment in the last mail
>>>>>
>>>>> Regards, Harald
>>>>>
>>>>> Am 19.07.21 um 18:12 schrieb Javi Legido:
>>>>>> Hi Harald.
>>>>>>
>>>>>> Many thanks for your quick reply.
>>>>>>
>>>>>> Changing encoding:
>>>>>>
>>>>>> - return string.strip().encode("ascii",
>>>>>> "xmlcharrefreplace").decode("ascii")
>>>>>> + return string.strip().encode("utf8",
>>>>>> "xmlcharrefreplace").decode("utf8")
>>>>>>
>>>>>> Produces a MARCXML file which produces "0 records in file", so I can't
>>>>>> import it. The string was:
>>>>>>
>>>>>> França
>>>>>>
>>>>>> Attached are the MARCXML records for authorities and bibliographic, which
>>>>>> work (meaning that they can be imported), but only the authorities get
>>>>>> the wrong encoding.
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>> Javier
>>>>>>
>>>>>> On Mon, 19 Jul 2021 at 17:26, Harald Schaefer <fechsaer at gmail.com> wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> you should use the UTF-8 encoding when creating the Python file.
>>>>>>>
>>>>>>> The MARCXML file should declare encoding='UTF-8' in its first line.
>>>>>>>
>>>>>>> In Python you should use encode('utf8').
>>>>>>>
>>>>>>> Regards, Harald
>>>>>>>
>>>>>>> Am 19.07.21 um 16:10 schrieb Javi Legido:
>>>>>>>> Hi there.
>>>>>>>>
>>>>>>>> I'm trying to import an authority of type 'GEOGR_NAME' with 'ç' in its
>>>>>>>> name (field '151 a'):
>>>>>>>>
>>>>>>>> França
>>>>>>>>
>>>>>>>> So far:
>>>>>>>>
>>>>>>>> 1. If I add it manually from the GUI (though I want to import it from a
>>>>>>>> .marcxml file), typing the 'ç' character works. If I save the record as
>>>>>>>> MARCXML I get the encoding below:
>>>>>>>>
>>>>>>>> <subfield code="a">França</subfield>
>>>>>>>>
>>>>>>>> 2. If I use python to encode it:
>>>>>>>>
>>>>>>>> return string.strip().encode("ascii",
>>>>>>>> "xmlcharrefreplace").decode("ascii")
>>>>>>>>
>>>>>>>> The generated MARCXML line looks like:
>>>>>>>>
>>>>>>>> <subfield code="a">Fran&#231;a</subfield>
>>>>>>>>
>>>>>>>> In the GUI it looks like 'Franȧ', and if I save it as MARCXML it looks
>>>>>>>> like:
>>>>>>>> <subfield code="a">Franȧ</subfield>
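
For reference, the effect of the 'xmlcharrefreplace' error handler used in step 2 can be reproduced in isolation. The function names below are hypothetical and only wrap the two encode/decode expressions that appear in this thread.

```python
def to_ascii_char_refs(string: str) -> str:
    """Replace every non-ASCII character with a numeric XML character reference."""
    return string.strip().encode("ascii", "xmlcharrefreplace").decode("ascii")


def utf8_round_trip(string: str) -> str:
    """Encode to UTF-8 and back; nothing is replaced because UTF-8 can encode 'ç'."""
    return string.strip().encode("utf8", "xmlcharrefreplace").decode("utf8")


if __name__ == "__main__":
    print(to_ascii_char_refs("França"))  # Fran&#231;a
    print(utf8_round_trip("França"))     # França
```

The error handler only fires for characters the target codec cannot represent, so with "utf8" the call leaves the string unchanged.
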
>>>>>>>>
>>>>>>>> It is worth mentioning that the bibliographic record referencing this
>>>>>>>> authority looks perfect, and it was created exactly the same way as the
>>>>>>>> authority, so the only problem is with the authority.
>>>>>>>>
>>>>>>>> Has anybody faced a similar problem before? In other words, I need to
>>>>>>>> programmatically generate a MARCXML file to later import into Koha
>>>>>>>> (21.x), and some of the records (authorities) contain 'ç' and are not
>>>>>>>> being encoded correctly.
>>>>>>>> _______________________________________________
>>>>>>>>
>>>>>>>> Koha mailing list http://koha-community.org
>>>>>>>> Koha at lists.katipo.co.nz
>>>>>>>> Unsubscribe: https://lists.katipo.co.nz/mailman/listinfo/koha