[Koha] Encode 'ç' to import authority .marcxml file with authority

Javi Legido javi at legido.com
Tue Jul 20 18:51:28 NZST 2021


Hi there.

I can confirm that re-adding '<leader>' fixed the issue.

Snippet from my templates:

a) Bibliographic

  <!-- WARNING: hardcoded. Without the key below, encoding of non-ASCII
  characters will fail -->
  <leader>00276nam a22001097a 4500</leader>

b) Authorities

  <!-- WARNING: hardcoded. Without the key below, encoding of non-ASCII
  characters will fail -->
  {% if authority_942_a == "PERSO_NAME" -%}
  <leader>00249nz  a2200109n  4500</leader>
  {% elif authority_942_a == "GEOGR_NAME" -%}
  <leader>00237nz  a2200109n  4500</leader>
  {% elif authority_942_a == "TOPIC_TERM" -%}
  <leader>00221nz  a2200109n  4500</leader>
  {% endif -%}
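
A possible explanation of why the leader matters here (a reading of the
MARC 21 leader layout, not something confirmed in this thread): leader
position 09 is the character coding scheme, where 'a' means UCS/Unicode and
a blank means MARC-8. All four leaders above carry 'a' in that position.
Below is a minimal Python sketch to check it; the function name
'leader_declares_unicode' is only illustrative:

```python
def leader_declares_unicode(leader: str) -> bool:
    """Return True if MARC 21 leader position 09 is 'a' (UCS/Unicode)."""
    if len(leader) != 24:
        raise ValueError(f"a MARC 21 leader has 24 characters, got {len(leader)}")
    return leader[9] == "a"


# Leaders copied verbatim from the templates above.
leaders = {
    "BIBLIOGRAPHIC": "00276nam a22001097a 4500",
    "PERSO_NAME": "00249nz  a2200109n  4500",
    "GEOGR_NAME": "00237nz  a2200109n  4500",
    "TOPIC_TERM": "00221nz  a2200109n  4500",
}

for name, leader in leaders.items():
    print(name, leader_declares_unicode(leader))
```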

On the other hand, although this is a little outside the scope of this
thread, I also made the changes below in the Python code:

1. The output .marcxml file is now UTF-8 encoded:

        with open(marcxml_filename, "w", encoding='utf8') as f:
            f.write(self.marcxml_content)

Test:

file -i /tmp/authorities.1.marcxml
/tmp/authorities.1.marcxml: text/xml; charset=utf-8
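
The same check can be sketched in Python, assuming only the standard
library. The sample record and the helper name 'write_and_verify' are
hypothetical; the sketch writes the file as in step 1, reads the raw bytes
back, confirms they decode as UTF-8 and that the XML still parses:

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Hypothetical minimal authority record, modelled on the ones in this thread.
MARCXML = """<?xml version="1.0" encoding="UTF-8"?>
<record xmlns="http://www.loc.gov/MARC21/slim">
  <leader>00237nz  a2200109n  4500</leader>
  <datafield tag="151" ind1=" " ind2=" ">
    <subfield code="a">França</subfield>
  </datafield>
</record>
"""


def write_and_verify(path: str, content: str) -> str:
    """Write content as UTF-8, then re-read it and return subfield 151$a."""
    with open(path, "w", encoding="utf8") as f:
        f.write(content)
    with open(path, "rb") as f:
        raw = f.read()
    raw.decode("utf-8")  # raises UnicodeDecodeError if the bytes are not UTF-8
    root = ET.fromstring(raw)  # raises ParseError if the XML is malformed
    ns = {"marc": "http://www.loc.gov/MARC21/slim"}
    subfield = root.find("marc:datafield[@tag='151']/marc:subfield[@code='a']", ns)
    return subfield.text


with tempfile.TemporaryDirectory() as tmp:
    value = write_and_verify(os.path.join(tmp, "authorities.1.marcxml"), MARCXML)
    print(value)
```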

2. Because of 1), I no longer need the 'xmlcharrefreplace' hocus pocus:

        return string.strip()
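
To illustrate why the error handler becomes unnecessary, here is a small
sketch using the sample string from this thread (the variable names are
illustrative). With the "ascii" codec, 'xmlcharrefreplace' turns 'ç' into a
numeric character reference; with "utf8" every character is encodable, so
the handler never fires and the string comes back unchanged:

```python
name = " França "

# ASCII cannot represent 'ç' (U+00E7), so it becomes the reference &#231;
ascii_escaped = name.strip().encode("ascii", "xmlcharrefreplace").decode("ascii")
print(ascii_escaped)  # Fran&#231;a

# UTF-8 can represent every character, so the round trip is a no-op
utf8_roundtrip = name.strip().encode("utf8", "xmlcharrefreplace").decode("utf8")
print(utf8_roundtrip == name.strip())  # True

# What actually lands in a UTF-8 file: 'ç' is stored as two bytes
print(name.strip().encode("utf8"))  # b'Fran\xc3\xa7a'
```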

I still need to figure out the '<leader>' format, since for now it is
hardcoded in my templates.
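
As a starting point for that, the sketch below splits a leader into the
fixed positions that, as far as the MARC 21 documentation goes, are shared
by the bibliographic and authority formats. Positions 07, 08, 18 and 19 are
defined differently in the two formats and are left out on purpose. The
field labels and the function name are illustrative, not taken from Koha:

```python
# (label, start, end) slices over the 24-character MARC 21 leader.
LEADER_FIELDS = [
    ("record_length", 0, 5),
    ("record_status", 5, 6),
    ("type_of_record", 6, 7),
    ("character_coding_scheme", 9, 10),
    ("indicator_count", 10, 11),
    ("subfield_code_count", 11, 12),
    ("base_address_of_data", 12, 17),
    ("encoding_level", 17, 18),
    ("entry_map", 20, 24),
]


def describe_leader(leader: str) -> dict:
    """Split a leader into labelled pieces by character position."""
    if len(leader) != 24:
        raise ValueError(f"a MARC 21 leader has 24 characters, got {len(leader)}")
    return {label: leader[start:end] for label, start, end in LEADER_FIELDS}


# The PERSO_NAME leader from the authority template above.
for label, value in describe_leader("00249nz  a2200109n  4500").items():
    print(f"{label:24} {value!r}")
```

Note that the record length and base address are derived from the binary
MARC form of a record, so the hardcoded values above are unlikely to match
each generated record.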

Cheers.

Javier





On Tue, 20 Jul 2021 at 07:47, Javi Legido <javi at legido.com> wrote:

> Hi there.
>
> Thanks again for your time, I really appreciate it.
>
> I realized that omitting '<leader>' from .marcxml file is causing the
> issue with encoding, also in bibliographic.
>
> I did a little bit of reverse engineering to try to keep my
> bibliographic and authority templates as simple as possible.
>
> I will restore that XML key to my templates and keep testing.
>
> I will post my results in this thread for the record.
>
> Cheers.
>
> Javier
>
> On Mon, 19 Jul 2021 at 22:20, Harald Schaefer <fechsaer at gmail.com> wrote:
>
>> Hi Javier,
>>
>> it seems that your CSV text is ISO-8859 encoded,
>>
>> so you needed something in your python code like
>>
>>    utf8str = iso8859str.decode('iso-8859-1').encode('utf8')
>>
>> You may search the internet for
>>
>>    python read iso8859 strings and convert them to utf8
>>
>> Best regards, Harald
>>
>> Am 19.07.21 um 19:23 schrieb Javi Legido:
>> > Hi there.
>> >
>> > We are parsing a CSV file (DBText software) and converting it to MARCXML.
>> >
>> > Everything works fine, except for this super little detail.
>> >
>> > I will paste the files inline, since it looks like the mailing list does
>> > not allow attachments.
>> >
>> > Thanks again for your time.
>> >
>> > Javier
>> >
>> > authorities.1.marcxml
>> >
>> > <?xml version="1.0" encoding="UTF-8"?>
>> > <record
>> >      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>> >      xsi:schemaLocation="http://www.loc.gov/MARC21/slim
>> > http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd"
>> >      xmlns="http://www.loc.gov/MARC21/slim">
>> >
>> >    <controlfield tag="001">337</controlfield>
>> >    <controlfield tag="003">OSt</controlfield>
>> >    <!-- WARNING: hardcoded -->
>> >    <datafield tag="040" ind1=" " ind2=" ">
>> >      <subfield code="a">OSt</subfield>
>> >    </datafield>
>> >    <datafield tag="100" ind1=" " ind2=" ">
>> >      <subfield code="a">VIENNEY,  Claude</subfield>
>> >    </datafield>
>> >    <datafield tag="942" ind1=" " ind2=" ">
>> >      <subfield code="a">PERSO_NAME</subfield>
>> >    </datafield>
>> > </record>
>> > <record
>> >      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>> >      xsi:schemaLocation="http://www.loc.gov/MARC21/slim
>> > http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd"
>> >      xmlns="http://www.loc.gov/MARC21/slim">
>> >
>> >    <controlfield tag="001">338</controlfield>
>> >    <controlfield tag="003">OSt</controlfield>
>> >    <!-- WARNING: hardcoded -->
>> >    <datafield tag="040" ind1=" " ind2=" ">
>> >      <subfield code="a">OSt</subfield>
>> >    </datafield>
>> >    <datafield tag="150" ind1=" " ind2=" ">
>> >      <subfield code="a">Formació cooperativa</subfield>
>> >    </datafield>
>> >    <datafield tag="942" ind1=" " ind2=" ">
>> >      <subfield code="a">TOPIC_TERM</subfield>
>> >    </datafield>
>> > </record>
>> > <record
>> >      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>> >      xsi:schemaLocation="http://www.loc.gov/MARC21/slim
>> > http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd"
>> >      xmlns="http://www.loc.gov/MARC21/slim">
>> >
>> >    <controlfield tag="001">338</controlfield>
>> >    <controlfield tag="003">OSt</controlfield>
>> >    <!-- WARNING: hardcoded -->
>> >    <datafield tag="040" ind1=" " ind2=" ">
>> >      <subfield code="a">OSt</subfield>
>> >    </datafield>
>> >    <datafield tag="151" ind1=" " ind2=" ">
>> >      <subfield code="a">França</subfield>
>> >    </datafield>
>> >    <datafield tag="942" ind1=" " ind2=" ">
>> >      <subfield code="a">GEOGR_NAME</subfield>
>> >    </datafield>
>> > </record>
>> >
>> > =====
>> >
>> > bibliographic.marcxml
>> >
>> > <?xml version="1.0" encoding="UTF-8"?>
>> > <record
>> >      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>> >      xsi:schemaLocation="http://www.loc.gov/MARC21/slim
>> > http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd"
>> >      xmlns="http://www.loc.gov/MARC21/slim">
>> >
>> >    <controlfield tag="003">OSt</controlfield>
>> >
>> >    <datafield tag="020" ind1=" " ind2=" ">
>> >      <subfield code="a">2.903819.03.3</subfield>
>> >    </datafield>
>> >    <datafield tag="024" ind1=" " ind2=" ">
>> >      <subfield code="q">199900562</subfield>
>> >    </datafield>
>> >    <datafield tag="040" ind1=" " ind2=" ">
>> >      <!-- HARDCODED. Looks like some kind of library name -->
>> >      <subfield code="c">esbafrg</subfield>
>> >    </datafield>
>> >
>> >    <datafield tag="041" ind1="1" ind2=" ">
>> >      <subfield code="a">Francesa</subfield>
>> >      </datafield>
>> >    <datafield tag="080" ind1=" " ind2=" ">
>> >      <subfield code="a">c.0.9.3.4 |  c.1.1</subfield>
>> >    </datafield>
>> >    <datafield tag="082" ind1=" " ind2=" ">
>> >      <subfield code="b">199900562</subfield>
>> >    </datafield>
>> >    <datafield tag="100" ind1=" " ind2=" ">
>> >    <subfield code="9">337</subfield>
>> >    <subfield code="a">VIENNEY,  Claude</subfield>
>> >    </datafield>
>> >    <datafield tag="245" ind1=" " ind2=" ">
>> >      <subfield code="a">Socio-economie des organisations
>> > cooperatives</subfield>
>> >    </datafield>
>> >
>> >    <datafield tag="260" ind1=" " ind2=" ">
>> >    <subfield code="a">París</subfield>
>> >    <subfield code="b">CIEM</subfield>
>> >    <subfield code="c">1982</subfield>
>> >    </datafield>
>> >    <datafield tag="300" ind1=" " ind2=" ">
>> >    <subfield code="a">333 p.24 cm.</subfield>
>> >    <subfield code="b">Tomo II. Analyse comparèe des cooperatives
>> > fonctionnant dans des sy</subfield>
>> >    </datafield>
>> >    <datafield tag="650" ind1="1" ind2=" ">
>> >      <subfield code="a">Formació cooperativa</subfield>
>> >    </datafield>
>> >    <datafield tag="651" ind1="1" ind2=" ">
>> >      <subfield code="a">França</subfield>
>> >    </datafield>
>> >    <datafield tag="942" ind1=" " ind2=" ">
>> >      <subfield code="2">ddc</subfield>
>> >      <!-- WARNING: Should be manually created first:
>> >      https://admin.example.com/cgi-bin/koha/admin/itemtypes.pl -->
>> >      <subfield code="c"></subfield>
>> >    </datafield>
>> >
>> >
>> >    <datafield tag="952" ind1=" " ind2=" ">
>> >      <!-- HARDCODED. Librarian card number, created during initial koha
>> > install -->
>> >      <subfield code="a">1234567890</subfield>
>> >      <!-- HARDCODED. Librarian card number, created during initial koha
>> > install -->
>> >      <subfield code="b">1234567890</subfield>
>> >      <subfield code="o">AB-125</subfield>
>> >    <!-- WARNING: Should be manually created first:
>> >      https://admin.example.com/cgi-bin/koha/admin/itemtypes.pl -->
>> >      <subfield code="y"></subfield>
>> >    </datafield>
>> >
>> >
>> > </record>
>> >
>> > On Mon, 19 Jul 2021 at 18:48, Harald Schaefer <fechsaer at gmail.com>
>> wrote:
>> >
>> >> Hi Javier,
>> >>
>> >> you must be careful when working with utf8.
>> >>
>> >> When the input file and your Python script are both encoded in UTF-8,
>> >> in my view you do not need any encode command.
>> >>
>> >> I did not understand your Python script. Does it read something and
>> >> then write a modified XML file again?
>> >>
>> >> There was no attachment in the last mail
>> >>
>> >> Regards, Harald
>> >>
>> >> Am 19.07.21 um 18:12 schrieb Javi Legido:
>> >>> Hi Harald.
>> >>>
>> >>> Many thanks for your quick reply.
>> >>>
>> >>> Changing encoding:
>> >>>
>> >>> -        return string.strip().encode("ascii",
>> >>> "xmlcharrefreplace").decode("ascii")
>> >>> +        return string.strip().encode("utf8",
>> >>> "xmlcharrefreplace").decode("utf8")
>> >>>
>> >>> Produces a MARCXML file which produces "0 records in file", so I can't
>> >>> import it. The string was:
>> >>>
>> >>> França
>> >>>
>> >>> Attached are the MARCXML records for authorities and bibliographic,
>> >>> which work (meaning that they can be imported), but the encoding comes
>> >>> out wrong only for authorities.
>> >>>
>> >>> Thanks.
>> >>>
>> >>> Javier
>> >>>
>> >>> On Mon, 19 Jul 2021 at 17:26, Harald Schaefer <fechsaer at gmail.com>
>> >> wrote:
>> >>>> Hi,
>> >>>>
>> >>>> you should use the utf8 encoding, when creating a python file.
>> >>>>
>> >>>> The marcxml file should have in the first line encoding='UTF-8'
>> >>>>
>> >>>> In python you should use encode('utf8')
>> >>>>
>> >>>> Regards, Harald
>> >>>>
>> >>>> Am 19.07.21 um 16:10 schrieb Javi Legido:
>> >>>>> Hi there.
>> >>>>>
>> >>>>> I'm trying to import an authority type 'GEOGR_NAME' with 'ç' in its
>> >> name
>> >>>>> (field '151 a'):
>> >>>>>
>> >>>>> França
>> >>>>>
>> >>>>> So far:
>> >>>>>
>> >>>>> 1. If I manually add it from GUI (I want to import it from .marcxml
>> >>>>> file) it works typing 'ç' character. If I save the record as MARCXML
>> >>>>> I get below encoding:
>> >>>>>
>> >>>>>        <subfield code="a">Fran&#xE7;a</subfield>
>> >>>>>
>> >>>>> 2. If I use python to encode it:
>> >>>>>
>> >>>>>            return string.strip().encode("ascii",
>> >>>>> "xmlcharrefreplace").decode("ascii")
>> >>>>>
>> >>>>> The generated MARCXML line looks like:
>> >>>>>
>> >>>>>        <subfield code="a">França</subfield>
>> >>>>>
>> >>>>> In the GUI looks like 'Franȧ', and if I save it as MARCXML looks
>> like:
>> >>>>>
>> >>>>>        <subfield code="a">Fran&#x227;</subfield>
>> >>>>>
>> >>>>> It is worth mentioning that the bibliographic record referencing this
>> >>>>> authority looks perfect, and it was created in exactly the same way
>> >>>>> as the authority, so the only problem is with the authority.
>> >>>>>
>> >>>>> Has anybody faced a similar problem before? In other words, I need to
>> >>>>> programmatically generate a MARCXML file to later import it into Koha
>> >>>>> (21.x), and some of the records (authorities) contain 'ç' and are not
>> >>>>> being encoded correctly.
>> >>>>> _______________________________________________
>> >>>>>
>> >>>>> Koha mailing list  http://koha-community.org
>> >>>>> Koha at lists.katipo.co.nz
>> >>>>> Unsubscribe: https://lists.katipo.co.nz/mailman/listinfo/koha
>

