[Koha] Encode 'ç' to import authority .marcxml file with authority

Javi Legido javi at legido.com
Tue Jul 20 17:47:28 NZST 2021


Hi there.

Thanks again for your time, I really appreciate it.

I realized that omitting '<leader>' from the .marcxml file is what causes
the encoding issue, in bibliographic records as well.
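
A possible explanation, sketched under assumptions rather than taken from
Koha's code: with no '<leader>', the importer may treat the data as MARC-8
instead of Unicode. In MARC-8 (ANSEL), byte 0xE7 is a combining "dot above"
that attaches to the letter after it, which would turn 'ça' into 'ȧ' and
match the 'Fran&#x227;' I reported earlier in this thread. The helper name
misread_as_marc8 and its one-entry table are hypothetical, for illustration
only.

```python
import unicodedata

# Assumption: in MARC-8 (ANSEL), byte 0xE7 is the combining "dot above",
# and combining marks come BEFORE the letter they modify.
MARC8_COMBINING = {0xE7: "\u0307"}


def misread_as_marc8(text: str) -> str:
    """Illustrate reading Latin-1 code points as if they were MARC-8 bytes."""
    out = []
    pending = ""  # combining marks waiting for their base letter
    for byte in text.encode("iso-8859-1"):
        if byte in MARC8_COMBINING:
            pending += MARC8_COMBINING[byte]
        else:
            # Unicode wants the base letter first, then the combining mark
            out.append(chr(byte) + pending)
            pending = ""
    out.append(pending)  # stray trailing mark, if any
    return unicodedata.normalize("NFC", "".join(out))


result = misread_as_marc8("França")
print(result, [hex(ord(c)) for c in result])
```

Plain ASCII headings such as 'VIENNEY,  Claude' pass through unchanged, which
would explain why only the records with 'ç' looked wrong.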

I had done a little reverse engineering to keep my bibliographic and
authority templates as simple as possible.

I will restore that XML element to my templates and keep testing.
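
For reference, a minimal sketch of generating a record with the leader
restored, using Python's standard xml.etree.ElementTree. The leader value is
an assumption for a new authority record; the part that matters here is
position 09 set to 'a', which in MARC 21 marks the record as Unicode. The
function name build_authority_record is hypothetical, and the field values
come from the sample file quoted below.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"
ET.register_namespace("", MARC_NS)  # serialize MARC elements without a prefix

# Assumed leader for a new authority record (24 characters).
# Position 09 = "a" declares UCS/Unicode; a blank there means MARC-8.
AUTHORITY_LEADER = "00000nz  a2200000n  4500"


def build_authority_record(record_id: str, tag: str, heading: str, auth_type: str) -> str:
    """Return one MARCXML <record> as a str; write it to disk as UTF-8."""
    record = ET.Element(f"{{{MARC_NS}}}record")
    ET.SubElement(record, f"{{{MARC_NS}}}leader").text = AUTHORITY_LEADER
    control = ET.SubElement(record, f"{{{MARC_NS}}}controlfield", {"tag": "001"})
    control.text = record_id
    for field_tag, code, value in ((tag, "a", heading), ("942", "a", auth_type)):
        field = ET.SubElement(
            record,
            f"{{{MARC_NS}}}datafield",
            {"tag": field_tag, "ind1": " ", "ind2": " "},
        )
        ET.SubElement(field, f"{{{MARC_NS}}}subfield", {"code": code}).text = value
    # encoding="unicode" keeps 'ç' as a literal character, no entity needed
    return ET.tostring(record, encoding="unicode")


xml_text = build_authority_record("338", "151", "França", "GEOGR_NAME")
print(xml_text)
```

When saving, the file should be opened with encoding="utf-8" so the bytes on
disk agree with the encoding="UTF-8" in the XML declaration.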

I will post my results in this thread for the record.

Cheers.

Javier

On Mon, 19 Jul 2021 at 22:20, Harald Schaefer <fechsaer at gmail.com> wrote:

> Hi Javier,
>
> it seems that your CSV text is ISO-8859-1 encoded,
>
> so you need something like this in your Python code:
>
>    utf8str = iso8859str.decode('iso-8859-1').encode('utf8')
>
> You may search the internet for
>
>    python read iso8859 strings and convert them to utf8
>
> Best regards, Harald
>
> Am 19.07.21 um 19:23 schrieb Javi Legido:
> > Hi there.
> >
> > We are parsing a CSV file (DBText software) and converting it to MARCXML.
> >
> > Everything works fine, except for this super little detail.
> >
> > I will paste the files inline, since it looks like the mailing list does
> > not allow attachments.
> >
> > Thanks again for your time.
> >
> > Javier
> >
> > authorities.1.marcxml
> >
> > <?xml version="1.0" encoding="UTF-8"?>
> > <record
> >      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
> >      xsi:schemaLocation="http://www.loc.gov/MARC21/slim
> > http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd"
> >      xmlns="http://www.loc.gov/MARC21/slim">
> >
> >    <controlfield tag="001">337</controlfield>
> >    <controlfield tag="003">OSt</controlfield>
> >    <!-- WARNING: hardcoded -->
> >    <datafield tag="040" ind1=" " ind2=" ">
> >      <subfield code="a">OSt</subfield>
> >    </datafield>
> >    <datafield tag="100" ind1=" " ind2=" ">
> >      <subfield code="a">VIENNEY,  Claude</subfield>
> >    </datafield>
> >    <datafield tag="942" ind1=" " ind2=" ">
> >      <subfield code="a">PERSO_NAME</subfield>
> >    </datafield>
> > </record>
> > <record
> >      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
> >      xsi:schemaLocation="http://www.loc.gov/MARC21/slim
> > http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd"
> >      xmlns="http://www.loc.gov/MARC21/slim">
> >
> >    <controlfield tag="001">338</controlfield>
> >    <controlfield tag="003">OSt</controlfield>
> >    <!-- WARNING: hardcoded -->
> >    <datafield tag="040" ind1=" " ind2=" ">
> >      <subfield code="a">OSt</subfield>
> >    </datafield>
> >    <datafield tag="150" ind1=" " ind2=" ">
> >      <subfield code="a">Formació cooperativa</subfield>
> >    </datafield>
> >    <datafield tag="942" ind1=" " ind2=" ">
> >      <subfield code="a">TOPIC_TERM</subfield>
> >    </datafield>
> > </record>
> > <record
> >      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
> >      xsi:schemaLocation="http://www.loc.gov/MARC21/slim
> > http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd"
> >      xmlns="http://www.loc.gov/MARC21/slim">
> >
> >    <controlfield tag="001">338</controlfield>
> >    <controlfield tag="003">OSt</controlfield>
> >    <!-- WARNING: hardcoded -->
> >    <datafield tag="040" ind1=" " ind2=" ">
> >      <subfield code="a">OSt</subfield>
> >    </datafield>
> >    <datafield tag="151" ind1=" " ind2=" ">
> >      <subfield code="a">França</subfield>
> >    </datafield>
> >    <datafield tag="942" ind1=" " ind2=" ">
> >      <subfield code="a">GEOGR_NAME</subfield>
> >    </datafield>
> > </record>
> >
> > =====
> >
> > bibliographic.marcxml
> >
> > <?xml version="1.0" encoding="UTF-8"?>
> > <record
> >      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
> >      xsi:schemaLocation="http://www.loc.gov/MARC21/slim
> > http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd"
> >      xmlns="http://www.loc.gov/MARC21/slim">
> >
> >    <controlfield tag="003">OSt</controlfield>
> >
> >    <datafield tag="020" ind1=" " ind2=" ">
> >      <subfield code="a">2.903819.03.3</subfield>
> >    </datafield>
> >    <datafield tag="024" ind1=" " ind2=" ">
> >      <subfield code="q">199900562</subfield>
> >    </datafield>
> >    <datafield tag="040" ind1=" " ind2=" ">
> >      <!-- HARDCODED. Looks like some kind of library name -->
> >      <subfield code="c">esbafrg</subfield>
> >    </datafield>
> >
> >    <datafield tag="041" ind1="1" ind2=" ">
> >      <subfield code="a">Francesa</subfield>
> >    </datafield>
> >    <datafield tag="080" ind1=" " ind2=" ">
> >      <subfield code="a">c.0.9.3.4 |  c.1.1</subfield>
> >    </datafield>
> >    <datafield tag="082" ind1=" " ind2=" ">
> >      <subfield code="b">199900562</subfield>
> >    </datafield>
> >    <datafield tag="100" ind1=" " ind2=" ">
> >    <subfield code="9">337</subfield>
> >    <subfield code="a">VIENNEY,  Claude</subfield>
> >    </datafield>
> >    <datafield tag="245" ind1=" " ind2=" ">
> >      <subfield code="a">Socio-economie des organisations cooperatives</subfield>
> >    </datafield>
> >
> >    <datafield tag="260" ind1=" " ind2=" ">
> >    <subfield code="a">París</subfield>
> >    <subfield code="b">CIEM</subfield>
> >    <subfield code="c">1982</subfield>
> >    </datafield>
> >    <datafield tag="300" ind1=" " ind2=" ">
> >    <subfield code="a">333 p.24 cm.</subfield>
> >    <subfield code="b">Tomo II. Analyse comparèe des cooperatives fonctionnant dans des sy</subfield>
> >    </datafield>
> >    <datafield tag="650" ind1="1" ind2=" ">
> >      <subfield code="a">Formació cooperativa</subfield>
> >    </datafield>
> >    <datafield tag="651" ind1="1" ind2=" ">
> >      <subfield code="a">França</subfield>
> >    </datafield>
> >    <datafield tag="942" ind1=" " ind2=" ">
> >      <subfield code="2">ddc</subfield>
> >      <!-- WARNING: Should be manually created first:
> >      https://admin.example.com/cgi-bin/koha/admin/itemtypes.pl -->
> >      <subfield code="c"></subfield>
> >    </datafield>
> >
> >
> >    <datafield tag="952" ind1=" " ind2=" ">
> >      <!-- HARDCODED. Librarian card number, created during initial koha
> > install -->
> >      <subfield code="a">1234567890</subfield>
> >      <!-- HARDCODED. Librarian card number, created during initial koha
> > install -->
> >      <subfield code="b">1234567890</subfield>
> >      <subfield code="o">AB-125</subfield>
> >    <!-- WARNING: Should be manually created first:
> >      https://admin.example.com/cgi-bin/koha/admin/itemtypes.pl -->
> >      <subfield code="y"></subfield>
> >    </datafield>
> >
> >
> > </record>
> >
> > On Mon, 19 Jul 2021 at 18:48, Harald Schaefer <fechsaer at gmail.com> wrote:
> >
> >> Hi Javier,
> >>
> >> you must be careful when working with utf8.
> >>
> >> When the input file and your Python script are both encoded in UTF-8,
> >> in my view you need no encode call.
> >>
> >> I did not understand your Python script. Does it read something and then
> >> write out a modified XML file?
> >>
> >> There was no attachment in the last mail
> >>
> >> Regards, Harald
> >>
> >> Am 19.07.21 um 18:12 schrieb Javi Legido:
> >>> Hi Harald.
> >>>
> >>> Many thanks for your quick reply.
> >>>
> >>> Changing encoding:
> >>>
> >>> -        return string.strip().encode("ascii", "xmlcharrefreplace").decode("ascii")
> >>> +        return string.strip().encode("utf8", "xmlcharrefreplace").decode("utf8")
> >>>
> >>> Produces a MARCXML file which produces "0 records in file", so I can't
> >>> import it. The string was:
> >>>
> >>> França
> >>>
> >>> Attached are the MARCXML records for authorities and bibliographic, which
> >>> work (meaning they can be imported), but only the authorities come out
> >>> with the wrong encoding.
> >>>
> >>> Thanks.
> >>>
> >>> Javier
> >>>
> >>> On Mon, 19 Jul 2021 at 17:26, Harald Schaefer <fechsaer at gmail.com> wrote:
> >>>> Hi,
> >>>>
> >>>> you should use the utf8 encoding, when creating a python file.
> >>>>
> >>>> The marcxml file should have in the first line encoding='UTF-8'
> >>>>
> >>>> In python you should use encode('utf8')
> >>>>
> >>>> Regards, Harald
> >>>>
> >>>> Am 19.07.21 um 16:10 schrieb Javi Legido:
> >>>>> Hi there.
> >>>>>
> >>>>> I'm trying to import an authority of type 'GEOGR_NAME' with 'ç' in its
> >>>>> name (field '151 a'):
> >>>>>
> >>>>> França
> >>>>>
> >>>>> So far:
> >>>>>
> >>>>> 1. If I add it manually from the GUI (though I want to import it from a
> >>>>> .marcxml file), typing the 'ç' character works. If I save the record as
> >>>>> MARCXML I get the encoding below:
> >>>>>
> >>>>>        <subfield code="a">Fran&#xE7;a</subfield>
> >>>>>
> >>>>> 2. If I use python to encode it:
> >>>>>
> >>>>>            return string.strip().encode("ascii", "xmlcharrefreplace").decode("ascii")
> >>>>>
> >>>>> The generated MARCXML line looks like:
> >>>>>
> >>>>>        <subfield code="a">França</subfield>
> >>>>>
> >>>>> In the GUI it looks like 'Franȧ', and if I save it as MARCXML it looks like:
> >>>>>
> >>>>>        <subfield code="a">Fran&#x227;</subfield>
> >>>>>
> >>>>> It is worth mentioning that the bibliographic record referencing this
> >>>>> authority looks perfect, and it was created exactly the same way as the
> >>>>> authority, so the only problem is with the authority.
> >>>>>
> >>>>> Has anybody faced a similar problem before? In other words, I need to
> >>>>> programmatically generate a MARCXML file to import into Koha (21.x)
> >>>>> later on, and some of the records (authorities) contain 'ç' and are not
> >>>>> being encoded correctly.
> >>>>> _______________________________________________
> >>>>>
> >>>>> Koha mailing list  http://koha-community.org
> >>>>> Koha at lists.katipo.co.nz
> >>>>> Unsubscribe: https://lists.katipo.co.nz/mailman/listinfo/koha

