[Koha] Encode 'ç' to import authority .marcxml file with authority

Javi Legido javi at legido.com
Tue Jul 20 05:23:12 NZST 2021


Hi there.

We are parsing a CSV file (DBText software) and converting it to MARCXML.

Everything works fine, except for this super little detail.

I will paste the files inline, since it looks like the mailing list does
not allow attachments.

Thanks again for your time.

Javier

authorities.1.marcxml

<?xml version="1.0" encoding="UTF-8"?>
<record
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.loc.gov/MARC21/slim
http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd"
    xmlns="http://www.loc.gov/MARC21/slim">

  <controlfield tag="001">337</controlfield>
  <controlfield tag="003">OSt</controlfield>
  <!-- WARNING: hardcoded -->
  <datafield tag="040" ind1=" " ind2=" ">
    <subfield code="a">OSt</subfield>
  </datafield>
  <datafield tag="100" ind1=" " ind2=" ">
    <subfield code="a">VIENNEY,  Claude</subfield>
  </datafield>
  <datafield tag="942" ind1=" " ind2=" ">
    <subfield code="a">PERSO_NAME</subfield>
  </datafield>
</record>
<record
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.loc.gov/MARC21/slim
http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd"
    xmlns="http://www.loc.gov/MARC21/slim">

  <controlfield tag="001">338</controlfield>
  <controlfield tag="003">OSt</controlfield>
  <!-- WARNING: hardcoded -->
  <datafield tag="040" ind1=" " ind2=" ">
    <subfield code="a">OSt</subfield>
  </datafield>
  <datafield tag="150" ind1=" " ind2=" ">
    <subfield code="a">Formació cooperativa</subfield>
  </datafield>
  <datafield tag="942" ind1=" " ind2=" ">
    <subfield code="a">TOPIC_TERM</subfield>
  </datafield>
</record>
<record
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.loc.gov/MARC21/slim
http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd"
    xmlns="http://www.loc.gov/MARC21/slim">

  <controlfield tag="001">338</controlfield>
  <controlfield tag="003">OSt</controlfield>
  <!-- WARNING: hardcoded -->
  <datafield tag="040" ind1=" " ind2=" ">
    <subfield code="a">OSt</subfield>
  </datafield>
  <datafield tag="151" ind1=" " ind2=" ">
    <subfield code="a">França</subfield>
  </datafield>
  <datafield tag="942" ind1=" " ind2=" ">
    <subfield code="a">GEOGR_NAME</subfield>
  </datafield>
</record>

=====

bibliographic.marcxml

<?xml version="1.0" encoding="UTF-8"?>
<record
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.loc.gov/MARC21/slim
http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd"
    xmlns="http://www.loc.gov/MARC21/slim">

  <controlfield tag="003">OSt</controlfield>

  <datafield tag="020" ind1=" " ind2=" ">
    <subfield code="a">2.903819.03.3</subfield>
  </datafield>
  <datafield tag="024" ind1=" " ind2=" ">
    <subfield code="q">199900562</subfield>
  </datafield>
  <datafield tag="040" ind1=" " ind2=" ">
    <!-- HARDCODED. Looks like some kind of library name -->
    <subfield code="c">esbafrg</subfield>
  </datafield>

  <datafield tag="041" ind1="1" ind2=" ">
    <subfield code="a">Francesa</subfield>
  </datafield>
  <datafield tag="080" ind1=" " ind2=" ">
    <subfield code="a">c.0.9.3.4 |  c.1.1</subfield>
  </datafield>
  <datafield tag="082" ind1=" " ind2=" ">
    <subfield code="b">199900562</subfield>
  </datafield>
  <datafield tag="100" ind1=" " ind2=" ">
    <subfield code="9">337</subfield>
    <subfield code="a">VIENNEY,  Claude</subfield>
  </datafield>
  <datafield tag="245" ind1=" " ind2=" ">
    <subfield code="a">Socio-economie des organisations cooperatives</subfield>
  </datafield>

  <datafield tag="260" ind1=" " ind2=" ">
    <subfield code="a">París</subfield>
    <subfield code="b">CIEM</subfield>
    <subfield code="c">1982</subfield>
  </datafield>
  <datafield tag="300" ind1=" " ind2=" ">
    <subfield code="a">333 p.24 cm.</subfield>
    <subfield code="b">Tomo II. Analyse comparèe des cooperatives fonctionnant dans des sy</subfield>
  </datafield>
  <datafield tag="650" ind1="1" ind2=" ">
    <subfield code="a">Formació cooperativa</subfield>
  </datafield>
  <datafield tag="651" ind1="1" ind2=" ">
    <subfield code="a">França</subfield>
  </datafield>
  <datafield tag="942" ind1=" " ind2=" ">
    <subfield code="2">ddc</subfield>
    <!-- WARNING: Should be manually created first:
    https://admin.example.com/cgi-bin/koha/admin/itemtypes.pl -->
    <subfield code="c"></subfield>
  </datafield>


  <datafield tag="952" ind1=" " ind2=" ">
    <!-- HARDCODED. Librarian card number, created during initial koha
install -->
    <subfield code="a">1234567890</subfield>
    <!-- HARDCODED. Librarian card number, created during initial koha
install -->
    <subfield code="b">1234567890</subfield>
    <subfield code="o">AB-125</subfield>
    <!-- WARNING: Should be manually created first:
    https://admin.example.com/cgi-bin/koha/admin/itemtypes.pl -->
    <subfield code="y"></subfield>
  </datafield>


</record>
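
For reference, here is a minimal Python sketch of the two cleaning strategies discussed in the quoted thread below. The function names clean_ascii and clean_utf8 are hypothetical, and using xml.sax.saxutils.escape for the UTF-8 variant is an assumption about how it could be done, not what the original script does.

```python
from xml.sax.saxutils import escape


def clean_ascii(value: str) -> str:
    """Variant from the thread: every non-ASCII character becomes a
    decimal numeric character reference such as &#231;."""
    return value.strip().encode("ascii", "xmlcharrefreplace").decode("ascii")


def clean_utf8(value: str) -> str:
    """Hypothetical alternative: keep the characters as they are and only
    escape the XML markup characters (&, <, >)."""
    return escape(value.strip())


if __name__ == "__main__":
    print(clean_ascii(" França "))  # Fran&#231;a
    print(clean_utf8(" França "))   # França
```

Note that encoding to UTF-8 with "xmlcharrefreplace" and decoding again leaves the string unchanged: UTF-8 can represent 'ç' directly, so the replacement handler is never triggered. If the literal character is kept, the output file itself would also need to be written as UTF-8, for example with open(path, "w", encoding="utf-8").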

On Mon, 19 Jul 2021 at 18:48, Harald Schaefer <fechsaer at gmail.com> wrote:

> Hi Javier,
>
> you must be careful when working with utf8.
>
> When the input file and your Python script are both encoded in UTF-8, you
> should not need any encode call, in my view.
>
> I did not understand your Python script. Does it read something and then
> write out a modified XML file?
>
> There was no attachment in the last mail.
>
> Regards, Harald
>
> Am 19.07.21 um 18:12 schrieb Javi Legido:
> > Hi Harald.
> >
> > Many thanks for your quick reply.
> >
> > Changing encoding:
> >
> > -        return string.strip().encode("ascii", "xmlcharrefreplace").decode("ascii")
> > +        return string.strip().encode("utf8", "xmlcharrefreplace").decode("utf8")
> >
> > This produces a MARCXML file that yields "0 records in file", so I can't
> > import it. The string was:
> >
> > França
> >
> > Attached are the MARCXML records for authorities and bibliographic data.
> > They work (meaning they can be imported), but only the authorities end up
> > with the wrong encoding.
> >
> > Thanks.
> >
> > Javier
> >
> > On Mon, 19 Jul 2021 at 17:26, Harald Schaefer <fechsaer at gmail.com> wrote:
> >
> >> Hi,
> >>
> >> you should use the UTF-8 encoding when creating a Python file.
> >>
> >> The first line of the MARCXML file should declare encoding='UTF-8'.
> >>
> >> In Python you should use encode('utf8').
> >>
> >> Regards, Harald
> >>
> >> Am 19.07.21 um 16:10 schrieb Javi Legido:
> >>> Hi there.
> >>>
> >>> I'm trying to import an authority of type 'GEOGR_NAME' with 'ç' in its
> >>> name (field '151 a'):
> >>>
> >>> França
> >>>
> >>> So far:
> >>>
> >>> 1. If I add it manually from the GUI (although I want to import it from
> >>> a .marcxml file), typing the 'ç' character works. If I save the record
> >>> as MARCXML, I get the encoding below:
> >>>
> >>>       <subfield code="a">Fran&#xE7;a</subfield>
> >>>
> >>> 2. If I use python to encode it:
> >>>
> >>>           return string.strip().encode("ascii", "xmlcharrefreplace").decode("ascii")
> >>>
> >>> The generated MARCXML line looks like:
> >>>
> >>>       <subfield code="a">Fran&#231;a</subfield>
> >>>
> >>> In the GUI it looks like 'Franȧ', and if I save it as MARCXML it looks like:
> >>>
> >>>       <subfield code="a">Fran&#x227;</subfield>
> >>>
> >>> It is worth mentioning that the bibliographic record referencing this
> >>> authority looks perfect, and it was created in exactly the same way as
> >>> the authority, so the only problem is with the authority.
> >>>
> >>> Has anybody faced a similar problem before? In other words, I need to
> >>> programmatically generate a MARCXML file and later import it into Koha
> >>> (21.x), and some of the records (authorities) contain 'ç' and are not
> >>> being encoded correctly.
> >>> _______________________________________________
> >>>
> >>> Koha mailing list  http://koha-community.org
> >>> Koha at lists.katipo.co.nz
> >>> Unsubscribe: https://lists.katipo.co.nz/mailman/listinfo/koha

