Hi there.

We are parsing a CSV file (DBText software) and converting it to MARCXML. Everything works fine except for this one small detail. I will paste the files inline, since it looks like the mailing list does not allow attachments.

Thanks again for your time.

Javier

authorities.1.marcxml

<?xml version="1.0" encoding="UTF-8"?>
<record xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.loc.gov/MARC21/slim http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd" xmlns="http://www.loc.gov/MARC21/slim">
  <controlfield tag="001">337</controlfield>
  <controlfield tag="003">OSt</controlfield>
  <!-- WARNING: hardcoded -->
  <datafield tag="040" ind1=" " ind2=" ">
    <subfield code="a">OSt</subfield>
  </datafield>
  <datafield tag="100" ind1=" " ind2=" ">
    <subfield code="a">VIENNEY, Claude</subfield>
  </datafield>
  <datafield tag="942" ind1=" " ind2=" ">
    <subfield code="a">PERSO_NAME</subfield>
  </datafield>
</record>
<record xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.loc.gov/MARC21/slim http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd" xmlns="http://www.loc.gov/MARC21/slim">
  <controlfield tag="001">338</controlfield>
  <controlfield tag="003">OSt</controlfield>
  <!-- WARNING: hardcoded -->
  <datafield tag="040" ind1=" " ind2=" ">
    <subfield code="a">OSt</subfield>
  </datafield>
  <datafield tag="150" ind1=" " ind2=" ">
    <subfield code="a">Formació cooperativa</subfield>
  </datafield>
  <datafield tag="942" ind1=" " ind2=" ">
    <subfield code="a">TOPIC_TERM</subfield>
  </datafield>
</record>
<record xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.loc.gov/MARC21/slim http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd" xmlns="http://www.loc.gov/MARC21/slim">
  <controlfield tag="001">338</controlfield>
  <controlfield tag="003">OSt</controlfield>
  <!-- WARNING: hardcoded -->
  <datafield tag="040" ind1=" " ind2=" ">
    <subfield code="a">OSt</subfield>
  </datafield>
  <datafield tag="151" ind1=" " ind2=" ">
    <subfield code="a">França</subfield>
  </datafield>
  <datafield tag="942" ind1=" " ind2=" ">
    <subfield code="a">GEOGR_NAME</subfield>
  </datafield>
</record>

=====

bibliographic.marcxml

<?xml version="1.0" encoding="UTF-8"?>
<record xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.loc.gov/MARC21/slim http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd" xmlns="http://www.loc.gov/MARC21/slim">
  <controlfield tag="003">OSt</controlfield>
  <datafield tag="020" ind1=" " ind2=" ">
    <subfield code="a">2.903819.03.3</subfield>
  </datafield>
  <datafield tag="024" ind1=" " ind2=" ">
    <subfield code="q">199900562</subfield>
  </datafield>
  <datafield tag="040" ind1=" " ind2=" ">
    <!-- HARDCODED. Looks like some kind of library name -->
    <subfield code="c">esbafrg</subfield>
  </datafield>
  <datafield tag="041" ind1="1" ind2=" ">
    <subfield code="a">Francesa</subfield>
  </datafield>
  <datafield tag="080" ind1=" " ind2=" ">
    <subfield code="a">c.0.9.3.4 | c.1.1</subfield>
  </datafield>
  <datafield tag="082" ind1=" " ind2=" ">
    <subfield code="b">199900562</subfield>
  </datafield>
  <datafield tag="100" ind1=" " ind2=" ">
    <subfield code="9">337</subfield>
    <subfield code="a">VIENNEY, Claude</subfield>
  </datafield>
  <datafield tag="245" ind1=" " ind2=" ">
    <subfield code="a">Socio-economie des organisations cooperatives</subfield>
  </datafield>
  <datafield tag="260" ind1=" " ind2=" ">
    <subfield code="a">París</subfield>
    <subfield code="b">CIEM</subfield>
    <subfield code="c">1982</subfield>
  </datafield>
  <datafield tag="300" ind1=" " ind2=" ">
    <subfield code="a">333 p.24 cm.</subfield>
    <subfield code="b">Tomo II. Analyse comparèe des cooperatives fonctionnant dans des sy</subfield>
  </datafield>
  <datafield tag="650" ind1="1" ind2=" ">
    <subfield code="a">Formació cooperativa</subfield>
  </datafield>
  <datafield tag="651" ind1="1" ind2=" ">
    <subfield code="a">França</subfield>
  </datafield>
  <datafield tag="942" ind1=" " ind2=" ">
    <subfield code="2">ddc</subfield>
    <!-- WARNING: Should be manually created first: https://admin.example.com/cgi-bin/koha/admin/itemtypes.pl -->
    <subfield code="c"></subfield>
  </datafield>
  <datafield tag="952" ind1=" " ind2=" ">
    <!-- HARDCODED. Librarian card number, created during initial koha install -->
    <subfield code="a">1234567890</subfield>
    <!-- HARDCODED. Librarian card number, created during initial koha install -->
    <subfield code="b">1234567890</subfield>
    <subfield code="o">AB-125</subfield>
    <!-- WARNING: Should be manually created first: https://admin.example.com/cgi-bin/koha/admin/itemtypes.pl -->
    <subfield code="y"></subfield>
  </datafield>
</record>

On Mon, 19 Jul 2021 at 18:48, Harald Schaefer <fechsaer@gmail.com> wrote:
Hi Javier,
you must be careful when working with utf8.
When the input file and your Python script are both encoded in UTF-8, you need no encode call at all, in my view.
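As a rough illustration of that point, here is a minimal sketch that builds one authority record with Python's standard `xml.etree.ElementTree` and lets the serializer handle the encoding. The helper name `build_authority` and the choice of ElementTree are assumptions made for illustration. The script used in this thread was never posted, so this is not a reconstruction of it.

```python
import io
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"
# Serialize MARC elements with a default namespace instead of an "ns0:" prefix.
ET.register_namespace("", MARC_NS)


def build_authority(control_number: str, tag: str, heading: str) -> ET.Element:
    """Build a minimal authority record (hypothetical helper, not from the thread)."""
    record = ET.Element(f"{{{MARC_NS}}}record")
    control = ET.SubElement(record, f"{{{MARC_NS}}}controlfield", {"tag": "001"})
    control.text = control_number
    field = ET.SubElement(
        record, f"{{{MARC_NS}}}datafield", {"tag": tag, "ind1": " ", "ind2": " "}
    )
    subfield = ET.SubElement(field, f"{{{MARC_NS}}}subfield", {"code": "a"})
    subfield.text = heading  # a plain str; no .encode() call on the value
    return record


record = build_authority("338", "151", "França")

# The serializer writes the XML declaration and encodes the text as UTF-8 bytes.
buffer = io.BytesIO()
ET.ElementTree(record).write(buffer, encoding="utf-8", xml_declaration=True)
xml_bytes = buffer.getvalue()

print(xml_bytes.decode("utf-8"))
```

With this approach the 'ç' ends up in the output as raw UTF-8 bytes rather than as a numeric character reference, and characters such as `&` are escaped by the serializer.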
I did not understand your Python script. Does it read something and then write out a modified XML file?
There was no attachment in the last mail
Regards, Harald
On 19.07.21 at 18:12, Javi Legido wrote:
Hi Harald.
Many thanks for your quick reply.
Changing encoding:
- return string.strip().encode("ascii", "xmlcharrefreplace").decode("ascii")
+ return string.strip().encode("utf8", "xmlcharrefreplace").decode("utf8")
This produces a MARCXML file for which the import reports "0 records in file", so I can't import it. The string was:
França
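For reference, a small sketch of what the two variants of that `return` line do to the string 'França'. It uses only the Python standard library, and the variable names are illustrative. It shows the string-level difference only and does not by itself explain the "0 records in file" message.

```python
import html
from xml.sax.saxutils import escape

name = "França"

# ASCII cannot represent 'ç', so the error handler substitutes a numeric
# character reference.
ascii_escaped = name.encode("ascii", "xmlcharrefreplace").decode("ascii")

# UTF-8 can represent every character, so the error handler never fires and
# the round trip returns the original string.
utf8_roundtrip = name.encode("utf8", "xmlcharrefreplace").decode("utf8")

print(ascii_escaped)                        # Fran&#231;a
print(utf8_roundtrip)                       # França
print(html.unescape(ascii_escaped) == name)  # True

# Neither call escapes XML special characters; that is a separate step.
ampersand = "Arts & Crafts"
print(ampersand.encode("ascii", "xmlcharrefreplace").decode("ascii"))  # Arts & Crafts
print(escape(ampersand))                                               # Arts &amp; Crafts
```

So the "utf8" variant leaves the text unescaped, and the output file itself then has to be written as UTF-8 for the 'ç' to survive.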
Attached are the MARCXML records for authorities and bibliographic data. Both work (meaning they can be imported), but only the authorities file ends up with the wrong encoding.
Thanks.
Javier
On Mon, 19 Jul 2021 at 17:26, Harald Schaefer <fechsaer@gmail.com> wrote:
Hi,
You should use the UTF-8 encoding when creating the Python file.
The first line of the MARCXML file should declare encoding='UTF-8'.
In Python you should use encode('utf8').
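A quick way to check the first two points on a generated file is sketched below. The function name `declares_and_decodes_utf8` is hypothetical, and the check is deliberately crude: it only looks at the first line and tries to decode the bytes.

```python
def declares_and_decodes_utf8(data: bytes) -> bool:
    """Return True if the first line declares UTF-8 and the bytes decode as UTF-8."""
    first_line = data.split(b"\n", 1)[0].lower()
    declares_utf8 = b"encoding=" in first_line and b"utf-8" in first_line
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return declares_utf8


header = '<?xml version="1.0" encoding="UTF-8"?>\n'
body = '<subfield code="a">França</subfield>\n'

good = (header + body).encode("utf-8")    # declaration and bytes agree
bad = (header + body).encode("latin-1")   # declaration says UTF-8, bytes are Latin-1

print(declares_and_decodes_utf8(good))  # True
print(declares_and_decodes_utf8(bad))   # False
```

In practice the bytes would come from reading the generated file in binary mode, for example `open(path, "rb").read()`.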
Regards, Harald
On 19.07.21 at 16:10, Javi Legido wrote:
Hi there.
I'm trying to import an authority of type 'GEOGR_NAME' with 'ç' in its name (field '151 a'):
França
So far:
1. If I add it manually from the GUI (though I want to import it from a .marcxml file), typing the 'ç' character works. If I save the record as MARCXML, I get the encoding below:
<subfield code="a">França</subfield>
2. If I use Python to encode it:
return string.strip().encode("ascii", "xmlcharrefreplace").decode("ascii")
The generated MARCXML line looks like:
<subfield code="a">Fran&#231;a</subfield>
In the GUI it looks like 'Franȧ', and if I save it as MARCXML it looks like:
<subfield code="a">Franȧ</subfield>
It is worth mentioning that the bibliographic record referencing this authority looks perfect, and it was created in exactly the same way as the authority, so the only problem is with the authority.
Has anybody faced a similar problem before? In other words, I need to programmatically generate a MARCXML file to import into Koha (21.x) later on, and some of the records (authorities) contain 'ç' and are not being encoded correctly.
_______________________________________________
Koha mailing list http://koha-community.org Koha@lists.katipo.co.nz Unsubscribe: https://lists.katipo.co.nz/mailman/listinfo/koha