Hi Javier,
the MARC21 format and the leader field are described here:
https://www.loc.gov/marc/bibliographic/bdleader.html
Byte 9 (character position 09) is the encoding field.
Regards, Harald
On 20.07.21 at 08:51, Javi Legido wrote:
Hi there.
I can confirm that re-adding '<leader>' fixed the issue.
Snippet from my templates:
a) Bibliographic
<!-- WARNING: hardcoded. Without the key below, encoding of non-ASCII characters will fail -->
<leader>00276nam a22001097a 4500</leader>
b) Authorities
<!-- WARNING: hardcoded. Without the key below, encoding of non-ASCII characters will fail -->
{% if authority_942_a == "PERSO_NAME" -%}
<leader>00249nz a2200109n 4500</leader>
{% elif authority_942_a == "GEOGR_NAME" -%}
<leader>00237nz a2200109n 4500</leader>
{% elif authority_942_a == "TOPIC_TERM" -%}
<leader>00221nz a2200109n 4500</leader>
{% endif -%}
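A hardcoded leader can be sanity-checked before it goes into a template. The sketch below is only an illustration: the function name `leader_coding_scheme` is hypothetical and not part of Koha or any MARC library. It relies on two facts from the MARC 21 leader definition: the leader is 24 characters long, and position 09 holds the character coding scheme ('a' for UCS/Unicode, blank for MARC-8). Leaders pasted into mail can lose repeated spaces, so the length check comes first.

```python
def leader_coding_scheme(leader):
    """Return the character at leader position 09 (0-based index 9).

    Hypothetical helper for illustration only.
    """
    # A MARC 21 leader is a fixed field of exactly 24 characters.
    if len(leader) != 24:
        raise ValueError(
            "MARC 21 leader must be 24 characters, got %d" % len(leader)
        )
    return leader[9]


# Bibliographic leader taken from the template above.
BIB_LEADER = "00276nam a22001097a 4500"

scheme = leader_coding_scheme(BIB_LEADER)
# 'a' means UCS/Unicode; a blank would mean MARC-8.
print(repr(scheme))
```

If the check raises, the leader string has probably lost or gained whitespace somewhere between the source and the template.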
On the other hand, although this is a little outside the scope of this thread, I also made the changes below in the Python code:
1. The output .marcxml file is now 'UTF-8' encoded:
with open(marcxml_filename, "w", encoding='utf8') as f:
    f.write(self.marcxml_content)
Test:
file -i /tmp/authorities.1.marcxml
/tmp/authorities.1.marcxml: text/xml; charset=utf-8
2. Because of 1), I no longer need the 'xmlcharrefreplace' hocus pocus:
return string.strip()
I still need to figure out the '<leader>' format, since for now it is hardcoded in my templates.
Cheers.
Javier
On Tue, 20 Jul 2021 at 07:47, Javi Legido <javi@legido.com> wrote:
Hi there.
Thanks again for your time, I really appreciate it.
I realized that omitting '<leader>' from the .marcxml file is what causes the encoding issue, in bibliographic records as well.
I had done a little reverse engineering to keep my bibliographic and authority templates as simple as possible.
I will restore that XML key to my templates and keep testing.
I will post my results in this thread for the record.
Cheers.
Javier
On Mon, 19 Jul 2021 at 22:20, Harald Schaefer <fechsaer@gmail.com> wrote:
Hi Javier,
it seems that your CSV text is ISO-8859 encoded,
so you need something like this in your Python code
(where iso8859str holds the raw bytes read from the file):
utf8str = iso8859str.decode('iso-8859-1').encode('utf8')
You may search the internet for
python read iso8859 strings and convert them to utf8
Best regards, Harald
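A minimal sketch of that conversion in Python 3, under the assumption (Harald's guess, not confirmed in this thread) that the CSV export really is ISO-8859-1. The function name `read_latin1_csv` and the sample row are made up for illustration. Note that in Python 3 only bytes have `.decode()`, so the decoding happens once, when the raw input is turned into text.

```python
import csv
import io


def read_latin1_csv(raw):
    """Decode ISO-8859-1 bytes and parse them as CSV rows.

    Hypothetical helper; assumes the input really is ISO-8859-1.
    """
    text = raw.decode("iso-8859-1")
    return list(csv.reader(io.StringIO(text)))


# Made-up sample row: 'ç' is the single byte 0xE7 in ISO-8859-1.
sample = "151,França\n".encode("iso-8859-1")

rows = read_latin1_csv(sample)
# After decoding, the value is an ordinary Python str and can be
# written out as UTF-8, where 'ç' becomes the two bytes 0xC3 0xA7.
utf8_bytes = rows[0][1].encode("utf-8")
print(rows, utf8_bytes)
```

When reading from disk, passing `encoding="iso-8859-1"` to `open()` does the same decoding step, so the rest of the script only ever sees str values.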
Hi there.
We are parsing a CSV file (DBText software) and converting it to MARCXML.
Everything works fine, except for this super little detail.
I will paste the files inline, since it looks like the mailing list does not allow attachments.
Thanks again for your time.
Javier
authorities.1.marcxml
<?xml version="1.0" encoding="UTF-8"?>
<record xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.loc.gov/MARC21/slim http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd" xmlns="http://www.loc.gov/MARC21/slim">
  <controlfield tag="001">337</controlfield>
  <controlfield tag="003">OSt</controlfield>
  <!-- WARNING: hardcoded -->
  <datafield tag="040" ind1=" " ind2=" ">
    <subfield code="a">OSt</subfield>
  </datafield>
  <datafield tag="100" ind1=" " ind2=" ">
    <subfield code="a">VIENNEY, Claude</subfield>
  </datafield>
  <datafield tag="942" ind1=" " ind2=" ">
    <subfield code="a">PERSO_NAME</subfield>
  </datafield>
</record>
<record xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.loc.gov/MARC21/slim http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd" xmlns="http://www.loc.gov/MARC21/slim">
  <controlfield tag="001">338</controlfield>
  <controlfield tag="003">OSt</controlfield>
  <!-- WARNING: hardcoded -->
  <datafield tag="040" ind1=" " ind2=" ">
    <subfield code="a">OSt</subfield>
  </datafield>
  <datafield tag="150" ind1=" " ind2=" ">
    <subfield code="a">Formació cooperativa</subfield>
  </datafield>
  <datafield tag="942" ind1=" " ind2=" ">
    <subfield code="a">TOPIC_TERM</subfield>
  </datafield>
</record>
<record xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.loc.gov/MARC21/slim http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd" xmlns="http://www.loc.gov/MARC21/slim">
  <controlfield tag="001">338</controlfield>
  <controlfield tag="003">OSt</controlfield>
  <!-- WARNING: hardcoded -->
  <datafield tag="040" ind1=" " ind2=" ">
    <subfield code="a">OSt</subfield>
  </datafield>
  <datafield tag="151" ind1=" " ind2=" ">
    <subfield code="a">França</subfield>
  </datafield>
  <datafield tag="942" ind1=" " ind2=" ">
    <subfield code="a">GEOGR_NAME</subfield>
  </datafield>
</record>
=====
bibliographic.marcxml
<?xml version="1.0" encoding="UTF-8"?>
<record xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.loc.gov/MARC21/slim http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd" xmlns="http://www.loc.gov/MARC21/slim">
  <controlfield tag="003">OSt</controlfield>
  <datafield tag="020" ind1=" " ind2=" ">
    <subfield code="a">2.903819.03.3</subfield>
  </datafield>
  <datafield tag="024" ind1=" " ind2=" ">
    <subfield code="q">199900562</subfield>
  </datafield>
  <datafield tag="040" ind1=" " ind2=" ">
    <!-- HARDCODED. Looks like some kind of library name -->
    <subfield code="c">esbafrg</subfield>
  </datafield>
  <datafield tag="041" ind1="1" ind2=" ">
    <subfield code="a">Francesa</subfield>
  </datafield>
  <datafield tag="080" ind1=" " ind2=" ">
    <subfield code="a">c.0.9.3.4 | c.1.1</subfield>
  </datafield>
  <datafield tag="082" ind1=" " ind2=" ">
    <subfield code="b">199900562</subfield>
  </datafield>
  <datafield tag="100" ind1=" " ind2=" ">
    <subfield code="9">337</subfield>
    <subfield code="a">VIENNEY, Claude</subfield>
  </datafield>
  <datafield tag="245" ind1=" " ind2=" ">
    <subfield code="a">Socio-economie des organisations cooperatives</subfield>
  </datafield>
  <datafield tag="260" ind1=" " ind2=" ">
    <subfield code="a">París</subfield>
    <subfield code="b">CIEM</subfield>
    <subfield code="c">1982</subfield>
  </datafield>
  <datafield tag="300" ind1=" " ind2=" ">
    <subfield code="a">333 p.24 cm.</subfield>
    <subfield code="b">Tomo II. Analyse comparèe des cooperatives fonctionnant dans des sy</subfield>
  </datafield>
  <datafield tag="650" ind1="1" ind2=" ">
    <subfield code="a">Formació cooperativa</subfield>
  </datafield>
  <datafield tag="651" ind1="1" ind2=" ">
    <subfield code="a">França</subfield>
  </datafield>
  <datafield tag="942" ind1=" " ind2=" ">
    <subfield code="2">ddc</subfield>
    <!-- WARNING: Should be manually created first: https://admin.example.com/cgi-bin/koha/admin/itemtypes.pl -->
    <subfield code="c"></subfield>
  </datafield>
  <datafield tag="952" ind1=" " ind2=" ">
    <!-- HARDCODED. Librarian card number, created during initial koha install -->
    <subfield code="a">1234567890</subfield>
    <!-- HARDCODED. Librarian card number, created during initial koha install -->
    <subfield code="b">1234567890</subfield>
    <subfield code="o">AB-125</subfield>
    <!-- WARNING: Should be manually created first: https://admin.example.com/cgi-bin/koha/admin/itemtypes.pl -->
    <subfield code="y"></subfield>
  </datafield>
</record>
On Mon, 19 Jul 2021 at 18:48, Harald Schaefer <fechsaer@gmail.com> wrote:
Hi Javier,
you must be careful when working with UTF-8.
When the input file and your Python script are both UTF-8 encoded, you should not need any encode call, in my view.
I did not understand your Python script. Does it read something and then write out a modified XML file?
There was no attachment in the last mail.
Regards, Harald
On 19.07.21 at 18:12, Javi Legido wrote:
Hi Harald.
Many thanks for your quick reply.
Changing encoding:
- return string.strip().encode("ascii", "xmlcharrefreplace").decode("ascii")
+ return string.strip().encode("utf8", "xmlcharrefreplace").decode("utf8")
This produces a MARCXML file that gives "0 records in file" on import, so I can't import it. The string was:
França
Attached are the MARCXML records for authorities and bibliographic. Both work (meaning they can be imported), but only the authorities end up with the wrong encoding.
Thanks.
Javier
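For reference, a small sketch of what the two variants in the diff above do to the string itself. It only shows the Python-side behaviour and says nothing about why the import reported zero records. With the "ascii" codec, 'ç' cannot be encoded, so the `xmlcharrefreplace` handler substitutes a numeric character reference. With "utf8", every character can be encoded, the handler never fires, and the round trip returns the string unchanged.

```python
name = "França"

# ASCII cannot represent 'ç' (U+00E7, decimal 231), so the error
# handler replaces it with an XML numeric character reference.
ascii_refs = name.encode("ascii", "xmlcharrefreplace").decode("ascii")

# UTF-8 can represent every character, so nothing is replaced and
# encode/decode is a no-op round trip.
utf8_roundtrip = name.encode("utf8", "xmlcharrefreplace").decode("utf8")

print(ascii_refs)      # Fran&#231;a
print(utf8_roundtrip)  # França
```

So the second variant is equivalent to a plain `string.strip()` as far as the text content is concerned.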
On Mon, 19 Jul 2021 at 17:26, Harald Schaefer <fechsaer@gmail.com> wrote:
> Hi,
>
> you should use the utf8 encoding, when creating a python file.
>
> The marcxml file should have in the first line encoding='UTF-8'
>
> In python you should use encode('utf8')
>
> Regards, Harald
>
> On 19.07.21 at 16:10, Javi Legido wrote:
>> Hi there.
>>
>> I'm trying to import an authority type 'GEOGR_NAME' with 'ç' in its name
>> (field '151 a'):
>>
>> França
>>
>> So far:
>>
>> 1. If I manually add it from GUI (I want to import it from .marcxml file)
>> it works typing 'ç' character. If I save the record as MARCXML I get below
>> encoding:
>>
>> <subfield code="a">França</subfield>
>>
>> 2. If I use python to encode it:
>>
>> return string.strip().encode("ascii", "xmlcharrefreplace").decode("ascii")
>>
>> The generated MARCXML line looks like:
>>
>> <subfield code="a">França</subfield>
>>
>> In the GUI looks like 'Franȧ', and if I save it as MARCXML looks like:
On 19.07.21 at 19:23, Javi Legido wrote:
>> <subfield code="a">Franȧ</subfield>
>>
>> Worth mentioning that the bibliographic bit referencing this authority
>> looks perfect, and it was created exactly the same as for authority, so the
>> only problem is with authority.
>>
>> Does anybody faced similar problem before? In other words I need to
>> generate programatically a MARCXML file to later on import it to koha
>> (21.x), and some of the records (authorities) contains 'ç' and are not
>> being encoded right.
>> _______________________________________________
>> Koha mailing list http://koha-community.org
>> Koha@lists.katipo.co.nz
>> Unsubscribe: https://lists.katipo.co.nz/mailman/listinfo/koha