[Koha] Marc-->CSV - alternative to MarcEdit?

Robin Sheat robin at catalyst.net.nz
Mon Feb 28 18:37:38 NZDT 2011


hansbkk at gmail.com wrote on Mon 28-02-2011 at 12:15 [+0700]:
> I'm trying to export batches of records from MARC to a delimited text
> format, do a bunch of editing and then convert back to MARC.

The CSV format is not capable of holding the information contained
within MARC. MARC has two levels of repeatable headers (field, subfield)
where order and grouping matter, whereas CSV has one level of
(generally) non-repeatable headers with fixed order and no concept of
grouping.
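
As a rough illustration of that mismatch, here is a minimal Python sketch.
The record layout, the flatten() helper and the sample values are all
made up for the example; this is not how MarcEdit or any real MARC
library represents things. It only shows what happens when two repeated
650 fields are squeezed into one row with one column per tag+subfield.

```python
# Hypothetical, simplified model of one MARC record: an ordered list of
# fields, each holding an ordered list of (subfield code, value) pairs.
record = [
    ("245", [("a", "Koha manual")]),
    ("650", [("a", "Libraries"), ("x", "Automation")]),
    ("650", [("a", "Open source software"), ("x", "History")]),
]


def flatten(record):
    """Flatten to one CSV-style row keyed by tag+subfield, e.g. '650a'."""
    row = {}
    for tag, subfields in record:
        for code, value in subfields:
            # A later value with the same key overwrites the earlier one,
            # because a CSV header can only appear once.
            row[tag + code] = value
    return row


row = flatten(record)
print(row)
```

The first 650 field is gone from the resulting row, and nothing in the
row records which $a belonged with which $x.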

> MarcEdit works great, except when going from MARC to CSV, the
> repeating fields (5XX and up) are being concatenated with semicolon
> separators into single fields. I'd like to keep these as separate
> repeating fields, 

Splitting with tokens, like semicolons, is pretty much the only way to
begin, but unless you're extremely careful you're still going to lose
information. The only way to be careful is by encoding a whole lot of
extra information into your CSV files, at which point they stop being
CSV, for all intents and purposes.
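
To make the risk concrete, a small Python sketch follows. The helper
names and the sample notes are invented for illustration, and the
semicolon is assumed to be the separator as in the MarcEdit export
described above. It joins two repeated note fields into one cell and
then splits the cell again.

```python
def join_repeats(values, sep=";"):
    """Pack repeated field values into a single CSV cell."""
    return sep.join(values)


def split_repeats(cell, sep=";"):
    """Naively unpack a cell back into repeated field values."""
    return cell.split(sep)


# Two repeated notes; the first one happens to contain the separator.
notes = ["Includes index; bibliography", "Translated from the Dutch"]

cell = join_repeats(notes)
restored = split_repeats(cell)

print(len(notes), len(restored))
```

The two original notes come back as three, because the semicolon inside
the first note is indistinguishable from the separator. Avoiding that
means adding escaping or quoting rules on top of the CSV, which is the
extra encoding mentioned above.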

> so I'm working on kludging up some workarounds, but
> in the meantime can anyone suggest an alternative tool, ideally one
> that will do a clean round-trip, i.e. reconstruct the MARC from the
> CSV identically if the data isn't altered?

That is pretty much an impossible task.

> Chris I'm cc'ing you specifically because I recall your mentioning a
> script in a past thread that went the other way (CSV to MARC), but
> unfortunately couldn't find the message in the archives. 

I have a script I call csvtomarc.pl. It's designed for taking the output
from something that can only export in a CSV-like form[0], passing it
through a host of rules and transformations, linking up items, and
outputting MARC. It's what I use to do migrations, generally.

The most up-to-date version of this at the moment can be found here:
http://git.catalyst.net.nz/gw?p=koha.git;a=tree;f=import/csv;h=a92d26d020e04b6a21de328f307c53a41965ff80;hb=refs/heads/stdc_import
but I warn you: it's not designed as an end-user tool, and it's pretty
complex.

[0] No product I've encountered yet can export as proper CSV it seems,
they all break the spec and produce unparseable results that require
hand massaging to fix. I'm especially looking at you, Liberty.
-- 
Robin Sheat
Catalyst IT Ltd.
✆ +64 4 803 2204
GPG: 5957 6D23 8B16 EFAB FEF8  7175 14D3 6485 A99C EB6D