hansbkk@gmail.com wrote on Mon 28-02-2011 at 12:15 [+0700]:
I'm trying to export batches of records from MARC to a delimited text format, do a bunch of editing and then convert back to MARC.
The CSV format is not capable of holding the information contained within MARC. MARC has two levels of repeatable headers (field, subfield) where order and grouping matter, whereas CSV has one level of (generally) non-repeatable headers with fixed order and no concept of grouping.
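To make the mismatch concrete, here is a minimal Python sketch. The record layout (a list of tag/subfield tuples) and the `flatten` helper are simplified assumptions for illustration, not how MarcEdit or any MARC library represents data. It shows two different records collapsing into the same CSV-style row once the subfields are keyed by column name.

```python
from collections import defaultdict

# Hypothetical, simplified MARC record: an ordered list of fields, each a
# tag plus an ordered list of (subfield code, value) pairs.
record_a = [
    ("245", [("a", "Title"), ("b", "subtitle")]),
    ("650", [("a", "Cats"), ("x", "History")]),
    ("650", [("a", "Dogs"), ("x", "Training"), ("x", "Handbooks")]),
]

# Same values, but the 650 subfields are grouped differently.
record_b = [
    ("245", [("a", "Title"), ("b", "subtitle")]),
    ("650", [("a", "Cats"), ("x", "History"), ("x", "Training")]),
    ("650", [("a", "Dogs"), ("x", "Handbooks")]),
]


def flatten(record):
    """Collapse a record into one row keyed by tag + subfield code.

    This mimics a one-column-per-header CSV export: repeated values are
    collected per column, and the field grouping is discarded.
    """
    row = defaultdict(list)
    for tag, subfields in record:
        for code, value in subfields:
            row[tag + code].append(value)
    return dict(row)


row_a = flatten(record_a)
row_b = flatten(record_b)

# The records differ, yet their flattened rows are identical, so the
# original grouping cannot be recovered from the row alone.
print(record_a != record_b, row_a == row_b)
```

From the `650x` column alone there is no way to tell which subject heading each subdivision belonged to.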
MarcEdit works great, except when going from MARC to CSV, the repeating fields (5XX and up) are being concatenated with semicolon separators into single fields. I'd like to keep these as separate repeating fields,
Splitting with tokens, like semicolons, is pretty much the only way to begin, but unless you're extremely careful you're still going to lose information. The only way to be careful is by encoding a whole lot of extra information into your CSV files, at which point they stop being CSV, for all intents and purposes.
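As a rough illustration of what "extremely careful" means, the Python sketch below compares a naive semicolon join/split with an escaped one. The function names and the backslash escape convention are assumptions made up for this example; they are not what MarcEdit does.

```python
def join_values(values, sep=";"):
    """Naive join: no protection for separators inside the data."""
    return sep.join(values)


def split_values(cell, sep=";"):
    """Naive split on every separator."""
    return cell.split(sep)


def join_escaped(values, sep=";", esc="\\"):
    """Join after escaping the escape character and the separator."""
    return sep.join(
        v.replace(esc, esc + esc).replace(sep, esc + sep) for v in values
    )


def split_escaped(cell, sep=";", esc="\\"):
    """Split on unescaped separators, undoing the escaping as we go."""
    values, current, i = [], [], 0
    while i < len(cell):
        ch = cell[i]
        if ch == esc and i + 1 < len(cell):
            # Escaped character: take the next one literally.
            current.append(cell[i + 1])
            i += 2
        elif ch == sep:
            values.append("".join(current))
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    values.append("".join(current))
    return values


# Two 5XX-style notes, one of which contains a semicolon itself.
notes = ["Includes index", "Contents: v. 1; v. 2"]

naive = split_values(join_values(notes))
careful = split_escaped(join_escaped(notes))

print(len(naive), len(careful))
```

The naive round trip turns two notes into three, while the escaped version restores the original list. Even so, this only protects the values in a single column: an empty list and a list holding one empty string still produce the same cell, and nothing here records which field a subfield came from.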
so I'm working on kludging up some workarounds, but in the meantime can anyone suggest an alternative tool, ideally one that will do a clean round-trip, i.e. reconstruct the MARC from the CSV identically if the data isn't altered?
That is pretty much an impossible task.
Chris, I'm cc'ing you specifically because I recall your mentioning a script in a past thread that went the other way (CSV to MARC), but unfortunately couldn't find the message in the archives.
I have a script I call csvtomarc.pl. It's designed for taking the output from something that can only export in a CSV-like form[0], passing it through a host of rules and transformations, linking up items, and outputting MARC. It's what I use to do migrations, generally. The most up-to-date version of this at the moment can be found here:

http://git.catalyst.net.nz/gw?p=koha.git;a=tree;f=import/csv;h=a92d26d020e04...

but I warn you, it's not designed as an end-user tool, it's pretty complex.

[0] No product I've encountered yet can export as proper CSV it seems, they all break the spec and produce unparseable results that require hand massaging to fix. I'm especially looking at you, Liberty.

-- 
Robin Sheat
Catalyst IT Ltd.
✆ +64 4 803 2204
GPG: 5957 6D23 8B16 EFAB FEF8 7175 14D3 6485 A99C EB6D