I'm trying to export batches of records from MARC to a delimited text format, do a bunch of editing and then convert back to MARC.

MarcEdit works great, except when going from MARC to CSV, the repeating fields (5XX and up) are being concatenated with semicolon separators into single fields. I'd like to keep these as separate repeating fields, so I'm working on kludging up some workarounds, but in the meantime can anyone suggest an alternative tool, ideally one that will do a clean round-trip, i.e. reconstruct the MARC from the CSV identically if the data isn't altered?

Chris I'm cc'ing you specifically because I recall your mentioning a script in a past thread that went the other way (CSV to MARC), but unfortunately couldn't find the message in the archives.
hansbkk@gmail.com wrote on Mon 28-02-2011 at 12:15 [+0700]:
> I'm trying to export batches of records from MARC to a delimited text format, do a bunch of editing and then convert back to MARC.
The CSV format is not capable of holding the information contained within MARC. MARC has two levels of repeatable headers (field, subfield) where order and grouping matters whereas CSV has one level of (generally) non-repeatable headers with fixed order and no concept of grouping.
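To make the mismatch concrete, here is a minimal Python sketch. The record model (an ordered list of tagged fields, each with an ordered list of subfields) is a simplified assumption for illustration, not the structure of any real MARC library, and the sample data is made up. It shows what happens when such a record is flattened into a single row with one column per tag and subfield code.

```python
# Hypothetical, simplified model of a MARC record: an ordered list of
# (tag, [(subfield_code, value), ...]) pairs. Indicators are left out.
record = [
    ("245", [("a", "Example title")]),
    ("500", [("a", "First general note.")]),
    ("500", [("a", "Second general note.")]),
    ("650", [("a", "Cats"), ("x", "Behavior"), ("x", "Fiction")]),
]


def flatten_to_row(rec):
    """Flatten a record into one CSV-style row keyed by tag + subfield code."""
    row = {}
    for tag, subfields in rec:
        for code, value in subfields:
            # A plain CSV header appears once, so a repeated field or
            # subfield simply overwrites the earlier value.
            row[tag + code] = value
    return row


row = flatten_to_row(record)
total_subfields = sum(len(subfields) for _, subfields in record)
print(f"{total_subfields} subfields in the record, {len(row)} cells in the row")
```

The first 500 note and the first 650 $x are gone, and nothing in the row records which subfields belonged to the same field occurrence.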
> MarcEdit works great, except when going from MARC to CSV, the repeating fields (5XX and up) are being concatenated with semicolon separators into single fields. I'd like to keep these as separate repeating fields,
Splitting with tokens, like semicolons, is pretty much the only way to begin, but unless you're extremely careful you're still going to lose information. The only way to be careful is by encoding a whole lot of extra information into your CSV files, at which point they stop being CSV, for all intents and purposes.
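A small sketch of that information loss, in Python. The helper names `join_repeats` and `split_repeats` are hypothetical and the notes are invented sample data; the point is only that a separator with no escaping cannot be reversed once the data itself contains the separator.

```python
SEP = ";"


def join_repeats(values):
    """Concatenate repeated field values into one cell, with no escaping."""
    return SEP.join(values)


def split_repeats(cell):
    """Attempt to recover the repeated values by splitting on the separator."""
    return cell.split(SEP)


# Two repeated 5XX-style notes; the second contains a semicolon of its own.
notes = ["Includes index.", "Translated from the French; abridged."]

cell = join_repeats(notes)
restored = split_repeats(cell)

print(len(notes), "notes went in,", len(restored), "came back")
```

Avoiding this means escaping the separator inside values (and then escaping the escape character), which is the kind of extra encoding that makes the file something other than plain CSV.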
> so I'm working on kludging up some workarounds, but in the meantime can anyone suggest an alternative tool, ideally one that will do a clean round-trip, i.e. reconstruct the MARC from the CSV identically if the data isn't altered?
That is pretty much an impossible task.
> Chris I'm cc'ing you specifically because I recall your mentioning a script in a past thread that went the other way (CSV to MARC), but unfortunately couldn't find the message in the archives.
I have a script I call csvtomarc.pl. It's designed for taking the output from something that can only export in a CSV-like form[0], passing it through a host of rules and transformations, linking up items, and outputting MARC. It's what I use to do migrations, generally.

The most up-to-date version of this at the moment can be found here: http://git.catalyst.net.nz/gw?p=koha.git;a=tree;f=import/csv;h=a92d26d020e04... but I warn you, it's not designed as an end-user tool, it's pretty complex.

[0] No product I've encountered yet can export as proper CSV it seems, they all break the spec and produce unparseable results that require hand massaging to fix. I'm especially looking at you, Liberty.

-- 
Robin Sheat
Catalyst IT Ltd.
✆ +64 4 803 2204
GPG: 5957 6D23 8B16 EFAB FEF8 7175 14D3 6485 A99C EB6D
Robin, thanks so much for helping me not waste my time trying to accomplish the impossible.

I guess I'll use the spreadsheet as a read-only viewing tool to compare the various MARC records for a given title, but do the actual editing in MarcEdit's mnemonic format (.mrk).
On Monday 28 February 2011 19:11:49, hansbkk@gmail.com wrote:
> I guess I'll use the spreadsheet as a read-only viewing tool to compare the various MARC records for a given title, but do the actual editing in MarcEdit's mnemonic format (.mrk)
That's probably the best bet. I use them all the time for reading the data, but I'm very careful not to save any changes. Also, spreadsheets are sometimes prone to damaging the data themselves (especially if it arrives in a way they don't expect, e.g. with embedded un-escaped quotes).

I will tweak files with vim, but mostly just to make them parsable so my converter can work with them.
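As an illustration of the un-escaped quote problem, here is a short Python sketch using the standard library's `csv` module. The sample value and the "naive" export line are made up for the example and do not reproduce any particular product's output. It compares a line written by wrapping a value in quotes without escaping against one produced by `csv.writer`, which doubles embedded quotes.

```python
import csv
import io

value = 'He said "hello", then left'

# Naive export: wrap the value in quotes without escaping the embedded ones.
naive_line = '1,"' + value + '"\n'

# Export through csv.writer, which doubles embedded quote characters.
buf = io.StringIO()
csv.writer(buf).writerow(["1", value])
proper_line = buf.getvalue()

# Read both lines back with the same default reader settings.
naive_row = next(csv.reader(io.StringIO(naive_line)))
proper_row = next(csv.reader(io.StringIO(proper_line)))

print("naive export round-trips: ", naive_row == ["1", value])
print("proper export round-trips:", proper_row == ["1", value])
```

Only the properly quoted line comes back as the original two cells; the naive one is parsed without any error being raised, so the damage is easy to miss.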