John, if your file is in the standard ISO 2709 MARC format, you can use bulkmarcimport.pl, found in the intranet/scripts/misc directory of the MARC version (currently 1.9.x) of Koha. -- Stephen Hedges, Nelsonville Public Library
Howdy!
I've developed a couple of programs that I have used to aid me in populating the library database.
The first program takes a raw scan of a book's EAN (e.g. 9781575212784) and converts it into the ISBN (e.g. 1575212781).
The second program takes a list of ISBNs and attempts to retrieve the MARC record for each title from the LoC.
I have successfully used these two programs to obtain MARC records on approximately 4000 titles so far. I think it took about 40 minutes to poll the LoC and save all the responses to a file.
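The EAN-to-ISBN conversion John describes amounts to dropping the 978 prefix and the EAN check digit, then recomputing the ISBN-10 check digit. His tools were Perl; the following is only a Python sketch of the same idea, with an assumed function name:

```python
def ean13_to_isbn10(ean: str) -> str:
    """Convert a Bookland EAN-13 (978 prefix) to an ISBN-10.

    Drops the '978' prefix and the EAN check digit, then recomputes
    the ISBN-10 modulus-11 check digit over the remaining nine digits.
    """
    ean = ean.strip()
    if len(ean) != 13 or not ean.startswith("978"):
        raise ValueError("not a 978-prefixed EAN-13: %r" % ean)
    body = ean[3:12]                       # nine ISBN digits, no check digit
    total = sum((10 - i) * int(d) for i, d in enumerate(body))
    check = (11 - total % 11) % 11         # ISBN-10 check digit; 10 prints as 'X'
    return body + ("X" if check == 10 else str(check))
```

With the example from the mail, `ean13_to_isbn10("9781575212784")` gives `"1575212781"`.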
The only program I need to find/write/beg/borrow/steal is one that will take all these MARC records and import them into KOHA. Is there anything out there that does this already?
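For context, ISO 2709 (the binary MARC exchange format that bulkmarcimport.pl expects) stores each record's total length as five ASCII digits at the start of its leader, so a file of concatenated records can be split without a MARC library. A minimal sketch, with an assumed function name and error handling elided:

```python
def split_iso2709(data: bytes):
    """Yield individual ISO 2709 records from a concatenated blob.

    Leader bytes 0-4 of each record hold its total length as five
    ASCII digits; that length includes the leader and the final
    record terminator byte (0x1D).
    """
    pos = 0
    while pos < len(data):
        length = int(data[pos:pos + 5].decode("ascii"))  # record length from leader
        record = data[pos:pos + length]
        assert record.endswith(b"\x1d"), "record terminator missing"
        yield record
        pos += length
```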
Regards, John
PS: I can make the above mentioned two programs available if there's enough interest.
_______________________________________________ Koha mailing list Koha@lists.katipo.co.nz http://lists.katipo.co.nz/mailman/listinfo/koha
Stephen Hedges wrote:
John, if your file is in the standard ISO 2709 MARC format, you can use bulkmarcimport.pl, found in the intranet/scripts/misc directory of the MARC version (currently 1.9.x) of Koha.
No, wrong suggestion. If you use bulkmarcimport, you include items from LoC, and if there are no items, you include item-less biblios in Koha. That is a very bad idea imho, as borrowers could be happy to find a book, then unhappy when they see that the book is not really there.

The good solution is to enter the books in the "breeding farm". It's a place where biblios are stored, waiting for inclusion into the "active DB". When you catalogue, if you choose "New biblio" (with MARC=on in parameters), you can enter a title or an ISBN to search the breeding farm before creating the biblio. The next screen shows the list of biblios you have in your active DB (in case you forgot you already had it), and the list of corresponding biblios in your breeding farm. If you find what you need, click, and the next screen is the Biblio Editor, filled with the biblio imported from the breeding farm. Here you can complete/modify the biblio, then create items.

That's the right way to use biblios from a z3950 site imho. That's how I did it for Dombe Abbey with 40,000 biblios from BNC (Canada). -- Paul POULAIN, independent free-software consultant, French-language coordinator for Koha (free ILS, http://www.koha-fr.org)
paul POULAIN wrote:
The good solution is to enter the books in the "breeding farm". It's a place where biblios are stored, waiting for inclusion into the "active DB". When you catalogue, if you choose "New biblio" (with MARC=on in parameters), you can enter a title or an ISBN to search the breeding farm before creating the biblio. The next screen shows the list of biblios you have in your active DB (in case you forgot you already had it), and the list of corresponding biblios in your breeding farm. If you find what you need, click, and the next screen is the Biblio Editor, filled with the biblio imported from the breeding farm. Here you can complete/modify the biblio, then create items.
That's the right way to use biblios from a z3950 site imho. That's how I did it for Dombe Abbey with 40,000 biblios from BNC (Canada).
dear paul! i have been lurking around for about half a year to get a feeling for which solution to use for e.g. a school library with 16000+ items. the choices are ils (obviously dead), obiblio, koha and of course some local commercial beasts i hope to avoid due to their "closed shop" philosophy :-)

metadata quality of items/books is extremely bad in every candidate library, and none of them have the resources to do the COMPLETE work interactively within a reasonable time, since they cannot close down for months. next, some major reorganizations in topology/subjects/etc. are desired too when making such changes. SO it is essential to prepare data outside of e.g. koha and import it, or at least a *big* part of it, AUTOMATICALLY (i still have no clue how this works or could be done in koha). and keep in mind the psychological benefit of doing so: people are happier to key in "just" half of the mass than all of it.

to prepare fuzzy data, perl is imho the first choice, and z3950 queries are certainly easier to handle than reprocessing the html output of the various opacs hanging around.

i appreciate your and your colleagues' work very much and i could contribute heavily as a perl hacker, but it is VERY hard to do pure reverse engineering because no architectural or other docs helpful for analyzing the beast are available besides the bare code. also, cvs seems not to hold the latest *common* effort, or am i wrong in this case? --- cu wolfgang
Wolfgang Pichler <wolfgang.pichler@ivv.tuwien.ac.at> wrote:
i appreciate your and your colleagues' work very much and i could contribute heavily as a perl hacker, but it is VERY hard to do pure reverse engineering because no architectural or other docs helpful for analyzing the beast are available besides the bare code. also, cvs seems not to hold the latest *common* effort, or am i wrong in this case?
Can you spell out exactly what documents you need? Maybe then someone who knows the relevant part can start to contribute them. Did someone post a link to a cross-referenced source of koha, or did I imagine that? -- MJR/slef My Opinion Only and possibly not of any group I know. http://mjr.towers.org.uk/ jabber://slef@jabber.at Creative copyleft computing services via http://www.ttllp.co.uk/ Thought: Edwin A Abbott wrote about trouble with Windows in 1884
MJ Ray wrote:
Wolfgang Pichler <wolfgang.pichler@ivv.tuwien.ac.at> wrote:
i appreciate your and your colleagues' work very much and i could contribute heavily as a perl hacker, but it is VERY hard to do pure reverse engineering because no architectural or other docs helpful for analyzing the beast are available besides the bare code. also, cvs seems not to hold the latest *common* effort, or am i wrong in this case?
Can you spell out exactly what documents you need? Maybe then someone who knows the relevant part can start to contribute them.
field of work could/would be: consistent import of existing data (the whole old working library -> a working koha library, not just biblio breeding :-)

found a visualized ER scheme at http://irref.mine.nu/user/dchud/koha-scheme/ as a starter, but it seems somehow useless without a list of the values used for indicator/status fields, or a list of constraints that are obviously checked programmatically in various places, or worse, never checked, evoking hard-to-find delayed bugs thanks to perl's smartness in somehow operating on everything.

as stated before, a definitive list of calculated fields (when, how, ... nasty questions, i admit ... :-) vs. untouched "bare" legacy data would be great. if ever some "design" was applied, this would be no problem, but koha seems to be a grown beast of varying code quality, and maybe no one remembers the old assumptions any more :-)

so i do not expect some kind of petri net, but if koha is a multi-developer effort, at least some negotiated interfaces could be documented in a few descriptive emails ... irc seems a quite good channel too, but only if massive experience with the "skeleton" is present, or it will be a waste of time for the others.
Did someone post a link to a cross-referenced source of koha, or did I imagine that?
SO WHERE IS AT LEAST THE LATEST CVS ? sf ? looked at the dates: it seems everyone keeps their own copy and check-ins are rather rare.
On 2003-08-15 12:49:46 +0100 Wolfgang Pichler <wolfgang.pichler@ivv.tuwien.ac.at> wrote:
Can you spell out exactly what documents you need? Maybe then someone who knows the relevant part can start to contribute them. field of work could/would be : consistent import of existing data (the whole old working library -> working koha library - not just biblio breeding :-)
Existing data from what?
found a visualized RD-scheme at http://irref.mine.nu/user/dchud/koha-scheme/
That URL doesn't answer me just now, so I can't comment. [...]
as stated before, a definitive list of calculated fields (when, how, ... nasty questions, i admit ... :-) vs. untouched "bare" legacy data would be great. if ever some "design" was applied, this would be no problem, but koha seems to be a grown beast of varying code quality, and maybe no one remembers the old assumptions any more :-)
OK, I asked you to restate it because I didn't understand the first time. I still don't understand. Can you rephrase/expand? Koha is forged in the field, so sometimes there are rough edges. This is no different to most software, but they aren't concealed in Koha, so we need to work on smoothing them off. I only did a few software engineering courses at uni and I know there are things in Koha that aren't what they should be. But we must prioritise. Right now, a mostly working and complete 2.0 is needed. We've been in feature freeze too long already.
so i do not expect some kind of petri-net, but if koha is a multi-developer-effort, at least some negotiated interfaces could be documented in some descriptive emails ...
See the koha-devel list archives at sourceforget and see if it has what you need. [...]
Did someone post a link to a cross-referenced source of koha, or did I imagine that? SO WHERE IS AT LEAST THE LATEST CVS ? sf ? looked at the dates: it seems everyone keeps their own copy and check-ins are rather rare.
YE GODS MAN, WHY ARE YOU SHOUTING AT US? Seriously: English probably isn't the easiest language for this, but please be patient with us. sf is the latest CVS. Maybe some developers are sitting on check-ins; maybe August holidays are slowing things up. If it doesn't move fast enough for you, help. If you send a patch to koha-devel and no-one picks it up, I'll put my QA hat on and help it along. -- MJR/slef
MJ Ray wrote:
On 2003-08-15 12:49:46 +0100 Wolfgang Pichler <wolfgang.pichler@ivv.tuwien.ac.at> wrote: Existing data from what?
source: ms-access data (id, author, title, maybe isbn and many other fields), all heavily postprocessed by exporting cvs and running heavy perl scripts to check whether an isbn is valid and to try to figure out multiple authors, since they are given with various delimiters [ John & Jack Doe / Doe, John and Jack / Doe, John ; Smith, J. / and all the funny things an unconstrained user may do :-((( -- size about 16000 items! ] overall they crunched their audio tapes and multilingual items into such a system; moreover there are borrowers' data and currently-issued states to monitor --- l o t s of funny things to do with perl. they wish to reorganize the current overall shelving and collection systematics (not sure where to find it in koha : biblioitems:classification ?). all this has to be mapped into the koha sql db. hope that answers the question :-)
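Wolfgang's cleanup steps (validating ISBNs, splitting multi-author strings on inconsistent delimiters) were done in Perl; the following is only a rough Python sketch of the same idea, with the delimiter set and function names assumed from his examples:

```python
import re

# Delimiters observed in the mail: '&', ' and ', ';', '/'
_AUTHOR_SPLIT = re.compile(r"\s*(?:&|;|/|\band\b)\s*")

def split_authors(raw: str) -> list:
    """Split a free-form author field on the common delimiters."""
    return [p.strip() for p in _AUTHOR_SPLIT.split(raw) if p.strip()]

def is_valid_isbn10(isbn: str) -> bool:
    """Check the ISBN-10 modulus-11 check digit ('X' counts as 10)."""
    isbn = isbn.replace("-", "").strip().upper()
    if not re.fullmatch(r"\d{9}[\dX]", isbn):
        return False
    # Valid iff the weighted sum (weights 10 down to 1) is 0 mod 11.
    total = sum((10 - i) * (10 if c == "X" else int(c))
                for i, c in enumerate(isbn))
    return total % 11 == 0
```

For instance, `split_authors("Doe, John ; Smith, J.")` yields `["Doe, John", "Smith, J."]`, and `is_valid_isbn10("1575212781")` is `True`.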
OK, I asked you to restate it because I didn't understand the first time. I still don't understand. Can you rephrase/expand?
there are many tinyints and char(1-4)s in the db scheme, which are obviously sometimes booleans or other indicators of some status (biblio:serial == boolean : is it a serial?, borrowers:categorycode, reserves:found ...). these are not arbitrary fields; their content has significance in processing the record they are in. there are lots of them and they serve as a kind of interface for the processing code. any summaries about these? or just RTFC?
See the koha-devel list archives at sourceforget and see if it has what you need.
this was/is my first source of information retrieval and i am not done completely :-)
YE GODS MAN, WHY ARE YOU SHOUTING AT US? Seriously: English probably isn't the easiest language for this, but please be patient with us.
sorry, capitals in austrian literature are just emphasis, not shouting. but give it time: after 25 years of work in IT i will shut this reflex down :-)
sf is latest CVS.
ok. it will be my bible. cu wolfgang
Wolfgang Pichler <wolfgang.pichler@ivv.tuwien.ac.at> wrote:
Existing data from what?
MJ Ray wrote: source : ms-access data (id,author, title, maybe isbn and many other fields, all heavily postprocessed by exporting cvs and running heavy
You store access data in CVS?
perl-scripts to check if isbn is valid, to try to figure out multiple authors, since they are given with various delimiters [ John & Jack Doe / Doe, John and Jack / Doe, John ; Smith, J. / and all that funny things an unconstrained user may do :-(((, -- size about 16000 items !)
OK. You could output MARC records, would be my best guess. Maybe someone here can point at helpful things about MARC.
overall they crunched their audio-tapes and multilingual items into such a system, moreover there are borrower's data and currently-issued-states to monitor --- l o t s of funny things to do /w perl.
I'm not 100% clear on this just now and I doubt it will become clearer until you have a testbed running. Maybe other library users can explain how they cope with current issues while moving to koha? Just reenter, run the two systems alongside, or something smarter?
they wish to reorganize the current overall shelving and collection systematics (not sure where to find it in koha : biblioitems:classification ?)
Again, this is probably something other library users on this list can advise on, but I think that's where you should be looking.
OK, I asked you to restate it because I didn't understand the first time. I still don't understand. Can you rephrase/expand? there are many tinyints and char(1-4)s in the db scheme, which are obviously sometimes booleans or other indicators of some status (biblio:serial == boolean : is it a serial?, borrowers:categorycode, reserves:found ...).
Yes, I admit that I'm not sure why they are those types instead of enumerated or foreign keys in another table, unless it's because we're *still* working without foreign keys in MySQL :-(
any summaries about these ? or just RTFC ?
A better question for -devel. If you can post a complete list of these unexplained ones, developers of the relevant parts will explain them eventually, I guess. -- MJR/slef
MJ Ray wrote:
Wolfgang Pichler <wolfgang.pichler@ivv.tuwien.ac.at> wrote:
MJ Ray wrote:
Existing data from what?
source : ms-access data (id,author, title, maybe isbn and many other fields, all heavily postprocessed by exporting cvs and running heavy
You store access data in CVS?
a typo : data exported as csv ...
perl-scripts to check if isbn is valid, to try to figure out multiple authors, since they are given with various delimiters [ John & Jack Doe / Doe, John and Jack / Doe, John ; Smith, J. / and all that funny things an unconstrained user may do :-(((, -- size about 16000 items !)
OK. You could output MARC records, would be my best guess. Maybe someone here can point at helpful things about MARC.
i am experimenting with marc, esp. Net::Z3950, and have good pointers about it, though in this complex subject one never stops learning :-) i am missing a MAB2 -> MARC21 conversion tool and good MAB2 documentation, since this is all related to g e r m a n literature, where the relevant austrian z3950 servers will often deliver just MAB record syntax and no MARC. also, if someone has a sophisticated html postprocessor which can do good reformatting of html opac queries (which are free, vs. z3950!), it would be a great help. of course i intend to design my scripts for reuse and make them available too.
I'm not 100% clear on this just now and I doubt it will become clearer until you have a testbed running. Maybe other library users can explain how they cope with current issues while moving to koha? Just reenter, run the two systems alongside, or something smarter?
there must be a clean switch; the old system is too stupid to be run in parallel. of course there will be a testing/target environment, and i will try to migrate most of the old data in incremental batch runs followed by interactive "smoothing", but these will be customized Perl/Tk tools tailored to the specific actions necessary, yielding sets of importable data to be fed in in one go when the switch happens (tested beforehand, of course). since it is acceptable to inhibit deletions for a while, there should be no problem with ghost items or dups. remember, since i will have to build/keep "shadow" data of the current system and feed it in in one go, everything should work just fine, if koha just works fine. the only problem is: what to feed where in mysql :-)
they wish to reorganize the current overall shelving and collection systematics (not sure where to find it in koha : biblioitems:classification ?)
Again, this is probably something other library users on this list can advise on, but I think that's where you should be looking.
there are many tinyints and char(1-4)s in the db scheme, which are obviously sometimes booleans or other indicators of some status (biblio:serial == boolean : is it a serial?, borrowers:categorycode, reserves:found ...).
Yes, I admit that I'm not sure why they are those types instead of enumerated or foreign keys in another table, unless it's because we're *still* working without foreign keys in MySQL :-(
any summaries about these ? or just RTFC ?
A better question for -devel. If you can post a complete list of these unexplained ones, developers of the relevant parts will explain it eventually, I guess.
i hope so. but i fear that a lot of reverse engineering by me will be necessary. since everyone who has to migrate data will have to do the same for now, the lack of overall docs IS a major factor inhibiting the use of koha. cu wolfgang
Wolfgang Pichler <wolfgang.pichler@ivv.tuwien.ac.at> wrote:
how they cope with current issues while moving to koha? Just reenter, run the two systems alongside, or something smarter? there must be a clear switch, the old system is too stupid to be run in parallel.
One option is running the old system for returns, fines etc. until there are few enough items out that manual transfer is possible. Then again, I'm assuming that it's the status parts that are the tricky bits to transfer. I've not done it, so I could be wrong. [...]
since everyone who has to migrate data will have to do the same for now, the lack of overall docs IS a major factor inhibiting the use of koha.
Yes, but lack of a full release containing all the latest really good work is a bigger one, IMO. -- MJR/slef My Opinion Only and possibly not of any group I know. Creative copyleft computing services via http://www.ttllp.co.uk/
On Fri, 15 Aug 2003, Wolfgang Pichler wrote:
found a visualized ER scheme at http://irref.mine.nu/user/dchud/koha-scheme/ as a starter, but it seems somehow useless without a list of the values used for indicator/status fields, or a list of constraints that are obviously checked programmatically in various places, or worse, never checked, evoking hard-to-find delayed bugs thanks to perl's smartness in somehow operating on everything.
Can somebody put that URL somewhere it's actually accessible?

And I seriously doubt that a quest to have Koha work with full referential integrity checked by the database would be a worthwhile endeavor unless we're seeing data-corruption heisenbugs that need to be tracked down. And honestly, those cases have always been handled far more effectively by looking at the query log and seeing what it's doing than by "the script crashed because some dippy purist thought referential integrity was as important as some college professor told them, and that college professor hasn't written a program that's actually useful to a real human doing real work in thirty years."

Feel free to disagree, but if you don't have a significant amount of non-academic experience doing development projects semi-informally, then you have no relevant context for what I've seen happening with Koha. That doesn't make Koha bad. That doesn't make you bad. It just means you're going to have to get used to things as they've been done, or see if anybody will let you fix them your way.
as stated before, a definitive list of calculated fields (when, how, ... nasty questions, i admit ... :-) vs. untouched "bare" legacy data would be great. if ever some "design" was applied, this would be no problem, but koha seems to be a grown beast of varying code quality, and maybe no one remembers the old assumptions any more :-)
If you'd like to write that sort of documentation, I'm sure someone will provide some web space for it. Honestly, "design after" isn't a bad thing for people who have some clue about implementing practical databases. And since you can obviously browse cvs and read perl, you can find all of those code-quality misassumptions and submit patches. Right?
so i do not expect some kind of petri-net, but if koha is a multi-developer-effort, at least some negotiated interfaces could be documented in some descriptive emails ...
Given how few people are actually coding and they seem to be making real progress, how much negotiation do you expect is still required?
Did someone post a link to a cross-referenced source of koha, or did I imagine that?
SO WHERE IS AT LEAST THE LATEST CVS ? sf ? looked at the dates: it seems everyone keeps their own copy and check-ins are rather rare.
That's a pretty normal developer pattern. Push things along in personal cvs until it's worth worrying about merging. It is generally better to put things out as often as possible and people may be pushing cvs commits after every day they've worked - we just don't know. -- </chris> The death of democracy is not likely to be an assassination from ambush. It will be a slow extinction from apathy, indifference, and undernourishment. -Robert Maynard Hutchins, educator (1899-1977)
Christopher Hicks wrote:
On Fri, 15 Aug 2003, Wolfgang Pichler wrote:
found a visualized ER scheme at http://irref.mine.nu/user/dchud/koha-scheme/ as a starter, but it seems somehow useless without a list of the values used for indicator/status fields, or a list of constraints that are obviously checked programmatically in various places, or worse, never checked, evoking hard-to-find delayed bugs thanks to perl's smartness in somehow operating on everything.
Can somebody put that URL somewhere that it's actually accessible?
sorry, but i had luck accessing it; it seems to be a dyndns or mostly-offline site, maybe. it seems to be an old scheme compared with the 2.0.0pre2 *.sql, but it's a starter ...
And I seriously doubt that a quest to have Koha work with full referential integrity checked by the database would be a worthwhile endeavor unless
[snip advocacy without prior offence] did you hear me voting for referential integrity? nope. so where is your problem? i have the problem when i want to do something with koha and the docs are lacking. IF i am migrating to koha, which seems to be supported, i hope, I HAVE to populate the db CORRECTLY. IF you can explain to me how to do this without my having to understand koha almost completely by jumping deep into the code, PLEASE do so. IF i have a chance to achieve consistent data content in koha without keying in every biblio, item, borrower, category, etc. etc. through the gui from scratch, PLEASE explain. (nota bene: NOT just breeding info imported from marc records)
If you'd like to write that sort of documentation I'm sure someone will provide some web space for it. Honestly, "design after" isn't a bad thing for people that have some clue about implementing practical databses. And since you can obviously browse cvs and read perl you can find all of those code-quality misassumptions and submit patches. Right?
as stated before: please cite any words from me which speak of "code-quality misassumptions". the only thing i stated is that perl can be very funny with its wonderful default behaviour. btw, personally i rely on such behaviour, and be honest, who has never printed undefs, intentionally or not? i assume the web space could already be populated to some degree by the people who know what they are doing. i think you also would NOT feel good reading megs of code for weeks until you get some idea of what is going on behind the scenes ...
so i do not expect some kind of petri-net, but if koha is a multi-developer-effort, at least some negotiated interfaces could be documented in some descriptive emails ...
Given how few people are actually coding and they seem to be making real progress, how much negotiation do you expect is still required?
do not expect outsiders to know project internals like the number of devels, etc. obviously fewer people are involved than i thought, as you tell me indirectly.
SO WHERE IS AT LEAST THE LATEST CVS ? sf ? looked at the dates: it seems everyone keeps their own copy and check-ins are rather rare.
That's a pretty normal developer pattern. Push things along in personal cvs until it's worth worrying about merging. It is generally better to put things out as often as possible and people may be pushing cvs commits after every day they've worked - we just don't know.
i agree heartily; slower-moving targets are probably better. i appreciate your advocacy, but you are constructing offence where there is none. every developer, including me, dislikes documentation, but this gun shoots backwards: using koha means understanding koha, for anyone who must operate it beyond the gui. this is a hard fact, like it or not. cu wolfgang
On Fri, 15 Aug 2003, Wolfgang Pichler wrote:
Christopher Hicks wrote:
On Fri, 15 Aug 2003, Wolfgang Pichler wrote: did you hear me voting for ref.integrity ? nope. so where is your problem ?
Given what you said it was a challenge to determine what you meant. You certainly sounded like a formality and referential integrity wonk. If that's not you, great, but maybe you should consider that I'm not the only person who seems to have "gotten the wrong idea" about where you were coming from.
i have the problem when i want to do something with koha and the docs are lacking.
I sympathise, but at this point it's a dig-in-and-ask-questions sort of thing. Demanding docs from folks trying to push a release out the door is not going to get you many answers.
IF migrating to koha, which seems to be supported, i hope, I HAVE to populate the db CORRECTLY.
Support and good docs are two very separate things. Also consider that others haven't needed what you're looking for to populate the DB, but I'm sure that was all hashed out much earlier in this thread.
IF you can explain me how to do this without actually being able to understand koha most completely, because i have to jump deeply into the code, PLEASE do so.
Doing enough with the front end to see how the database looks afterward, and then asking the questions that come up through that process, doesn't seem an arduous process for someone with your technical skills.
IF i have a chance to achieve a consistent data content in koha without keying in every biblio, item, borrower, category, etc.etc. in front of the gui from scratch, PLEASE explain. (nota bene: NOT just breeding infos imported from marc-records)
I wish I knew. I've played with Koha a little bit and hung around answering the questions I can, but you're trying to go deeper than I ever found the need to go. Someday I hope to go there, and I'll be happy to share whatever docs I come up with through that process, but honestly there's a lot of other development work in front of me before I get back to Koha.
i assume the web-space could already be populated to some degree by the people who know what they are doing.
It could be, but there isn't much of what you're looking for since most users don't care and most developers either figure it out themselves or don't care either.
i think you also do NOT feel good to read megs of code for weeks until you get some idea, what is going on behind the scenes ...
If you've got to read every line of code to get an idea of what's going on, I'm sorry. Install it. Play with it. Browse the database. How much code did I read in that process? Zilch. How much would I need to read to figure out the stuff you're talking about? A few hours' worth. Nobody's asking you to run a marathon here, but whining when somebody says "the milk isn't in the fridge; here's two bucks, walk to the store down the block and get it" isn't going to get you very far.
i appreciate your advocacy, but you are constructing offence, where none is.
I'm honestly not the only person who took you as coming from that perspective. Please pardon any inappropriate grumpiness. It's a BOFH thing. :)
every developer including me dislikes documentation, but this gun shoots backwards. using koha means understanding koha for people who must operate it beyond the gui. this is a hard fact, like it or not.
No one disagrees that what you're looking for would be a good thing, but until somebody volunteers to do it and actually does it, it won't happen. Another hard fact. Since you seem to be quite disturbed by this state of affairs, hopefully you will be that volunteer. -- </chris>
participants (5):
- Christopher Hicks
- MJ Ray
- paul POULAIN
- Stephen Hedges
- Wolfgang Pichler