One of my colleagues was asking if I could give statistics on what we have from the 17th, 18th and first half of the 19th century. Preferably by year. So I guess I'm asking for a count by publication year. Any thoughts on the best way to approach this? -- Elaine Bradtke Data Wrangler VWML English Folk Dance and Song Society | http://www.efdss.org Cecil Sharp House, 2 Regent's Park Road, London NW1 7AY Tel +44 (0) 20 7485 2206 (This number is for the English Folk Dance and Song Society in London, England. If you wish to phone me personally, send an e-mail first. I work off site) -------------------------------------------------------------------------- Registered Company No. 297142 Charity Registered in England and Wales No. 305999 --------------------------------------------------------------------------- "Writing about music is like dancing about architecture" --Elvis Costello (Musician magazine No. 60 (October 1983), p. 52)
Elaine Bradtke schreef op di 22-07-2014 om 17:35 [+0100]:
One of my colleagues was asking if I could give statistics on what we have from the 17th, 18th and first half of the 19th century. Preferably by year.
When my Elasticsearch work is finished, this'll be just a matter of finding an elasticsearch visualisation tool that lets you explore your data, and telling it to do what you want. The way I'd do it currently would be to script something that pulls the date from the MARC of each entry and exports that. Then you could pop it into libreoffice calc and make graphs or whatever you need. -- Robin Sheat Catalyst IT Ltd. ✆ +64 4 803 2204 GPG: 5FA7 4B49 1E4D CAA4 4C38 8505 77F5 B724 F871 3BDF
This is the basic report: SELECT count(biblio.copyrightdate), biblio.copyrightdate,items.itype FROM items LEFT JOIN biblioitems on (items.biblioitemnumber=biblioitems.biblioitemnumber) LEFT JOIN biblio on (biblioitems.biblionumber=biblio.biblionumber) GROUP BY biblio.copyrightdate ORDER BY biblio.copyrightdate asc I also used an expanded version with more fields so I could see what the numbers really mean. - It doesn't pick up everything. A lot of our 18th and 19th century publications have partial or conjectural dates in square brackets. These seem to be ignored. - It gives a count of the items. Not the biblios. So this skews the count further (we have a lot of 18th century pamphlets bound together - many biblios, but only one item record for the bound volume). Is there any way we can be more precise? Would a query on the MARCXML for the 260 c produce a more accurate result? Can we to force it to count biblios instead of items? It may be we have to settle for doing the counting in the spreadsheet rather than via Koha. But it would be nice to have some control over what is counted. I know next to nothing about SQL, any help would be greatly appreciated. Thanks On Wed, Jul 23, 2014 at 12:31 AM, Robin Sheat <robin@catalyst.net.nz> wrote:
Elaine Bradtke schreef op di 22-07-2014 om 17:35 [+0100]:
One of my colleagues was asking if I could give statistics on what we have from the 17th, 18th and first half of the 19th century. Preferably by year.
When my Elasticsearch work is finished, this'll be just a matter of finding an elasticsearch visualisation tool that lets you explore your data, and telling it to do what you want.
The way I'd do it currently would be to script something that pulls the date from the MARC of each entry and exports that. Then you could pop it into libreoffice calc and make graphs or whatever you need.
-- Robin Sheat Catalyst IT Ltd. ✆ +64 4 803 2204 GPG: 5FA7 4B49 1E4D CAA4 4C38 8505 77F5 B724 F871 3BDF
_______________________________________________ Koha mailing list http://koha-community.org Koha@lists.katipo.co.nz http://lists.katipo.co.nz/mailman/listinfo/koha
-- Elaine Bradtke Data Wrangler VWML English Folk Dance and Song Society | http://www.efdss.org Cecil Sharp House, 2 Regent's Park Road, London NW1 7AY Tel +44 (0) 20 7485 2206 (This number is for the English Folk Dance and Song Society in London, England. If you wish to phone me personally, send an e-mail first. I work off site) -------------------------------------------------------------------------- Registered Company No. 297142 Charity Registered in England and Wales No. 305999 --------------------------------------------------------------------------- "Writing about music is like dancing about architecture" --Elvis Costello (Musician magazine No. 60 (October 1983), p. 52)
Elaine: I think you should use 008 date: SELECT CASE SUBSTR(ExtractValue(m.marcxml,'//controlfield[@tag="008"]'),8,2) WHEN '19' THEN 'XX' WHEN '20' THEN 'XXI' WHEN '18' THEN 'XVIII' WHEN '17' THEN 'XVII' ElSE 'OLDER' END AS bibtype, count(DISTINCT biblionumber) AS bibs FROM biblioitems m GROUP BY bibtype I made this report based on two reports from the library by Nicole C. Engard. Ramiro 2014-07-23 15:57 GMT-03:00 Elaine Bradtke <eb@efdss.org>:
This is the basic report: SELECT count(biblio.copyrightdate), biblio.copyrightdate,items.itype FROM items LEFT JOIN biblioitems on (items.biblioitemnumber=biblioitems.biblioitemnumber) LEFT JOIN biblio on (biblioitems.biblionumber=biblio.biblionumber) GROUP BY biblio.copyrightdate ORDER BY biblio.copyrightdate asc
I also used an expanded version with more fields so I could see what the numbers really mean.
- It doesn't pick up everything. A lot of our 18th and 19th century publications have partial or conjectural dates in square brackets. These seem to be ignored. - It gives a count of the items. Not the biblios. So this skews the count further (we have a lot of 18th century pamphlets bound together - many biblios, but only one item record for the bound volume).
Is there any way we can be more precise? Would a query on the MARCXML for the 260 c produce a more accurate result? Can we to force it to count biblios instead of items? It may be we have to settle for doing the counting in the spreadsheet rather than via Koha. But it would be nice to have some control over what is counted.
I know next to nothing about SQL, any help would be greatly appreciated.
Thanks
On Wed, Jul 23, 2014 at 12:31 AM, Robin Sheat <robin@catalyst.net.nz> wrote:
Elaine Bradtke schreef op di 22-07-2014 om 17:35 [+0100]:
One of my colleagues was asking if I could give statistics on what we have from the 17th, 18th and first half of the 19th century. Preferably by year.
When my Elasticsearch work is finished, this'll be just a matter of finding an elasticsearch visualisation tool that lets you explore your data, and telling it to do what you want.
The way I'd do it currently would be to script something that pulls the date from the MARC of each entry and exports that. Then you could pop it into libreoffice calc and make graphs or whatever you need.
-- Robin Sheat Catalyst IT Ltd. ✆ +64 4 803 2204 GPG: 5FA7 4B49 1E4D CAA4 4C38 8505 77F5 B724 F871 3BDF
_______________________________________________ Koha mailing list http://koha-community.org Koha@lists.katipo.co.nz http://lists.katipo.co.nz/mailman/listinfo/koha
-- Elaine Bradtke Data Wrangler VWML English Folk Dance and Song Society | http://www.efdss.org Cecil Sharp House, 2 Regent's Park Road, London NW1 7AY Tel +44 (0) 20 7485 2206 (This number is for the English Folk Dance and Song Society in London, England. If you wish to phone me personally, send an e-mail first. I work off site) -------------------------------------------------------------------------- Registered Company No. 297142 Charity Registered in England and Wales No. 305999 --------------------------------------------------------------------------- "Writing about music is like dancing about architecture" --Elvis Costello (Musician magazine No. 60 (October 1983), p. 52) _______________________________________________ Koha mailing list http://koha-community.org Koha@lists.katipo.co.nz http://lists.katipo.co.nz/mailman/listinfo/koha
Ramiro That's very useful, thanks. I've expanded it a little to accommodate our date range (yes we have some old items) and changed it to say "Unknown" rather than older, because I suspect these are questionable dates or missing data. Quite a few dates are missing in the 008 field. One of the difficulties in producing reliable statistics! SELECT CASE SUBSTR(ExtractValue(m.marcxml,'//controlfield[@tag="008"]'),8,2) WHEN '19' THEN 'XX' WHEN '20' THEN 'XXI' WHEN '18' THEN 'XIX' WHEN '17' THEN 'XVIII' WHEN '16' THEN 'XVII' WHEN '15' THEN 'XVI' ElSE 'Unknown' END AS bibtype, count(DISTINCT biblionumber) AS bibs FROM biblioitems m GROUP BY bibtype On Wed, Jul 23, 2014 at 8:35 PM, ramirouvia . <ramirouvia@gmail.com> wrote:
Elaine:
I think you should use 008 date:
SELECT CASE SUBSTR(ExtractValue(m.marcxml,'//controlfield[@tag="008"]'),8,2) WHEN '19' THEN 'XX' WHEN '20' THEN 'XXI' WHEN '18' THEN 'XVIII' WHEN '17' THEN 'XVII' ElSE 'OLDER' END AS bibtype, count(DISTINCT biblionumber) AS bibs FROM biblioitems m GROUP BY bibtype
I made this report based on two reports from the library by Nicole C. Engard.
Ramiro
2014-07-23 15:57 GMT-03:00 Elaine Bradtke <eb@efdss.org>:
This is the basic report: SELECT count(biblio.copyrightdate), biblio.copyrightdate,items.itype FROM items LEFT JOIN biblioitems on (items.biblioitemnumber=biblioitems.biblioitemnumber) LEFT JOIN biblio on (biblioitems.biblionumber=biblio.biblionumber) GROUP BY biblio.copyrightdate ORDER BY biblio.copyrightdate asc
I also used an expanded version with more fields so I could see what the numbers really mean.
- It doesn't pick up everything. A lot of our 18th and 19th century
publications have partial or conjectural dates in square brackets. These seem to be ignored. - It gives a count of the items. Not the biblios. So this skews the
count further (we have a lot of 18th century pamphlets bound together - many biblios, but only one item record for the bound volume).
Is there any way we can be more precise? Would a query on the MARCXML for the 260 c produce a more accurate result? Can we to force it to count biblios instead of items? It may be we have to settle for doing the counting in the spreadsheet rather than via Koha. But it would be nice to have some control over what is counted.
I know next to nothing about SQL, any help would be greatly appreciated.
Thanks
On Wed, Jul 23, 2014 at 12:31 AM, Robin Sheat <robin@catalyst.net.nz> wrote:
Elaine Bradtke schreef op di 22-07-2014 om 17:35 [+0100]:
One of my colleagues was asking if I could give statistics on what we have from the 17th, 18th and first half of the 19th century. Preferably by year.
When my Elasticsearch work is finished, this'll be just a matter of finding an elasticsearch visualisation tool that lets you explore your data, and telling it to do what you want.
The way I'd do it currently would be to script something that pulls the date from the MARC of each entry and exports that. Then you could pop it into libreoffice calc and make graphs or whatever you need.
-- Robin Sheat Catalyst IT Ltd. ✆ +64 4 803 2204 GPG: 5FA7 4B49 1E4D CAA4 4C38 8505 77F5 B724 F871 3BDF
_______________________________________________ Koha mailing list http://koha-community.org Koha@lists.katipo.co.nz http://lists.katipo.co.nz/mailman/listinfo/koha
-- Elaine Bradtke Data Wrangler VWML English Folk Dance and Song Society | http://www.efdss.org Cecil Sharp House, 2 Regent's Park Road, London NW1 7AY Tel +44 (0) 20 7485 2206 (This number is for the English Folk Dance and Song Society in London, England. If you wish to phone me personally, send an e-mail first. I work off site) -------------------------------------------------------------------------- Registered Company No. 297142 Charity Registered in England and Wales No. 305999
--------------------------------------------------------------------------- "Writing about music is like dancing about architecture" --Elvis Costello (Musician magazine No. 60 (October 1983), p. 52) _______________________________________________ Koha mailing list http://koha-community.org Koha@lists.katipo.co.nz http://lists.katipo.co.nz/mailman/listinfo/koha
-- Elaine Bradtke Data Wrangler VWML English Folk Dance and Song Society | http://www.efdss.org Cecil Sharp House, 2 Regent's Park Road, London NW1 7AY Tel +44 (0) 20 7485 2206 (This number is for the English Folk Dance and Song Society in London, England. If you wish to phone me personally, send an e-mail first. I work off site) -------------------------------------------------------------------------- Registered Company No. 297142 Charity Registered in England and Wales No. 305999 --------------------------------------------------------------------------- "Writing about music is like dancing about architecture" --Elvis Costello (Musician magazine No. 60 (October 1983), p. 52)
Salvete!
SELECT CASE SUBSTR(ExtractValue(m.marcxml,'//controlfield[@tag="008"]'),8,2) WHEN '19' THEN 'XX' WHEN '20' THEN 'XXI' WHEN '18' THEN 'XVIII' WHEN '17' THEN 'XVII' ElSE 'OLDER' END AS bibtype, count(DISTINCT biblionumber) AS bibs FROM biblioitems m GROUP BY bibtype
I made this report based on two reports from the library by Nicole C. Engard.
Should prolly be
WHEN '19' THEN 'X1X' WHEN '20' THEN 'XX' WHEN '18' THEN 'XVIII' WHEN '17' THEN 'XVII'
yeah? Cheers, Brooke
participants (4)
-
BWS Johnson -
Elaine Bradtke -
ramirouvia . -
Robin Sheat