Hello everybody,
I have been contracted by KohaLa to work on some GDPR requirements. The main idea is to "anonymize" patrons' data while still letting the library access transaction statistics.
I am going to present what I am planning to implement, in order to collect ideas and answers.
These are the steps I have in mind:
1. Pseudonymization [1] of patrons' data
2. Improve deletion of patron-related data (the statistics, old_reserves and deletedborrowers tables)
3. Add the ability to remove data that have been pseudonymized
I see 2 ways to achieve point 1:
* We create 2 tables, 1 for the patrons, 1 for the transactions.
  - borrowers_anonymized will contain: hash_id, has_cardnumber, branchcode, creation_date, categorycode, bsort1, bsort2, [borrower_attributes]
  - transaction_anonymized will contain: hash_id, transaction_type, branchcode, itemnumber, holdingbranch, location, itemcallnumber, itemtype, timestamp
  hash_id will be generated from the borrowernumber and a key (the key will be stored on the server, with its path set in koha-conf).
  Pros: easier to understand and manipulate, as it follows the existing structure. We track patrons' modifications (this is the most important part).
  Cons: on the technical side, a new config entry is needed and a new path has to be created (minor).
* We create only 1 table (NoSQL-like). It will contain the same data as above, without the hash_id.
  Pros: no new config. Data are never updated, so we keep the values as they were when the transaction was processed.
  Cons: data are not updated :)
About borrower_attributes, the initial specification asks for 2 attributes defined in a syspref. I think it should be configurable, with a join table (pro: more flexible; con: more complex SQL queries).
I think we should have the 2 tables and keep a link between them (borrowers_anonymized and transaction_anonymized).
What do you think? I am going to start the implementation very soon in order to plan an integration early in the 20.05 dev cycle.
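A minimal sketch of option 1, using Python and an in-memory SQLite database. It assumes HMAC-SHA256 as the keyed hash: the message only says the hash_id is built from the borrowernumber and a key, so Koha's actual implementation may differ. The hard-coded SECRET_KEY, the helper functions and the borrower_attributes_anonymized join table are hypothetical names for illustration, and only a subset of the columns listed above is kept.

```python
import hashlib
import hmac
import sqlite3

# Hypothetical key, hard-coded for the sketch only. In the proposal the key
# lives in a file on the server whose path is configured in koha-conf.
SECRET_KEY = b"example-secret-key"


def hash_id(borrowernumber: int, key: bytes = SECRET_KEY) -> str:
    """Return a keyed hash of the borrowernumber (assumption: HMAC-SHA256).

    The same borrowernumber and key always give the same hash_id, so all
    transactions of one patron stay linked. Without the key, the small
    borrowernumber space cannot simply be hashed exhaustively to reverse it.
    """
    message = str(borrowernumber).encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()


SCHEMA = """
CREATE TABLE borrowers_anonymized (
    hash_id        TEXT PRIMARY KEY,
    has_cardnumber INTEGER NOT NULL,
    branchcode     TEXT,
    categorycode   TEXT
);
CREATE TABLE transaction_anonymized (
    hash_id          TEXT NOT NULL REFERENCES borrowers_anonymized (hash_id),
    transaction_type TEXT NOT NULL,
    branchcode       TEXT,
    itemnumber       INTEGER,
    itemtype         TEXT,
    timestamp        TEXT
);
-- Hypothetical join table for configurable borrower_attributes:
-- one row per (patron, attribute code) instead of fixed columns.
CREATE TABLE borrower_attributes_anonymized (
    hash_id TEXT NOT NULL REFERENCES borrowers_anonymized (hash_id),
    code    TEXT NOT NULL,
    value   TEXT,
    PRIMARY KEY (hash_id, code)
);
"""


def upsert_patron(conn, borrowernumber, has_cardnumber, branchcode, categorycode):
    """Create or overwrite the pseudonymized patron row.

    Option 1 keeps one row per patron and overwrites it when the patron
    changes, so statistics always use the patron's current values.
    """
    hid = hash_id(borrowernumber)
    conn.execute(
        "INSERT OR REPLACE INTO borrowers_anonymized "
        "(hash_id, has_cardnumber, branchcode, categorycode) VALUES (?, ?, ?, ?)",
        (hid, int(has_cardnumber), branchcode, categorycode),
    )
    return hid


def log_transaction(conn, borrowernumber, transaction_type, branchcode,
                    itemnumber, itemtype, timestamp):
    """Store a transaction under the patron's hash_id, never the borrowernumber."""
    conn.execute(
        "INSERT INTO transaction_anonymized "
        "(hash_id, transaction_type, branchcode, itemnumber, itemtype, timestamp) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (hash_id(borrowernumber), transaction_type, branchcode,
         itemnumber, itemtype, timestamp),
    )


def issues_per_category(conn):
    """Count checkouts per patron category, joining the tables only through hash_id."""
    return conn.execute(
        "SELECT b.categorycode, COUNT(*) "
        "FROM transaction_anonymized AS t "
        "JOIN borrowers_anonymized AS b ON b.hash_id = t.hash_id "
        "WHERE t.transaction_type = 'issue' "
        "GROUP BY b.categorycode ORDER BY b.categorycode"
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    upsert_patron(conn, 42, True, "CPL", "PT")
    log_transaction(conn, 42, "issue", "CPL", 1001, "BK", "2019-11-21 17:14:00")
    print(issues_per_category(conn))  # [('PT', 1)]
    # The patron changes category: option 1 overwrites the patron row.
    upsert_patron(conn, 42, True, "CPL", "ST")
    print(issues_per_category(conn))  # [('ST', 1)]
```

Because option 1 overwrites the patron row, the earlier checkout ends up counted under the new category ST. With the single-table option 2, the transaction row would instead keep the category the patron had when the checkout was processed (PT).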
Regards,
Jonathan
[1] https://en.wikipedia.org/wiki/Pseudonymization
Hi Jonathan,
I volunteer to take part in the discussion about anonymization processes, tools and methods, and I'm ready to test the bug reports.
Koha is GDPR-ready, but some points could be improved to make everyday use in libraries easier: if something is clear and easy, everybody does it without fear and stress.
Thank you,
Michal
(In reply to Jonathan Druart <jonathan.druart@bugs.koha-community.org>, Thu, 21 Nov 2019, 17:14)
_______________________________________________
Koha mailing list http://koha-community.org
Koha@lists.katipo.co.nz
https://lists.katipo.co.nz/mailman/listinfo/koha
Thanks for the help, Michal.
What do you think about the two options I proposed?
(In reply to Mike D. <black23@gmail.com>, Thu, 21 Nov 2019, 17:58)
Jonathan,
First off, wow, thank you SO much for taking on this task!
Option 1 looks really good to me. One thing, however: we’re starting to get into demographic analysis, so having a zipcode in the borrowers_anonymized table would be hugely beneficial. We’re generating choropleth maps, and borrowers.zipcode provides just the right level of detail for our needs. I don’t know how well that would mesh with the overarching GDPR requirements, however (my guess: not well).
Here’s a sample map that we can get right now; however, this requires us to pull card numbers, then extract borrowers.zipcode, and then report the data out without the card number. Painful.
Again, thanks for taking on this task!
Aaron
--
Aaron Sakovich
Internet and Technology Services Manager
Huntsville-Madison County Public Library
915 Monroe Street | Huntsville, Alabama 35801 | https://hmcpl.org/
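A minimal sketch of what the report described above could look like if borrowers_anonymized had a zipcode column. That column is an assumption: it is not part of the proposal. Table names follow option 1, reduced to the columns needed here, and the sample hashes and ZIP codes are made up. The counts come straight from the pseudonymized tables, so no card number has to be pulled and stripped afterwards.

```python
import sqlite3

# Assumption: borrowers_anonymized extended with a zipcode column, as
# requested above; the original proposal does not include it.
SCHEMA = """
CREATE TABLE borrowers_anonymized (
    hash_id TEXT PRIMARY KEY,
    zipcode TEXT
);
CREATE TABLE transaction_anonymized (
    hash_id          TEXT NOT NULL,
    transaction_type TEXT NOT NULL,
    timestamp        TEXT
);
"""


def transactions_per_zipcode(conn, transaction_type="issue"):
    """Count transactions per patron zipcode, e.g. to feed a choropleth map.

    Only the pseudonymized tables are read: no cardnumber or borrowernumber
    is involved. Patrons without a zipcode are left out.
    """
    rows = conn.execute(
        "SELECT b.zipcode, COUNT(*) "
        "FROM transaction_anonymized AS t "
        "JOIN borrowers_anonymized AS b ON b.hash_id = t.hash_id "
        "WHERE t.transaction_type = ? AND b.zipcode IS NOT NULL "
        "GROUP BY b.zipcode ORDER BY b.zipcode",
        (transaction_type,),
    ).fetchall()
    return dict(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO borrowers_anonymized VALUES (?, ?)",
        [("h1", "12345"), ("h2", "12345"), ("h3", "67890")],
    )
    conn.executemany(
        "INSERT INTO transaction_anonymized VALUES (?, 'issue', '2019-11-22')",
        [("h1",), ("h1",), ("h2",), ("h3",)],
    )
    print(transactions_per_zipcode(conn))  # {'12345': 3, '67890': 1}
```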
(In reply to Jonathan Druart <jonathan.druart@bugs.koha-community.org>, Nov 21, 2019, 10:13)
Thanks for the feedback, Aaron. I will try to make it flexible, keeping your need in mind :)
(In reply to asakovich@hmcpl.org, Fri, 22 Nov 2019, 00:47)
Hello,
A bit of fresh news: I have submitted a bunch of patches that are ready to be tested.
The main bug report is bug 24151 (Add a pseudonymization process for patrons and transactions): https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=24151
I pushed a remote branch with everything applied in the correct order to my GitLab repo: https://gitlab.com/joubu/Koha/commits/bug_24151
Cheers,
Jonathan
(In reply to Jonathan Druart <jonathan.druart@bugs.koha-community.org>, Thu, 21 Nov 2019, 17:13)
participants (3)
- asakovich@hmcpl.org
- Jonathan Druart
- Mike D.