[Koha] GDPR - Statistics and anonymization

asakovich at hmcpl.org
Fri Nov 22 12:47:02 NZDT 2019


Jonathan,

First off, wow, thank you SO much for taking on this task!

Option 1 looks really good to me. One request, though: we’re starting to get into demographic analysis, so having a zipcode in the borrowers_anonymized table would be hugely beneficial. We’re generating choropleth maps, and borrowers.zipcode provides just the right level of detail for our needs. I don’t know how well that would mesh with the overarching GDPR requirements, however (my guess: not well).

Here’s a sample map that we can get right now; however, this requires us to pull card numbers, then extract the borrowers.zipcode, and then report the data out without the card number. Painful.
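
A minimal sketch of that workaround, in Python: look up each transaction's card number, map it to a zipcode, and report only per-zipcode counts. The function name, the input shapes, and the sample data are all hypothetical, chosen for illustration; this is not how Koha reports are actually written.

```python
from collections import Counter


def checkouts_by_zipcode(transactions, borrowers):
    """Count transactions per zipcode; card numbers never reach the output.

    transactions: iterable of card numbers, one per transaction (assumed shape)
    borrowers: dict mapping card number -> zipcode (assumed shape)
    """
    counts = Counter()
    for cardnumber in transactions:
        zipcode = borrowers.get(cardnumber)
        if zipcode:  # skip card numbers with no known zipcode
            counts[zipcode] += 1
    return dict(counts)


# Made-up sample data, for illustration only
borrowers = {"1001": "35801", "1002": "35801", "1003": "35758"}
transactions = ["1001", "1002", "1001", "1003", "9999"]
result = checkouts_by_zipcode(transactions, borrowers)
```

With a zipcode stored directly in borrowers_anonymized, the card-number lookup step would not be needed at all.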



Again, thanks for taking on this task!
Aaron
--
Aaron Sakovich
Internet and Technology Services Manager

Huntsville-Madison County Public Library
915 Monroe Street | Huntsville, Alabama 35801 | https://hmcpl.org/




> On Nov 21, 2019, at 10:13, Jonathan Druart <jonathan.druart at bugs.koha-community.org> wrote:
> 
> Hello everybody,
> 
> I have been contracted by KohaLa to work on some GDPR requirements.
> The main idea is to "anonymize" patrons' data while still letting the
> library access the transaction statistics.
> 
> I am going to present to you what I am planning to implement, in order
> to collect ideas and answers.
> 
> There are the following steps I have in mind:
> 1. Pseudonymization [1] of patrons' data
> 2. Improve deletion of patron-related data (tables statistics,
> old_reserves, deletedborrowers)
> 3. Add the ability to remove data that have been pseudonymized
> 
> I see 2 ways to achieve point 1:
> * We create 2 tables, 1 for the patrons, 1 for the transactions.
> - borrowers_anonymized will contain: hash_id, has_cardnumber,
> branchcode, creation_date, categorycode, bsort1, bsort2,
> [borrower_attributes]
> - transaction_anonymized will contain: hash_id, transaction_type,
> branchcode, itemnumber, holdingbranch, location, itemcallnumber,
> itemtype, timestamp
> 
> hash_id will be generated using the borrowernumber and a key (that
> will be stored on the server, path in koha-conf)
> 
> Pros: Easier to understand and manipulate as it follows the existing
> structure. We track patrons' modifications (this is the most important part)
> Cons (technical): new config; a new path has to be created (minor)
> 
> * We create only 1 table (NoSQL-like). It will contain the same data
> as previously, without the hash_id
> 
> Pros: No new config. Data are never updated, so we keep the values as
> they were when the transactions were processed.
> Cons: Data are not updated :)
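
For the hash_id in the first option (derived from the borrowernumber and a server-side key), a keyed hash is one possible approach. The sketch below uses HMAC-SHA256 from Python's standard library; the choice of algorithm, the function name, and the example key are assumptions made for illustration, since the proposal does not specify how the hash would be computed.

```python
import hashlib
import hmac


def make_hash_id(borrowernumber: int, key: bytes) -> str:
    """Derive a stable pseudonymous id from a borrowernumber and a secret key.

    HMAC-SHA256 is an assumption here; the proposal only says
    "borrowernumber and a key".
    """
    message = str(borrowernumber).encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()


# Hypothetical key; in the proposal the key lives on the server,
# at a path configured in koha-conf.
key = b"example-key-not-for-production"
hash_a = make_hash_id(42, key)
hash_b = make_hash_id(42, key)
hash_c = make_hash_id(43, key)
```

The same borrowernumber and key always give the same hash_id, so a patron's transactions stay linked, while the id cannot be recomputed without the key.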
> 
> About borrower_attributes, the initial specification asks for 2
> attributes defined in a syspref. I think it should be configurable,
> with a join table (Pro: more flexible, Con: more complex SQL queries)
> 
> I think we should have the 2 tables and keep a link between the
> anonymized_patrons and anonymized_transactions tables.
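
The link between the two tables can be sketched with an in-memory SQLite database. The table names follow the earlier borrowers_anonymized / transaction_anonymized naming, only a subset of the proposed columns is shown, and the rows are made up; the column types and the example query are assumptions, not the actual Koha schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE borrowers_anonymized (
    hash_id      TEXT PRIMARY KEY,
    branchcode   TEXT,
    categorycode TEXT
);
CREATE TABLE transaction_anonymized (
    hash_id          TEXT REFERENCES borrowers_anonymized(hash_id),
    transaction_type TEXT,
    itemtype         TEXT,
    timestamp        TEXT
);
""")

# Made-up rows, for illustration only
conn.executemany(
    "INSERT INTO borrowers_anonymized VALUES (?, ?, ?)",
    [("h1", "MAIN", "ADULT"), ("h2", "MAIN", "CHILD")],
)
conn.executemany(
    "INSERT INTO transaction_anonymized VALUES (?, ?, ?, ?)",
    [
        ("h1", "issue", "BOOK", "2019-11-21 10:00:00"),
        ("h1", "return", "BOOK", "2019-11-22 09:00:00"),
        ("h2", "issue", "DVD", "2019-11-22 11:00:00"),
    ],
)

# Checkouts per patron category, joined through hash_id only
rows = conn.execute("""
    SELECT b.categorycode, COUNT(*)
    FROM transaction_anonymized AS t
    JOIN borrowers_anonymized AS b ON b.hash_id = t.hash_id
    WHERE t.transaction_type = 'issue'
    GROUP BY b.categorycode
    ORDER BY b.categorycode
""").fetchall()
```

Because the patron row is stored once and joined at query time, a later change to the patron's category or branch is reflected in all reports, which is the trade-off against the single-table option.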
> 
> What do you think?
> I am going to start the implementation very soon in order to plan an
> integration early in the 20.05 dev cycle.
> 
> Regards,
> Jonathan
> 
> [1] https://en.wikipedia.org/wiki/Pseudonymization
> _______________________________________________
> Koha mailing list  http://koha-community.org
> Koha at lists.katipo.co.nz
> https://lists.katipo.co.nz/mailman/listinfo/koha


