GDPR - Statistics and anonymization - Koha

21 Nov 2019

Hello everybody,

I have been contracted by KohaLa to work on some GDPR requirements.
The main idea is to "anonymize" patron data while still letting the
library access transaction statistics.

Below I present what I am planning to implement, in order to collect
ideas and feedback.

These are the steps I have in mind:
1. Pseudonymization [1] of patron data
2. Improve deletion of patron-related data (tables statistics,
old_reserves, deletedborrowers)
3. Add the ability to remove data that has been pseudonymized

I see 2 ways to achieve point 1:
* We create 2 tables, 1 for the patrons, 1 for the transactions.
- borrowers_anonymized will contain: hash_id, has_cardnumber,
branchcode, creation_date, categorycode, bsort1, bsort2,
[borrower_attributes]
- transaction_anonymized will contain: hash_id, transaction_type,
branchcode, itemnumber, holdingbranch, location, itemcallnumber,
itemtype, timestamp

hash_id will be generated from the borrowernumber and a key (the key
will be stored on the server, at a path set in koha-conf)
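
As an illustration only, here is one possible way to derive hash_id,
sketched in Python. The post does not say which hash function would be
used, so HMAC-SHA256 is an assumption, as are the function name
make_hash_id and the sample key.

```python
import hashlib
import hmac


def make_hash_id(borrowernumber: int, key: bytes) -> str:
    """Return a keyed, deterministic pseudonym for a borrowernumber."""
    message = str(borrowernumber).encode("utf-8")
    # Assumption: HMAC-SHA256; the proposal only says "borrowernumber and a key".
    return hmac.new(key, message, hashlib.sha256).hexdigest()


# Hypothetical key; in the proposal it would be read from a file whose
# path is set in koha-conf.
key = b"example-secret-key"
hash_a = make_hash_id(42, key)
hash_b = make_hash_id(42, key)
hash_c = make_hash_id(43, key)
```

The same borrowernumber and key always give the same hash_id, so later
transactions from the same patron can still be grouped together, while
the borrowernumber cannot be recomputed from the hash without the key.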

Pros: Easier to understand and manipulate, as it follows the existing
structure. We track patron modifications (this is the most important
part).
Cons (technical): new config; a new path has to be created (minor)

* We create only 1 table (NoSQL-like). It will contain the same data
as previously, without the hash_id

Pros: No new config. Data are never updated, so we keep the values as
they were when the transaction was processed.
Cons: Data are not updated :)

About borrower_attributes, the initial specification asks for 2
attributes defined in a syspref. I think it should be configurable,
with a join table (Pro: more flexible, Con: SQL queries are more
complex)
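
To illustrate the join-table idea, here is a minimal sketch using
Python's sqlite3 module. The table name borrower_attributes_anonymized,
its columns, and the sample values are all hypothetical, and SQLite
stands in for Koha's real database only so the example is
self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE borrowers_anonymized (
    hash_id TEXT PRIMARY KEY,
    branchcode TEXT,
    categorycode TEXT
);
-- Hypothetical join table: one row per attribute kept for a patron.
CREATE TABLE borrower_attributes_anonymized (
    hash_id TEXT REFERENCES borrowers_anonymized(hash_id),
    code TEXT,
    attribute TEXT
);
""")
conn.execute("INSERT INTO borrowers_anonymized VALUES ('abc', 'CPL', 'ST')")
conn.executemany(
    "INSERT INTO borrower_attributes_anonymized VALUES (?, ?, ?)",
    [("abc", "GRADE", "5"), ("abc", "SCHOOL", "North")],
)

# The extra JOIN is the added query complexity mentioned above.
rows = conn.execute("""
    SELECT b.categorycode, a.code, a.attribute
    FROM borrowers_anonymized b
    JOIN borrower_attributes_anonymized a USING (hash_id)
    ORDER BY a.code
""").fetchall()
```

With this layout any number of attributes can be kept per patron
without changing the schema, at the cost of one more join per query.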

I think we should have the 2 tables and keep a link between the
borrowers_anonymized and transaction_anonymized tables.
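
A rough sketch of what the link buys us, again with Python's sqlite3
as a stand-in database. Only a subset of the proposed columns is
shown, and the sample rows are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE borrowers_anonymized (
    hash_id TEXT PRIMARY KEY,
    branchcode TEXT,
    categorycode TEXT
);
CREATE TABLE transaction_anonymized (
    hash_id TEXT REFERENCES borrowers_anonymized(hash_id),
    transaction_type TEXT,
    itemtype TEXT
);
""")
conn.executemany(
    "INSERT INTO borrowers_anonymized VALUES (?, ?, ?)",
    [("h1", "CPL", "ST"), ("h2", "CPL", "PT")],
)
conn.executemany(
    "INSERT INTO transaction_anonymized VALUES (?, ?, ?)",
    [("h1", "issue", "BK"), ("h1", "renew", "BK"), ("h2", "issue", "DVD")],
)

# Transactions per patron category, computed without any identifying data.
counts = conn.execute("""
    SELECT b.categorycode, COUNT(*)
    FROM transaction_anonymized t
    JOIN borrowers_anonymized b ON b.hash_id = t.hash_id
    GROUP BY b.categorycode
    ORDER BY b.categorycode
""").fetchall()
```

Because the patron row is separate, updating a patron's categorycode
changes how all of their past transactions are counted, which is the
"we track patron modifications" behaviour of the two-table option.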

What do you think?
I am going to start the implementation very soon so that it can be
integrated early in the 20.05 dev cycle.

Regards,
Jonathan

[1] https://en.wikipedia.org/wiki/Pseudonymization

Jonathan Druart
