[Koha] GDPR - Statistics and anonymization

Jonathan Druart jonathan.druart at bugs.koha-community.org
Wed Dec 4 09:16:39 NZDT 2019


Hello,

A bit of fresh news: I have submitted a bunch of patches that are ready
to be tested.
The main bug report is bug 24151 (Add a pseudonymization process for
patrons and transactions)
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=24151
I pushed a remote branch with everything applied in the correct order,
on my gitlab repo:
https://gitlab.com/joubu/Koha/commits/bug_24151

Cheers,
Jonathan

On Thu, Nov 21, 2019 at 17:13, Jonathan Druart
<jonathan.druart at bugs.koha-community.org> wrote:
>
> Hello everybody,
>
> I have been contracted by KohaLa to work on some GDPR requirements.
> The main idea is to "anonymize" patrons' data while still letting the
> library access the transaction statistics.
>
> I am going to present what I am planning to implement, in order to
> collect ideas and feedback.
>
> I have the following steps in mind:
> 1. Pseudonymization [1] of patron's data
> 2. Improve deletion of patron-related data (tables statistics,
> old_reserves, deletedborrowers)
> 3. Add the ability to remove data that have been pseudonymized
>
> I see 2 ways to achieve point 1:
> * We create 2 tables, 1 for the patrons, 1 for the transactions.
> - borrowers_anonymized will contain: hash_id, has_cardnumber,
> branchcode, creation_date, categorycode, bsort1, bsort2,
> [borrower_attributes]
> - transaction_anonymized will contain: hash_id, transaction_type,
> branchcode, itemnumber, holdingbranch, location, itemcallnumber,
> itemtype, timestamp
>
> hash_id will be generated using the borrowernumber and a key (that
> will be stored on the server, path in koha-conf)
>
> Pros: Easier to understand and manipulate as it follows existing structure.
> We track patron's modifications (this is the most important part)
> Cons: tech part: new config, a new path has to be created (minor)
>
> * We create only 1 table (NoSQL-like). It will contain the same data
> as previously, without the hash_id
>
> Pros: No new config. Data are never updated and we keep the values
> as they were when the transactions were processed.
> Cons: Data are not updated :)
>
> About borrower_attributes, the initial specification asks for 2
> attributes defined in a syspref. I think it should be configurable,
> with a join table (Pro: more flexible, Con: more complex SQL queries)
>
> I think we should have the 2 tables and keep a link between the
> anonymized_patrons and anonymized_transactions tables.
>
> What do you think?
> I am going to start the implementation very soon in order to plan an
> integration early in the 20.05 dev cycle.
>
> Regards,
> Jonathan
>
> [1] https://en.wikipedia.org/wiki/Pseudonymization
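
The quoted message says hash_id will be generated from the borrowernumber
and a key stored on the server, but it does not name an algorithm. As a
minimal sketch only, assuming a keyed HMAC-SHA256 (an assumption, not
necessarily what the patches on bug 24151 use), the idea could look like
this. The function name make_hash_id and the example key are hypothetical.

```python
import hashlib
import hmac


def make_hash_id(borrowernumber: int, key: bytes) -> str:
    """Derive a stable pseudonym from a borrowernumber and a secret key.

    HMAC-SHA256 is an assumption made for this sketch; the original
    message only says "the borrowernumber and a key".
    """
    message = str(borrowernumber).encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()


# Hypothetical key. In the proposal the key lives in a file on the
# server whose path is configured in koha-conf.
example_key = b"not-a-real-key"

hash_a = make_hash_id(42, example_key)
hash_b = make_hash_id(42, example_key)  # same patron, same key
hash_c = make_hash_id(43, example_key)  # different patron
```

The same patron always maps to the same hash_id, so transactions can
still be grouped per patron, while someone without the key cannot
recompute the mapping from a borrowernumber.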

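The two-table option from the quoted message can also be sketched with an
in-memory SQLite database. This is an illustration under assumptions: only
a subset of the listed columns is used, the sample values ('CPL', 'ST',
'PT', the hash_id strings) are made up, and Koha itself runs on
MySQL/MariaDB rather than SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced versions of the two tables named in the proposal.
conn.executescript("""
CREATE TABLE borrowers_anonymized (
    hash_id      TEXT PRIMARY KEY,
    branchcode   TEXT,
    categorycode TEXT
);
CREATE TABLE transaction_anonymized (
    hash_id          TEXT REFERENCES borrowers_anonymized(hash_id),
    transaction_type TEXT,
    branchcode       TEXT,
    itemtype         TEXT,
    timestamp        TEXT
);
""")
conn.executemany(
    "INSERT INTO borrowers_anonymized VALUES (?, ?, ?)",
    [("abc", "CPL", "ST"), ("def", "CPL", "PT")],
)
conn.executemany(
    "INSERT INTO transaction_anonymized VALUES (?, ?, ?, ?, ?)",
    [
        ("abc", "issue", "CPL", "BK", "2019-11-21 10:00:00"),
        ("abc", "return", "CPL", "BK", "2019-11-28 10:00:00"),
        ("def", "issue", "CPL", "BK", "2019-11-22 09:30:00"),
    ],
)


def issues_by_category(connection):
    """Count 'issue' transactions per patron category via the hash_id link."""
    return connection.execute("""
        SELECT b.categorycode, COUNT(*)
        FROM transaction_anonymized t
        JOIN borrowers_anonymized b ON b.hash_id = t.hash_id
        WHERE t.transaction_type = 'issue'
        GROUP BY b.categorycode
        ORDER BY b.categorycode
    """).fetchall()


stats_before = issues_by_category(conn)

# A later change to the pseudonymized patron row is reflected in the
# statistics for that patron's past transactions.
conn.execute(
    "UPDATE borrowers_anonymized SET categorycode = 'PT' WHERE hash_id = 'abc'"
)
stats_after = issues_by_category(conn)
```

One reading of the trade-off in the quoted message: with two linked tables
a patron update changes how earlier transactions are reported, whereas the
single-table option would keep each transaction with the patron values
recorded at the time.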
