July 24, Tuesday
12:00 – 13:00
Limiting Disclosure of Sensitive Data in Sequential Releases of Databases
Computer Science seminar
Lecturer: Erez Shmueli
Affiliation: Deutsche Telekom Laboratories, BGU
Location: 202/37
Host: Dr. Aryeh Kontorovich
Privacy Preserving Data Publishing (PPDP) is a research field that
develops methods for publishing data while minimizing distortion, so
that the data remain useful on the one hand and privacy is respected
on the other.
Sequential release is a data-publishing scenario in which multiple
releases of the same underlying table are published over a period of time.
In this setting, a privacy violation may arise from any single release,
or from joining information from different releases.
As in [Wang and Fung 2006], our privacy definitions limit the ability
of an adversary who combines information from all releases to link
values of the quasi-identifiers to sensitive values.
We extend the framework considered in [Wang and Fung 2006] in three
ways: we allow a larger number of releases; we consider the more
flexible local recoding model of "cell generalization" (as opposed to
the global recoding model of "cut generalization" in [Wang and Fung
2006]); and we include the case where records may be added to the
underlying table from time to time.
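To illustrate the difference between the two recoding models, here is a minimal Python sketch. The taxonomy (Engineer and Lawyer under Professional, Dancer and Writer under Artist) and the function names cut_generalize and cell_generalize are hypothetical, chosen only for illustration and not taken from the talk. Under cut generalization, every occurrence of a value is mapped to the same node of a chosen cut through the taxonomy; under cell generalization, each cell may be raised to a different level independently.

```python
# Hypothetical taxonomy for a "Job" attribute: child -> parent ("ANY" is the root).
PARENT = {
    "Engineer": "Professional",
    "Lawyer": "Professional",
    "Dancer": "Artist",
    "Writer": "Artist",
    "Professional": "ANY",
    "Artist": "ANY",
}


def ancestors(value):
    """Return [value, parent, grandparent, ..., root]."""
    chain = [value]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def cut_generalize(column, cut):
    """Global recoding (cut): every value is replaced by its ancestor in `cut`.

    Assumes `cut` is a valid cut, i.e. exactly one of its nodes lies on
    each root-to-leaf path, so identical values always map identically.
    """
    return [next(a for a in ancestors(v) if a in cut) for v in column]


def cell_generalize(column, levels):
    """Local recoding (cell): cell i is raised levels[i] steps, independently."""
    out = []
    for v, k in zip(column, levels):
        chain = ancestors(v)
        out.append(chain[min(k, len(chain) - 1)])  # cap at the root
    return out


jobs = ["Engineer", "Engineer", "Lawyer", "Dancer"]
print(cut_generalize(jobs, {"Professional", "Dancer", "Writer"}))
# ['Professional', 'Professional', 'Professional', 'Dancer']
print(cell_generalize(jobs, [0, 1, 1, 0]))
# ['Engineer', 'Professional', 'Professional', 'Dancer']
```

In the cell version the two Engineer records end up at different levels of generality, which no cut can express; this per-cell freedom is what lets local recoding distort less data for the same privacy requirement.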
Extending the framework also requires modifying the manner in which
privacy is evaluated.
We show that the Match Join between the releases, on which [Wang and
Fung 2006] based their privacy evaluation, is no longer suitable for
the extended framework considered here.
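The following sketch shows one way to read the Match Join, assuming (our interpretation, not stated in the abstract) that two generalized values match when one equals or generalizes the other in the taxonomy, and that two records join when all of their common attributes match. The taxonomy, the tables t1 and t2, and the functions compatible and match_join are hypothetical. The Full and Kernel Match Joins are not sketched, since the abstract does not define them.

```python
# Hypothetical taxonomy for a "Job" attribute: child -> parent ("ANY" is the root).
PARENT = {
    "Engineer": "Professional",
    "Lawyer": "Professional",
    "Dancer": "Artist",
    "Writer": "Artist",
    "Professional": "ANY",
    "Artist": "ANY",
}


def ancestors(value):
    """Return [value, parent, grandparent, ..., root]."""
    chain = [value]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def compatible(a, b):
    """Assumed match predicate: a and b lie on the same root-to-leaf path."""
    return a in ancestors(b) or b in ancestors(a)


def match_join(t1, t2, join_attrs):
    """Pair every record of t1 with every record of t2 that matches on join_attrs."""
    return [
        (x, y)
        for x in t1
        for y in t2
        if all(compatible(x[a], y[a]) for a in join_attrs)
    ]


# Two hypothetical releases sharing the (generalized) join attribute "Job".
t1 = [{"Job": "Professional", "Age": "30-40"}, {"Job": "Dancer", "Age": "20-30"}]
t2 = [{"Job": "Engineer", "Disease": "Flu"}, {"Job": "Artist", "Disease": "HIV"}]

pairs = match_join(t1, t2, ["Job"])
print([(x["Job"], y["Job"]) for x, y in pairs])
# [('Professional', 'Engineer'), ('Dancer', 'Artist')]
```

An adversary evaluating privacy over such a join reasons about which quasi-identifier values co-occur with which sensitive values in the joined pairs; the abstract's point is that, once cells are generalized independently and records may be added, this join no longer reflects that reasoning correctly.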
We define more restrictive types of join between the published
releases (the Full Match Join and the Kernel Match Join) that are better
suited to privacy evaluation in this context. We then present a
top-down algorithm, based on these modified privacy evaluations, for
anonymizing sequential releases in the cell generalization model.
Our theoretical study is followed by experiments that demonstrate
a substantial improvement in utility due to the adoption of the cell
generalization model, and illustrate how using the Full or Kernel Match
Join instead of the Match Join corrects the privacy evaluation.