TY - GEN
T1 - Private record matching using differential privacy
AU - Inan, Ali
AU - Kantarcioglu, Murat
AU - Ghinita, Gabriel
AU - Bertino, Elisa
PY - 2010
Y1 - 2010
N2 - Private matching between datasets owned by distinct parties is a challenging problem with several applications. Private matching allows two parties to identify the records that are close to each other according to some distance functions, such that no additional information other than the join result is disclosed to any party. Private matching can be solved securely and accurately using secure multi-party computation (SMC) techniques, but such an approach is prohibitively expensive in practice. Previous work proposed the release of sanitized versions of the sensitive datasets which allows blocking, i.e., filtering out sub-sets of records that cannot be part of the join result. This way, SMC is applied only to a small fraction of record pairs, reducing the matching cost to acceptable levels. The blocking step is essential for the privacy, accuracy and efficiency of matching. However, the state-of-the-art focuses on sanitization based on k-anonymity, which does not provide sufficient privacy. We propose an alternative design centered on differential privacy, a novel paradigm that provides strong privacy guarantees. The realization of the new model presents difficult challenges, such as the evaluation of distance-based matching conditions with the help of only a statistical queries interface. Specialized versions of data indexing structures (e.g., kd-trees) also need to be devised, in order to comply with differential privacy. Experiments conducted on the real-world Census-income dataset show that, although our methods provide strong privacy, their effectiveness in reducing matching cost is not far from that of k-anonymity based counterparts.
AB - Private matching between datasets owned by distinct parties is a challenging problem with several applications. Private matching allows two parties to identify the records that are close to each other according to some distance functions, such that no additional information other than the join result is disclosed to any party. Private matching can be solved securely and accurately using secure multi-party computation (SMC) techniques, but such an approach is prohibitively expensive in practice. Previous work proposed the release of sanitized versions of the sensitive datasets which allows blocking, i.e., filtering out sub-sets of records that cannot be part of the join result. This way, SMC is applied only to a small fraction of record pairs, reducing the matching cost to acceptable levels. The blocking step is essential for the privacy, accuracy and efficiency of matching. However, the state-of-the-art focuses on sanitization based on k-anonymity, which does not provide sufficient privacy. We propose an alternative design centered on differential privacy, a novel paradigm that provides strong privacy guarantees. The realization of the new model presents difficult challenges, such as the evaluation of distance-based matching conditions with the help of only a statistical queries interface. Specialized versions of data indexing structures (e.g., kd-trees) also need to be devised, in order to comply with differential privacy. Experiments conducted on the real-world Census-income dataset show that, although our methods provide strong privacy, their effectiveness in reducing matching cost is not far from that of k-anonymity based counterparts.
KW - Differential privacy
KW - Privacy
KW - Record matching
KW - Security
UR - http://www.scopus.com/inward/record.url?scp=77952279809&partnerID=8YFLogxK
U2 - 10.1145/1739041.1739059
DO - 10.1145/1739041.1739059
M3 - Conference contribution
AN - SCOPUS:77952279809
SN - 9781605589459
T3 - Advances in Database Technology - EDBT 2010 - 13th International Conference on Extending Database Technology, Proceedings
SP - 123
EP - 134
BT - Advances in Database Technology - EDBT 2010 - 13th International Conference on Extending Database Technology, Proceedings
T2 - 13th International Conference on Extending Database Technology: Advances in Database Technology - EDBT 2010
Y2 - 22 March 2010 through 26 March 2010
ER -