TY - GEN
T1 - Distributed Cardinality Estimation of Set Operations with Differential Privacy
AU - Stanojevic, Rade
AU - Nabeel, Mohamed
AU - Yu, Ting
N1 - Publisher Copyright:
© 2017 IEEE.
PY - 2017/12/4
Y1 - 2017/12/4
N2 - In this paper we study the problem of estimating the cardinality of pairwise set operations (union and intersection) over sets possessed by different data owners, while preserving differential privacy. In our problem setting, a data owner could only communicate with an untrusted server, and thus have to perturb its set data for privacy protection before sharing them with the server. This problem setting is relevant to diverse applications in practice, including sensor-based traffic monitoring, cross-domain data integration, and combining findings from multiple surveys. To tackle this problem, we first adopt existing randomized response technique to perturb the bit vector (to achieve differential privacy) and develop tools which the server can use to derive the cardinality of set operations from the randomized bit vectors. However, the variance of the union/intersection estimator grows linearly with the universe (bit-vector) size which is impractical for large universes. To keep the variance low we in addition propose to resort to Bloom filters instead of high-dimensional bit vectors to share set data with the server. The key insight is that in spite of inevitable collisions in BF by keeping its size small we can bound the variance of the union/intersection cardinality estimators. Finally, we show that investing a small part of the privacy budget into reporting (obfuscated) set cardinality can further reduce the estimator errors for up to 20%. Our empirical analysis reveals the impact of various parameters including privacy budget and Bloom filter size on the overall accuracy of the approach and demonstrates the utility of the proposed solution.
AB - In this paper we study the problem of estimating the cardinality of pairwise set operations (union and intersection) over sets possessed by different data owners, while preserving differential privacy. In our problem setting, a data owner could only communicate with an untrusted server, and thus have to perturb its set data for privacy protection before sharing them with the server. This problem setting is relevant to diverse applications in practice, including sensor-based traffic monitoring, cross-domain data integration, and combining findings from multiple surveys. To tackle this problem, we first adopt existing randomized response technique to perturb the bit vector (to achieve differential privacy) and develop tools which the server can use to derive the cardinality of set operations from the randomized bit vectors. However, the variance of the union/intersection estimator grows linearly with the universe (bit-vector) size which is impractical for large universes. To keep the variance low we in addition propose to resort to Bloom filters instead of high-dimensional bit vectors to share set data with the server. The key insight is that in spite of inevitable collisions in BF by keeping its size small we can bound the variance of the union/intersection cardinality estimators. Finally, we show that investing a small part of the privacy budget into reporting (obfuscated) set cardinality can further reduce the estimator errors for up to 20%. Our empirical analysis reveals the impact of various parameters including privacy budget and Bloom filter size on the overall accuracy of the approach and demonstrates the utility of the proposed solution.
UR - http://www.scopus.com/inward/record.url?scp=85046546726&partnerID=8YFLogxK
U2 - 10.1109/PAC.2017.43
DO - 10.1109/PAC.2017.43
M3 - Conference contribution
AN - SCOPUS:85046546726
T3 - Proceedings - 2017 IEEE Symposium on Privacy-Aware Computing, PAC 2017
SP - 37
EP - 48
BT - Proceedings - 2017 IEEE Symposium on Privacy-Aware Computing, PAC 2017
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 1st IEEE Symposium on Privacy-Aware Computing, PAC 2017
Y2 - 1 August 2017 through 3 August 2017
ER -