TY - GEN
T1 - Hybrid differentially-private string matching
AU - Rao, Fang Yu
AU - Ghinita, Gabriel
AU - Bertino, Elisa
N1 - Publisher Copyright:
© 2018 IEEE.
PY - 2018/7/19
Y1 - 2018/7/19
N2 - Private record linkage enables multiple parties to exchange records of similar entities without disclosing their own data sets. Several techniques solve private matching for numerical attributes, but they cannot handle well string-typed data. We propose a protocol for differentially-private linkage of records with string-valued attributes. String-valued data are partitioned into groups, and each partition is represented by a synopsis. Strings are first embedded into a metric space which preserves the relative distance among them, and then they are translated to multi-dimensional points. We propose two partitioning algorithms: the first one uses solely Lipschitz embedding coordinates, whereas the second one also leverages string length information. To the best of our knowledge, this is the first efficient and provably secure solution to the private string matching problem. Extensive experimental results on a real dataset demonstrate that our solution is efficient and accurate. We also show that our solution can be adapted to alternative protection models, e.g., output-constrained differential privacy.
AB - Private record linkage enables multiple parties to exchange records of similar entities without disclosing their own data sets. Several techniques solve private matching for numerical attributes, but they cannot handle well string-typed data. We propose a protocol for differentially-private linkage of records with string-valued attributes. String-valued data are partitioned into groups, and each partition is represented by a synopsis. Strings are first embedded into a metric space which preserves the relative distance among them, and then they are translated to multi-dimensional points. We propose two partitioning algorithms: the first one uses solely Lipschitz embedding coordinates, whereas the second one also leverages string length information. To the best of our knowledge, this is the first efficient and provably secure solution to the private string matching problem. Extensive experimental results on a real dataset demonstrate that our solution is efficient and accurate. We also show that our solution can be adapted to alternative protection models, e.g., output-constrained differential privacy.
KW - Differential Privacy
KW - Lipschitz Embedding
KW - String Matching
UR - http://www.scopus.com/inward/record.url?scp=85050997700&partnerID=8YFLogxK
U2 - 10.1109/ICDCS.2018.00054
DO - 10.1109/ICDCS.2018.00054
M3 - Conference contribution
AN - SCOPUS:85050997700
T3 - Proceedings - International Conference on Distributed Computing Systems
SP - 478
EP - 488
BT - Proceedings - 2018 IEEE 38th International Conference on Distributed Computing Systems, ICDCS 2018
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 38th IEEE International Conference on Distributed Computing Systems, ICDCS 2018
Y2 - 2 July 2018 through 5 July 2018
ER -