Record Linkage Based on Entities' Behavior

Mohamed Yakout, Ahmed Khalifa Elmagarmid, Hazen Elmeleegy, Mourad Ouzzani

Research output: Book/Report › Commissioned report › peer-review

Abstract

Record linkage is the problem of identifying similar records across different data sources. Traditional record linkage techniques focus on using simple database attributes in a textual similarity comparison to decide on matched and non-matched records. Recently, record linkage techniques have considered useful extracted knowledge and domain information to help enhance the matching accuracy. In this paper, we present a new technique for record linkage that is based on entities' behavior, which can be extracted from transaction logs. In the matching process, we compare two entities by merging their transaction logs and measuring how much the merge improves the identification of a behavior. To do so, we use two matching phases: first, a candidate generation phase, which is fast and produces almost no false negatives, but has low precision; second, an accurate matching phase, which improves the precision of the matching at a high run-time cost. In the candidate generation phase, behavior is represented by points in the complex plane, where we perform approximate evaluations. In the accurate matching phase, we use a heuristic called compressibility, based on the observation that identified behaviors are more compressible. Our experiments show that the proposed technique can be used to enhance record linkage quality while remaining practical for large logs. We also perform an extensive sensitivity analysis of the technique's accuracy and performance.
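
The idea of scoring a candidate pair by how much merging their logs improves the identified behavior can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not define the compressibility measure, so the sketch assumes the Shannon entropy of the gaps between consecutive transactions as a stand-in (lower entropy means a more regular, more compressible log). The function names `gap_entropy` and `merge_gain` and all sample logs are hypothetical.

```python
import math
from collections import Counter


def gap_entropy(times):
    """Shannon entropy (bits) of the gaps between consecutive transaction times.

    Assumption: low gap entropy is used here as a proxy for a
    well-identified (highly compressible) behavior.
    """
    ts = sorted(times)
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    if not gaps:
        return 0.0
    counts = Counter(gaps)
    n = len(gaps)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def merge_gain(log_a, log_b):
    """Positive when the merged log is more regular than the two logs on average."""
    separate = (gap_entropy(log_a) + gap_entropy(log_b)) / 2
    merged = gap_entropy(list(log_a) + list(log_b))
    return separate - merged


# Hypothetical data: one entity transacts every 7 days, but its
# transactions are split between two sources, a and b.
weekly = list(range(0, 140, 7))
idx_a = {0, 1, 3, 6, 7, 10, 12, 13, 15, 18}
log_a = [t for i, t in enumerate(weekly) if i in idx_a]
log_b = [t for i, t in enumerate(weekly) if i not in idx_a]

# Hypothetical unrelated entity with irregular transaction days.
log_c = [3, 11, 12, 30, 41, 58, 66, 90, 97, 120]

print("gain(a, b) =", merge_gain(log_a, log_b))  # same entity
print("gain(a, c) =", merge_gain(log_a, log_c))  # different entities
```

In this toy setting, merging the two halves of the same entity's log restores the weekly pattern, so the gain is positive, while merging with the unrelated log makes the gaps less regular and the gain is negative. A real implementation would also need the fast candidate generation phase described above to avoid scoring every pair.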
Original language: English
Publication status: Published - 2008
Externally published: Yes
