On the Accuracy and Completeness of the Record Matching Process

Vassilios Verykios, Mohamed G. Elfeky, Ahmed Khalifa Elmagarmid, Munir Cochinwala, Sid Dalal

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Record matching, or linking, is one of the phases of the data quality improvement process, in which records from different sources are cleansed and integrated in a centralized data store to be used for various purposes. Both earlier and recent studies in data quality and record linkage focus on various statistical models, which make strong assumptions about the probabilities of attribute errors. In this study, we evaluate different models for record linkage that are built from data only. We use a program that generates data with known error distributions, and we train classification models, which we use to estimate the accuracy and the completeness of the record linking process. The results indicate that automated learning techniques are adequate for this process and that both their accuracy and their completeness are comparable to those of other, mostly manual, processes.

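The workflow summarized in the abstract (generate data with a known error rate, train a classifier, then measure accuracy and completeness) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the authors' method: the character-substitution error model, the position-wise `similarity` feature, the single-threshold classifier, and all function names are hypothetical. Accuracy is taken to mean the fraction of correctly classified pairs, and completeness the fraction of true matches that the classifier recovers.

```python
import random

LETTERS = "abcdefghijklmnopqrstuvwxyz"

def corrupt(value, error_rate, rng):
    """Replace each character with a random letter with probability error_rate."""
    return "".join(
        rng.choice(LETTERS) if rng.random() < error_rate else ch for ch in value
    )

def similarity(a, b):
    """Fraction of positions at which the two strings agree."""
    return sum(x == y for x, y in zip(a, b)) / max(len(a), len(b))

def make_pairs(n, error_rate, rng, length=12):
    """Generate n labelled record pairs as (similarity, is_match) tuples."""
    pairs = []
    for _ in range(n):
        original = "".join(rng.choice(LETTERS) for _ in range(length))
        if rng.random() < 0.5:
            # Matching pair: a corrupted copy of the same record.
            other, is_match = corrupt(original, error_rate, rng), True
        else:
            # Non-matching pair: an unrelated random record.
            other, is_match = "".join(rng.choice(LETTERS) for _ in range(length)), False
        pairs.append((similarity(original, other), is_match))
    return pairs

def learn_threshold(train):
    """Pick the similarity cut-off that maximises accuracy on the training pairs."""
    candidates = sorted({s for s, _ in train})
    return max(candidates, key=lambda t: sum((s >= t) == m for s, m in train))

def evaluate(pairs, threshold):
    """Return (accuracy, completeness) of the threshold classifier on pairs."""
    preds = [s >= threshold for s, _ in pairs]
    labels = [m for _, m in pairs]
    correct = sum(p == l for p, l in zip(preds, labels))
    found = sum(p and l for p, l in zip(preds, labels))
    total_matches = sum(labels)
    accuracy = correct / len(pairs)
    completeness = found / total_matches if total_matches else 0.0
    return accuracy, completeness

if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    train = make_pairs(500, 0.2, rng)
    test = make_pairs(500, 0.2, rng)
    threshold = learn_threshold(train)
    print(threshold, evaluate(test, threshold))
```

Because the error rate is a parameter of the generator, the same sketch can be rerun at different rates to see how the two measures degrade as the data gets noisier.
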
Original language: English
Title of host publication: Proceedings of the 2000 Conference on Information Quality
Number of pages: 16
Publication status: Published - 2000
Externally published: Yes
