Automating the approximate record-matching process

Vassilios S. Verykios, Ahmed K. Elmagarmid, Elias N. Houstis

Research output: Contribution to journal › Article › peer-review

66 Citations (Scopus)

Abstract

Data quality has many dimensions, one of which is accuracy. Accuracy is usually compromised by errors accidentally or intentionally introduced into a database system. These errors result in inconsistent, incomplete, or erroneous data elements. For example, a small variation in the representation of a data object produces a unique instantiation of the object being represented. To improve the accuracy of the data stored in a database system, we need to compare them either with their real-world counterparts or with other data stored in the same or a different system. In this paper, we address the problem of matching records that refer to the same entity by computing their similarity. Exact record matching has limited applicability in this context, since even simple errors like character transpositions cannot be captured in the record-linking process. Our methodology deploys advanced data-mining techniques for dealing with the high computational and inferential complexity of approximate record matching.
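
As an illustration of the point about character transpositions, the sketch below contrasts exact comparison with a simple similarity score. It is not the methodology of the paper, which the abstract describes only as data-mining based. It assumes an optimal string alignment (restricted Damerau-Levenshtein) distance, a per-field average, and a threshold of 0.8. The function names, the sample records, and the threshold are all hypothetical.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance: counts insertions, deletions,
    substitutions, and transpositions of two adjacent characters."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Adjacent transposition, e.g. "nh" versus "hn".
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def similarity(a: str, b: str) -> float:
    """Map the distance to a score in [0, 1]; 1.0 means identical."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - osa_distance(a, b) / longest


def records_match(r1, r2, threshold=0.8):
    """Hypothetical decision rule: average field similarity >= threshold.
    Assumes both records are non-empty and have the same fields in order."""
    scores = [similarity(x.lower(), y.lower()) for x, y in zip(r1, r2)]
    return sum(scores) / len(scores) >= threshold


# Hypothetical records that differ by one transposition in the first field.
clean = ("John", "Smith")
typo = ("Jonh", "Smith")

exact = clean == typo                  # exact matching misses the pair
approx = records_match(clean, typo)    # approximate matching accepts it
```

With these sample records the exact comparison fails while the similarity-based rule accepts the pair. A fixed threshold is the simplest possible decision rule and is used here only to keep the example short.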

Original language: English
Pages (from-to): 83-98
Number of pages: 16
Journal: Information Sciences
Volume: 126
Issue number: 1
Publication status: Published - Jul 2000
Externally published: Yes
