TY - JOUR
T1 - The data analytics group at the Qatar Computing Research Institute
AU - Beskales, George
AU - Ilyas, Ihab F.
AU - Papotti, Paolo
AU - Das, Gautam
AU - Naumann, Felix
AU - Quiane-Ruiz, Jorge
AU - Elmagarmid, Ahmed K.
AU - Ouzzani, Mourad
AU - Tang, Nan
PY - 2013/1
Y1 - 2013/1
AB - The Qatar Computing Research Institute (QCRI), a member of Qatar Foundation for Education, Science and Community Development, started its activities in early 2011. QCRI focuses on large-scale computing challenges that address national priorities for growth and development and that have global impact in computing research. The data analytics group at QCRI (DA@QCRI) has built expertise in three core data management challenges: extracting data from its natural digital habitat, integrating a large and evolving number of sources, and robust cleaning to assure data quality and validation. Cleaning data requires collecting and maintaining a massive amount of metadata, such as data violations, lineage of data changes, and possible data repairs. In addition, users need to better understand the current health of the data and the data cleaning process, through summarization or samples of data errors, before they can effectively guide any data cleaning process. Providing a scalable data cleaning solution requires efficient methods to generate, maintain, and access such metadata.
UR - http://www.scopus.com/inward/record.url?scp=84872962050&partnerID=8YFLogxK
U2 - 10.1145/2430456.2430466
DO - 10.1145/2430456.2430466
M3 - Article
AN - SCOPUS:84872962050
SN - 0163-5808
VL - 41
SP - 33
EP - 38
JO - SIGMOD Record
JF - SIGMOD Record
IS - 4
ER -