TY - JOUR
T1 - Data civilizer 2.0
T2 - 45th International Conference on Very Large Data Bases, VLDB 2019
AU - Rezig, El Kindi
AU - Cao, Lei
AU - Stonebraker, Michael
AU - Simonini, Giovanni
AU - Tao, Wenbo
AU - Madden, Samuel
AU - Ouzzani, Mourad
AU - Tang, Nan
AU - Elmagarmid, Ahmed K.
N1 - Publisher Copyright: © 2019 VLDB Endowment.
PY - 2019
Y1 - 2019
AB - Data scientists spend over 80% of their time (1) parameter-tuning machine learning models and (2) iterating between data cleaning and machine learning model execution. While there are existing efforts to support the first requirement, there is currently no integrated workflow system that couples data cleaning and machine learning development. The previous version of Data Civilizer was geared towards data cleaning and discovery using a set of pre-defined tools. In this paper, we introduce Data Civilizer 2.0, an end-to-end workflow system satisfying both requirements. In addition, this system also supports a sophisticated data debugger and a workflow visualization system. In this demo, we will show how we used Data Civilizer 2.0 to help scientists at the Massachusetts General Hospital build their cleaning and machine learning pipeline on their 30TB brain activity dataset.
UR - http://www.scopus.com/inward/record.url?scp=85074539229&partnerID=8YFLogxK
U2 - 10.14778/3352063.3352108
DO - 10.14778/3352063.3352108
M3 - Conference article
AN - SCOPUS:85074539229
SN - 2150-8097
VL - 12
SP - 1954
EP - 1957
JO - Proceedings of the VLDB Endowment
JF - Proceedings of the VLDB Endowment
IS - 12
Y2 - 26 August 2019 through 30 August 2019
ER -