TY - JOUR
T1 - Debugging Large-Scale Data Science Pipelines using Dagger
AU - Rezig, El Kindi
AU - Brahmaroutu, Ashrita
AU - Tatbul, Nesime
AU - Ouzzani, Mourad
AU - Tang, Nan
AU - Mattson, Timothy
AU - Madden, Samuel
AU - Stonebraker, Michael
N1 - Publisher Copyright: © VLDB Endowment. All rights reserved.
PY - 2020
Y1 - 2020
AB - Data pipelines are the new code. Consequently, data scientists need new tools to support the often time-consuming process of debugging their pipelines. We introduce Dagger, an end-to-end system to debug and mitigate data-centric errors in data pipelines, such as a data transformation gone wrong or a classifier underperforming due to noisy training data. Dagger supports inter-module debugging, where the pipeline blocks are treated as black boxes, as well as intra-module debugging, where users can debug data objects in Python scripts (e.g., DataFrames). In this demo, we will walk the audience through a rich, real-world business intelligence use case from our industrial collaborators at Intel, to highlight how Dagger enables data scientists to productively identify and mitigate data-centric problems at different stages of pipeline development.
UR - http://www.scopus.com/inward/record.url?scp=85108210562&partnerID=8YFLogxK
U2 - 10.14778/3415478.3415527
DO - 10.14778/3415478.3415527
M3 - Article
AN - SCOPUS:85108210562
SN - 2150-8097
VL - 13
SP - 2993
EP - 2996
JO - Proceedings of the VLDB Endowment
JF - Proceedings of the VLDB Endowment
IS - 12
ER -