Dagger: A Data (not code) Debugger

El Kindi Rezig, Lei Cao, Giovanni Simonini, Maxime Schoemans, Samuel Madden, Mourad Ouzzani, Nan Tang, Michael Stonebraker

Research output: Contribution to conference › Paper › peer-review

15 Citations (Scopus)

Abstract

With the democratization of data science libraries and frameworks, most data scientists manage and generate their data analytics pipelines using a collection of scripts (e.g., Python, R). This marks a shift from traditional applications that communicate back and forth with a DBMS that stores and manages the application data. While code debuggers have reached impressive maturity over the past decades, they fall short in assisting users to explore data-driven what-if scenarios (e.g., split the training set into two and build two ML models). Those scenarios, while doable programmatically, are a substantial burden for users to manage themselves. Dagger (Data Debugger) is an end-to-end data debugger that abstracts key data-centric primitives to enable users to quickly identify and mitigate data-related problems in a given pipeline. Dagger was motivated by a series of interviews we conducted with data scientists across several organizations. A preliminary version of Dagger has been incorporated into Data Civilizer 2.0 to help physicians at the Massachusetts General Hospital process complex pipelines.

Original language: English
Publication status: Published - 2020
Event: 10th Annual Conference on Innovative Data Systems Research, CIDR 2020 - Amsterdam, Netherlands
Duration: 12 Jan 2020 → 15 Jan 2020

Conference

Conference: 10th Annual Conference on Innovative Data Systems Research, CIDR 2020
Country/Territory: Netherlands
City: Amsterdam
Period: 12/01/20 → 15/01/20
