Data readiness report

Shazia Afzal, C. Rajmohan, Manish Kesarwani, Sameep Mehta, Hima Patel

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

16 Citations (Scopus)

Abstract

Data exploration and quality analysis is an important yet tedious process in the AI pipeline. Current data cleaning and data readiness assessment practices for machine learning tasks are mostly conducted in an arbitrary manner, which limits their reuse and often results in loss of productivity. We introduce the concept of a Data Readiness Report as accompanying documentation to a dataset that allows data consumers to get detailed insights into the quality of data. Data characteristics and challenges on various quality dimensions are identified and documented, keeping in mind the principles of transparency and explainability. The Data Readiness Report also serves as a record of all data assessment operations, including applied transformations. This provides a detailed lineage for data governance and management. In effect, the report captures and documents the actions taken by various personas in a data readiness and assessment workflow. Over time this becomes a repository of best practices and can potentially drive a recommendation system for building automated data readiness workflows along the lines of AutoML [1]. The Data Readiness Report could serve as a valuable asset for organizing and operationalizing data in a Data-as-a-service model, as it augments the trust and reliability of the datasets. We anticipate that together with the Datasheets [2], Dataset Nutrition Label [3], FactSheets [4] and Model Cards [5], the Data Readiness Report completes the AI documentation pipeline and increases trust and reusability of data.

Original language: English
Title of host publication: Proceedings - 2021 IEEE International Conference on Smart Data Services, SMDS 2021
Editors: Nimanthi Atukorala, Carl K. Chang, Ernesto Damiani, Min Fu Lizhi, George Spanoudakis, Mudhakar Srivatsa, Zhongjie Wang, Jia Zhang
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 42-51
Number of pages: 10
ISBN (Electronic): 9781665400589
DOIs
Publication status: Published - 2021
Externally published: Yes
Event: 2021 IEEE International Conference on Smart Data Services, SMDS 2021 - Virtual, Online, United States
Duration: 5 Sept 2021 to 11 Sept 2021

Publication series

Name: Proceedings - 2021 IEEE International Conference on Smart Data Services, SMDS 2021

Conference

Conference: 2021 IEEE International Conference on Smart Data Services, SMDS 2021
Country/Territory: United States
City: Virtual, Online
Period: 5/09/21 to 11/09/21

Keywords

  • Data assurance and trust
  • Data documentation
  • Data quality
  • Governance
  • Machine learning datasets
