TY - GEN
T1 - Proactive annotation management in relational databases
AU - Ibrahim, Karim
AU - Du, Xiao
AU - Eltabakh, Mohamed Y.
N1 - Publisher Copyright:
Copyright © 2015 ACM.
PY - 2015/5/27
Y1 - 2015/5/27
N2 - Annotation management and data curation has been extensively studied in the context of relational databases. However, existing annotation management techniques share a common limitation, which is that they are all passive engines, i.e., they only manage the annotations obtained from external sources such as DB admins, domain experts, and curation tools. They neither learn from the available annotations nor exploit the annotations-to-data correlations to further enhance the quality of the annotated database. Delegating such crucial and complex tasks to end-users-especially under largescale databases and annotation sets-is clearly the wrong choice. In this paper, we propose the Nebula system, an advanced and proactive annotation management engine in relational databases. Nebula complements the state-of-art techniques in annotation management by learning from the available annotations, analyzing their content and semantics, and understanding their correlations with the data. And then, Nebula proactively discovers and recommends potentially missing annotation-to-data attachments. We propose context-aware ranking and prioritization of the discovered attachments that take into account the relationships among the data tuples and their annotations. We also propose approximation techniques and expert-enabled verification mechanisms that adaptively maintain high-accuracy predictions while minimizing the experts' involvement. Nebula is realized on top of an existing annotation management engine, and experimentally evaluated to illustrate the effectiveness of the proposed techniques, and to demonstrate the potential gain in enhancing the quality of annotated databases.
AB - Annotation management and data curation has been extensively studied in the context of relational databases. However, existing annotation management techniques share a common limitation, which is that they are all passive engines, i.e., they only manage the annotations obtained from external sources such as DB admins, domain experts, and curation tools. They neither learn from the available annotations nor exploit the annotations-to-data correlations to further enhance the quality of the annotated database. Delegating such crucial and complex tasks to end-users-especially under largescale databases and annotation sets-is clearly the wrong choice. In this paper, we propose the Nebula system, an advanced and proactive annotation management engine in relational databases. Nebula complements the state-of-art techniques in annotation management by learning from the available annotations, analyzing their content and semantics, and understanding their correlations with the data. And then, Nebula proactively discovers and recommends potentially missing annotation-to-data attachments. We propose context-aware ranking and prioritization of the discovered attachments that take into account the relationships among the data tuples and their annotations. We also propose approximation techniques and expert-enabled verification mechanisms that adaptively maintain high-accuracy predictions while minimizing the experts' involvement. Nebula is realized on top of an existing annotation management engine, and experimentally evaluated to illustrate the effectiveness of the proposed techniques, and to demonstrate the potential gain in enhancing the quality of annotated databases.
KW - Annotated database
KW - Keyword search
KW - Proactive annotation management
UR - http://www.scopus.com/inward/record.url?scp=84957567649&partnerID=8YFLogxK
U2 - 10.1145/2723372.2749435
DO - 10.1145/2723372.2749435
M3 - Conference contribution
AN - SCOPUS:84957567649
T3 - Proceedings of the ACM SIGMOD International Conference on Management of Data
SP - 2017
EP - 2030
BT - SIGMOD 2015 - Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data
PB - Association for Computing Machinery
T2 - ACM SIGMOD International Conference on Management of Data, SIGMOD 2015
Y2 - 31 May 2015 through 4 June 2015
ER -