TY - JOUR
T1 - DiTaxa
T2 - Nucleotide-pair encoding of 16S rRNA for host phenotype and biomarker detection
AU - Asgari, Ehsaneddin
AU - Münch, Philipp C.
AU - Lesker, Till R.
AU - McHardy, Alice C.
AU - Mofrad, Mohammad R.K.
N1 - Publisher Copyright:
© 2019 The Author(s) 2018. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
PY - 2019/7/15
Y1 - 2019/7/15
AB - Summary: Identifying distinctive taxa for microbiome-related diseases is considered key to establishing diagnosis and therapy options in precision medicine, and it places high demands on the accuracy of microbiome analysis techniques. We propose an alignment- and reference-free, subsequence-based 16S rRNA data analysis as a new paradigm for microbiome phenotype and biomarker detection. Our method, called DiTaxa, replaces standard operational taxonomic unit (OTU) clustering with a segmentation of 16S rRNA reads into the most frequent variable-length subsequences. We compared the performance of DiTaxa with state-of-the-art methods in phenotype and biomarker detection, using human-associated 16S rRNA samples for periodontal disease, rheumatoid arthritis and inflammatory bowel diseases, as well as a synthetic benchmark dataset. DiTaxa performed competitively with the k-mer-based state-of-the-art approach in phenotype prediction, while outperforming the OTU-based state-of-the-art approach in finding biomarkers in both resolution and coverage, as evaluated against known links from the literature and synthetic benchmark datasets.
UR - http://www.scopus.com/inward/record.url?scp=85068917303&partnerID=8YFLogxK
U2 - 10.1093/bioinformatics/bty954
DO - 10.1093/bioinformatics/bty954
M3 - Article
C2 - 30500871
AN - SCOPUS:85068917303
SN - 1367-4803
VL - 35
SP - 2498
EP - 2500
JO - Bioinformatics
JF - Bioinformatics
IS - 14
M1 - bty954
ER -