DiTaxa: Nucleotide-pair encoding of 16S rRNA for host phenotype and biomarker detection

Ehsaneddin Asgari, Philipp C. Münch, Till R. Lesker, Alice C. McHardy, Mohammad R.K. Mofrad*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review


Abstract

Summary: Identifying distinctive taxa for microbiome-related diseases is considered key to establishing diagnosis and therapy options in precision medicine, and it places high demands on the accuracy of microbiome analysis techniques. We propose an alignment- and reference-free, subsequence-based 16S rRNA data analysis as a new paradigm for microbiome phenotype and biomarker detection. Our method, called DiTaxa, replaces standard operational taxonomic unit (OTU) clustering with a segmentation of 16S rRNA reads into the most frequent variable-length subsequences. We compared the performance of DiTaxa with state-of-the-art methods in phenotype and biomarker detection, using human-associated 16S rRNA samples for periodontal disease, rheumatoid arthritis and inflammatory bowel diseases, as well as a synthetic benchmark dataset. DiTaxa performed competitively with the state-of-the-art k-mer-based approach in phenotype prediction, while outperforming the state-of-the-art OTU-based approach in biomarker detection in both resolution and coverage, as evaluated against known links from the literature and on synthetic benchmark datasets.
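The title's "nucleotide-pair encoding" suggests an adaptation of byte-pair encoding (BPE) to nucleotide strings: start from single nucleotides and repeatedly merge the most frequent adjacent pair, so that frequent variable-length subsequences emerge as vocabulary units. The sketch below illustrates that general idea under this assumption only; it is not DiTaxa's implementation. The function names (`learn_merges`, `segment`, `sample_profile`), the toy reads and the merge count are hypothetical, and the per-sample relative-frequency profile is just one plausible way to turn segments into features for phenotype prediction.

```python
from collections import Counter
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # two adjacent symbols, e.g. ("A", "C")


def pair_counts(corpus: List[List[str]]) -> Counter:
    """Count adjacent symbol pairs across all tokenized reads."""
    counts: Counter = Counter()
    for tokens in corpus:
        for a, b in zip(tokens, tokens[1:]):
            counts[(a, b)] += 1
    return counts


def merge_pair(tokens: List[str], pair: Pair) -> List[str]:
    """Scan left to right, replacing non-overlapping occurrences of `pair` with one merged symbol."""
    merged: List[str] = []
    i = 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


def learn_merges(reads: List[str], num_merges: int) -> List[Pair]:
    """Greedily learn up to `num_merges` merges of the most frequent adjacent pair.

    Hypothetical helper: ties go to the pair that was counted first.
    """
    corpus = [list(read) for read in reads]  # start from single nucleotides
    merges: List[Pair] = []
    for _ in range(num_merges):
        counts = pair_counts(corpus)
        if not counts:  # every read is already a single symbol
            break
        best, _ = counts.most_common(1)[0]
        merges.append(best)
        corpus = [merge_pair(tokens, best) for tokens in corpus]
    return merges


def segment(read: str, merges: List[Pair]) -> List[str]:
    """Segment a (possibly unseen) read by replaying the learned merges in order."""
    tokens = list(read)
    for pair in merges:
        tokens = merge_pair(tokens, pair)
    return tokens


def sample_profile(reads: List[str], merges: List[Pair]) -> Dict[str, float]:
    """Hypothetical feature vector: relative frequency of each segment in one sample."""
    counts: Counter = Counter()
    for read in reads:
        counts.update(segment(read, merges))
    total = sum(counts.values())
    return {token: n / total for token, n in counts.items()}


if __name__ == "__main__":
    toy_reads = ["ACGTACGT", "ACGTTT", "TTACGA"]  # hypothetical toy reads
    merges = learn_merges(toy_reads, num_merges=3)
    print(merges)                     # [('A', 'C'), ('AC', 'G'), ('ACG', 'T')]
    print(segment("TTACGA", merges))  # ['T', 'T', 'ACG', 'A']
```

Replaying the merges in the order they were learned keeps the segmentation of new reads consistent with the learned vocabulary, so segments from different samples can be compared directly. The real method's vocabulary size, training data and downstream statistics are not described in this record.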

Original language: English
Article number: bty954
Pages (from-to): 2498-2500
Number of pages: 3
Journal: Bioinformatics
Volume: 35
Issue number: 14
Publication status: Published - 15 Jul 2019
Externally published: Yes
