Abstract
High-throughput RNA sequencing (RNA-Seq) has recently come into use as a tool for helping diagnose rare genetic disorders, because it can reveal abnormal gene expression counts, a telltale sign of genetic pathology. Existing solutions either require a large number of samples or do not provide proper statistical significance testing. We present a Bayesian model (OutPyR) for identifying abnormal RNA-Seq gene expression counts in datasets, particularly those with a small number of samples. The model incorporates recently introduced data-augmentation techniques to efficiently and accurately infer the parameters of the underlying negative binomial process, while also assessing the uncertainty of the inference and making it possible to generate simulated data. The model's software implementation is object-oriented and thus easily extensible, and it provides parameter-trace exploration as well as fault tolerance and recovery during parameter estimation. We also develop a p-value-based outlier score that stems naturally from our model. We apply the model to real and simulated datasets for different organisms and tissues, and present comparisons with existing models. Our model is implemented purely in Python, and its standalone source code is available at https://github.com/esalkovic/outpyr.
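To make the idea of a p-value-based outlier score under a negative binomial model concrete, here is a minimal sketch in plain Python. It is not the OutPyR implementation and does not use its API. The function names, the (r, p) parameterization with mean r(1 - p)/p, the two-sided tail rule, and the example counts are all assumptions chosen for illustration. In the actual model the parameters would be inferred from the data, whereas here they are simply fixed by hand.

```python
import math


def nb_log_pmf(k, r, p):
    """Log-probability of count k under NB(r, p).

    Assumed parameterization: pmf(k) = Gamma(k + r) / (k! * Gamma(r)) * p**r * (1 - p)**k,
    which has mean r * (1 - p) / p.
    """
    return (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
            + r * math.log(p) + k * math.log1p(-p))


def nb_cdf(k, r, p):
    """P(X <= k), computed by direct summation of the pmf (fine for modest counts)."""
    if k < 0:
        return 0.0
    total = sum(math.exp(nb_log_pmf(i, r, p)) for i in range(k + 1))
    return min(1.0, total)  # guard against floating-point overshoot


def two_sided_p_value(k, r, p):
    """Illustrative outlier score: twice the smaller tail probability, capped at 1."""
    lower = nb_cdf(k, r, p)            # P(X <= k)
    upper = 1.0 - nb_cdf(k - 1, r, p)  # P(X >= k)
    return min(1.0, 2.0 * min(lower, upper))


# Hypothetical counts for one gene across five samples; r and p are fixed
# by hand here (mean 90), not estimated as they would be in a Bayesian model.
r, p = 10.0, 0.1
counts = [85, 92, 110, 78, 900]
scores = {k: two_sided_p_value(k, r, p) for k in counts}
for k, score in scores.items():
    print(f"count={k:4d}  p-value={score:.3g}")
```

Counts near the assumed mean receive large p-values, while the count of 900 receives a p-value close to zero and would be flagged as a candidate outlier.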
Original language | English |
---|---|
Article number | 101245 |
Journal | Journal of Computational Science |
Volume | 47 |
Publication status | Published - Nov 2020 |
Keywords
- Bayesian modeling
- Outlier detection
- RNA-Seq