This paper addresses the development of predictive models for distinguishing pre-symptomatically infected individuals from uninfected individuals. Our machine learning experiments are conducted on publicly available challenge studies that collected whole-blood transcriptomics data from individuals infected with HRV, RSV, H1N1, and H3N2. We address the problem of identifying biomarkers that discriminate between controls and eventual shedders in the first 32 h post-infection. Our exploratory analysis shows that the most discriminatory biomarkers depend strongly on time over the course of the human response to infection. We visualize the feature sets to provide evidence of the rapid evolution of the gene expression profiles. To quantify this observation, we partition the data from the first 32 h into four equal time windows of 8 h each and identify all discriminatory biomarkers using sparsity-promoting classifiers and Iterated Feature Removal. We then perform a comparative machine learning classification analysis using linear support vector machines, artificial neural networks, and Centroid-Encoder. We present a range of experiments on different groupings of the diseases to demonstrate the robustness of the resulting models.
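The time-window partition described above can be illustrated with a minimal sketch. It is not the authors' code: the function names, the sample identifiers, the sample times, and the boundary convention (half-open windows, with the 32 h endpoint assigned to the last window) are all assumptions made for illustration.

```python
from collections import defaultdict


def window_index(t_hours, width=8.0, horizon=32.0):
    """Return the 0-based window of a sample time, or None if outside [0, horizon].

    Assumed convention: windows are [0, 8), [8, 16), [16, 24), [24, 32],
    so a sample taken exactly at the horizon falls in the last window.
    """
    if t_hours < 0 or t_hours > horizon:
        return None
    n_windows = int(horizon // width)
    return min(int(t_hours // width), n_windows - 1)


def partition_by_window(samples, width=8.0, horizon=32.0):
    """Group (sample_id, t_hours) pairs by time window, dropping out-of-range samples."""
    groups = defaultdict(list)
    for sample_id, t_hours in samples:
        idx = window_index(t_hours, width, horizon)
        if idx is not None:
            groups[idx].append(sample_id)
    return dict(groups)


# Hypothetical sample times in hours post-infection (not taken from the studies).
samples = [("s1", 0.0), ("s2", 5.0), ("s3", 8.0),
           ("s4", 21.5), ("s5", 32.0), ("s6", 36.0)]
windows = partition_by_window(samples)
```

In a pipeline of the kind the abstract describes, each window's samples would then be passed separately to the feature-selection and classification steps, so that the selected biomarkers can be compared across windows.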
We conduct machine learning experiments on time-dependent gene expression measurements associated with the immune response to influenza in humans. We employ three partitions of the two data sets: H1N1 only, H3N2 only, and H1N1 and H3N2 combined. From a total set of 1439 known biological pathways, we identify the most discriminatory pathways, those potentially capable of providing a very early prognosis of infection, focusing on the time period t ≤ 29 hours post infection. We apply a suite of machine learning algorithms to these partitions, including linear, nonlinear, and sparse support vector machines. In addition, we use artificial neural networks (ANN), k-nearest neighbors, and classification on Grassmann manifolds. The cAMP Signaling pathway and the genes PAPSS1 and PAPSS2 appeared to play a central role in the very early prognosis problem.