Purpose: Advances in technology now allow human molecular data to be measured at scale. Such data, including the transcriptome, proteome, metabolome, and epigenome, are used extensively by researchers and contribute to advances in medicine. This study uses transcriptome data from COVID-19 patients to identify genes that are dysregulated in SARS-CoV-2 infection.
Method: The selected genes are used in machine learning classification models to predict patient phenotypes. Ten phenotypes are studied, including time since onset, COVID-19 status, the connection between age and COVID-19, hospitalization status, and ICU status. The study also compares the molecular characterization of COVID-19 patients with that of patients with other respiratory diseases.
Results: Gene ontology analysis shows that the selected features are strongly related to viral infection. Features are selected using two methods, and each selected feature set is used separately to classify patients with six different machine learning algorithms. For each phenotype, the results are compared to find the best prediction model.
Conclusion: Although there are no significant differences between the feature selection methods, random forest and SVM perform very well across all the phenotype studies.
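The comparison described above (two feature selection methods, each paired with several classifiers and scored per phenotype) can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's pipeline: the abstract does not name the two selection methods, so a univariate F-test and random forest importance are used as stand-ins; only random forest and SVM are named among the six classifiers, so only those two appear; the data are synthetic, and the number of selected genes (20) and the 5-fold cross-validation are arbitrary choices.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel, SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for an expression matrix (120 samples x 500 "genes")
# with a binary phenotype label. Not real patient data.
X, y = make_classification(
    n_samples=120, n_features=500, n_informative=15,
    n_redundant=0, random_state=0,
)

# Two feature selection methods (assumed; the abstract does not name them).
selectors = {
    "univariate_f_test": lambda: SelectKBest(f_classif, k=20),
    "rf_importance": lambda: SelectFromModel(
        RandomForestClassifier(n_estimators=100, random_state=0),
        max_features=20, threshold=-np.inf,  # keep exactly the top 20
    ),
}

# Two of the classifiers named in the abstract.
classifiers = {
    "random_forest": lambda: RandomForestClassifier(
        n_estimators=200, random_state=0),
    "svm": lambda: SVC(kernel="linear"),
}

# Selection sits inside the pipeline so it is refit on each training
# fold, which avoids leaking test-fold information into the gene list.
results = {}
for sel_name, make_selector in selectors.items():
    for clf_name, make_classifier in classifiers.items():
        pipeline = make_pipeline(
            StandardScaler(), make_selector(), make_classifier())
        scores = cross_val_score(pipeline, X, y, cv=5, scoring="accuracy")
        results[(sel_name, clf_name)] = scores.mean()

# Report the combinations from best to worst mean accuracy.
for (sel_name, clf_name), acc in sorted(
        results.items(), key=lambda item: -item[1]):
    print(f"{sel_name:>18} + {clf_name:<13} mean accuracy = {acc:.3f}")
```

Repeating this loop once per phenotype label would give the per-phenotype comparison the abstract refers to.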
We study classification and regression problems in lung tumours where high-throughput gene expression is measured at multiple levels: epigenetic, transcript, and protein. We uncover the correlates of smoking and gender-specificity in lung tumours. Different genes are indicative of smoking levels, gender, and survival rates at these different levels. We also carry out an integrative analysis by selecting features from the pool of all three levels of features. Our results show that the epigenetic information in DNA methylation is a better marker for smoking status than gene expression at either the transcript or protein level. Further, and surprisingly, integrative analysis using multi-level gene expression offers no significant advantage over the individual levels in the classification and survival prediction problems considered.