Infectious diseases are still among the major and prevalent health problems, mostly because of the drug resistance of novel variants of pathogens. Molecular interactions between pathogens and their hosts are the key parts of the infection mechanisms. Novel antimicrobial therapeutics to fight drug resistance is only possible in case of a thorough understanding of pathogen-host interaction (PHI) systems. Existing databases, which contain experimentally verified PHI data, suffer from scarcity of reported interactions due to the technically challenging and time consuming process of experiments. These have motivated many researchers to address the problem by proposing computational approaches for analysis and prediction of PHIs. The computational methods primarily utilize sequence information, protein structure and known interactions. Classic machine learning techniques are used when there are sufficient known interactions to be used as training data. On the opposite case, transfer and multitask learning methods are preferred. Here, we present an overview of these computational approaches for predicting PHI systems, discussing their weakness and abilities, with future directions.
Pathogenic microorganisms exploit host cellular mechanisms and evade host defense mechanisms through molecular pathogen-host interactions (PHIs). Therefore, comprehensive analysis of these PHI networks should be an initial step for developing effective therapeutics against infectious diseases. Computational prediction of PHI data is gaining increasing demand because of scarcity of experimental data. Prediction of protein-protein interactions (PPIs) within PHI systems can be formulated as a classification problem, which requires the knowledge of non-interacting protein pairs. This is a restricting requirement since we lack datasets that report non-interacting protein pairs. In this study, we formulated the "computational prediction of PHI data" problem using kernel embedding of heterogeneous data. This eliminates the abovementioned requirement and enables us to predict new interactions without randomly labeling protein pairs as non-interacting. Domain-domain associations are used to filter the predicted results leading to 175 novel PHIs between 170 human proteins and 105 viral proteins. To compare our results with the state-of-the-art studies that use a binary classification formulation, we modified our settings to consider the same formulation. Detailed evaluations are conducted and our results provide more than 10 percent improvements for accuracy and AUC (area under the receiving operating curve) results in comparison with state-of-the-art methods.
Rapid diagnosis of COVID-19 with high reliability is essential in the early stages. To this end, recent research often uses medical imaging combined with machine vision methods to diagnose COVID-19. However, the scarcity of medical images and the inherent differences in existing datasets that arise from different medical imaging tools, methods, and specialists may affect the generalization of machine learning-based methods. Also, most of these methods are trained and tested on the same dataset, reducing the generalizability and causing low reliability of the obtained model in real-world applications. This paper introduces an adversarial deep domain adaptation-based approach for diagnosing COVID-19 from lung CT scan images, termed ADA-COVID. Domain adaptation-based training process receives multiple datasets with different input domains to generate domain-invariant representations for medical images. Also, due to the excessive structural similarity of medical images compared to other image data in machine vision tasks, we use the triplet loss function to generate similar representations for samples of the same class (infected cases). The performance of ADA-COVID is evaluated and compared with other state-of-the-art COVID-19 diagnosis algorithms. The obtained results indicate that ADA-COVID achieves classification improvements of at least 3%, 20%, 20%, and 11% in accuracy, precision, recall, and F1 score, respectively, compared to the best results of competitors, even without directly training on the same data. The implementation source code of the ADA-COVID is publicly available at https://github.com/MehradAria/ADA-COVID.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.