Abstract: Studies of host factors that affect susceptibility to viral infections have led to the possibility of determining the risk of emerging infections in potential host organisms. In this study, we constructed a computational framework to estimate the probability of virus transmission between potential hosts based on the hypothesis that the major barrier to virus infection is differences in cell-receptor sequences among species. Information regarding host susceptibility to virus infection was collected to classify the cross-species infection propensity between hosts. Evolutionary divergence matrices and a sequence similarity scoring program were used to determine the distance and similarity of receptor sequences. The discriminant analysis was validated with cross-validation methods. The results showed that the primary structure of the receptor protein influences host susceptibility to cross-species viral infections. Pair-wise distance, relative distance, and sequence similarity showed the best accuracy in identifying the susceptible group. Based on the results of the discriminant analysis, we constructed ViCIPR (http://lcbb3.snu.ac.kr/ViCIPR/home.jsp), a server-based tool to enable users to easily extract the cross-species infection propensities of specific viruses using a simple two-step procedure. Our sequence-based approach suggests that it may be possible to identify virus transmission between hosts without requiring complex structural analysis. Due to a lack of available data, this method is limited to viruses whose receptor use has been determined. However, the significant accuracy of predictive variables that positively and negatively influence virus transmission suggests that this approach could be improved with further analysis of receptor sequences.
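As a rough illustration of what a pair-wise distance between receptor sequences can look like, the sketch below computes the p-distance (the proportion of differing sites) between two pre-aligned protein sequences. This is an assumption made for illustration only: the abstract does not say which distance model or similarity scoring program was used, and the function name `p_distance` and the example sequences are hypothetical.

```python
def p_distance(seq_a, seq_b, gap="-"):
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap are skipped. This is the
    simplest distance measure and is NOT necessarily the one used in
    the study summarized above.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")

    compared = 0    # sites where both sequences have a residue
    mismatches = 0  # compared sites where the residues differ
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a != b:
            mismatches += 1

    if compared == 0:
        raise ValueError("no comparable (ungapped) sites")
    return mismatches / compared


# Hypothetical aligned receptor fragments from two host species.
host_1 = "MKT-LLVAG"
host_2 = "MKTALIVAG"
distance = p_distance(host_1, host_2)
```

A distance of this kind, computed for every pair of candidate hosts, could serve as one of the predictor variables fed into a discriminant analysis.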
Background: The host tropism determinants of influenza virus, which cause changes in the host range and increase the likelihood of interaction with specific hosts, are critical for understanding the infection and propagation of the virus in diverse host species. Methods: Six types of protein sequences of influenza viral strains isolated from three classes of hosts (avian, human, and swine) were obtained. Random forest, naïve Bayes classification, and k-nearest neighbor algorithms were used for host classification. The Java language was used for sequence analysis programming and identifying host-specific position markers. Results: A machine learning technique was explored to derive the physicochemical properties of amino acids used in host classification and prediction. HA protein was found to play the most important role in determining host tropism of the influenza virus, and the random forest method yielded the highest accuracy in host prediction. Conserved amino acids that exhibited host-specific differences were also selected and verified, and they were found to be useful position markers for host classification. Finally, ANOVA and post-hoc testing revealed that the physicochemical properties of amino acids, comprising protein sequences combined with position markers, differed significantly among hosts. Conclusion: The host tropism determinants and position markers described in this study can be used in related research to classify, identify, and predict the hosts of influenza viruses that are currently susceptible or likely to be infected in the future.
Background: Polyomaviruses (PyVs) have a wide range of hosts, from humans to fish, and their effects on hosts vary. The differences in the infection characteristics of PyV with respect to the host are assumed to be influenced by the biochemical function of the LT-Ag protein, which is related to the cytopathic effect and tumorigenesis mechanism via interaction with the host protein. Methods: We carried out a comparative analysis of codon usage patterns of large T-antigens (LT-Ags) of PyVs isolated from various host species and their functional domains and sequence motifs. Parity rule 2 (PR2) and neutrality analysis were applied to evaluate the effects of mutation and selection pressure on codon usage bias. To investigate evolutionary relationships among PyVs, we carried out a phylogenetic analysis, and a correspondence analysis of relative synonymous codon usage (RSCU) values was performed. Results: Nucleotide composition analysis using LT-Ag gene sequences showed that the GC and GC3 values of avian PyVs were higher than those of mammalian PyVs. The effective number of codons (ENC) analysis showed host-specific ENC distribution characteristics in both the LT-Ag gene and the coding sequences of its domain regions. In the avian and fish PyVs, the codon diversity was significant, whereas the mammalian PyVs tended to exhibit conservative and host-specific evolution of codon usage bias. The results of our PR2 and neutrality analysis revealed mutation bias or highly variable GC contents by showing a narrow GC12 distribution and wide GC3 distribution in all sequences. Furthermore, the calculated RSCU values revealed differences in the codon usage preference of the LT-Ag gene according to the host group. A similar tendency was observed in the two functional domains used in the analysis. Conclusions: Our study showed that specific domains or sequence motifs of various PyV LT-Ags have evolved so that each virus protein interacts with host cell targets. They have also adapted to thrive in specific host species and cell types. Functional domains of LT-Ag, which are known to interact with host proteins involved in cell proliferation and gene expression regulation, may provide important information, as they are significantly related to the host specificity of PyVs.
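For readers unfamiliar with RSCU, the following is a minimal sketch of the standard definition: for each codon, the observed count divided by the count expected if all synonymous codons for that amino acid were used equally. It assumes the standard genetic code and excludes stop codons and single-codon amino acids (Met, Trp); the function name `rscu` and the toy sequence are hypothetical, and the abstract does not state which software the authors used.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumption:
# the standard code applies to the sequences being analyzed).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def rscu(cds):
    """Relative synonymous codon usage for one coding sequence.

    RSCU = observed codon count * (number of synonymous codons)
           / (total count for that amino acid).
    Returns a dict mapping codon -> RSCU for amino acids that occur in
    the sequence and have more than one codon.
    """
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = (cds[i:i + 3] for i in range(0, usable, 3))
    counts = Counter(c for c in codons if c in CODON_TABLE)

    # Group codons into synonymous families, skipping stop codons.
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families[aa].append(codon)

    values = {}
    for synonyms in families.values():
        total = sum(counts[c] for c in synonyms)
        if total == 0 or len(synonyms) == 1:
            continue
        for c in synonyms:
            values[c] = counts[c] * len(synonyms) / total
    return values


# Toy example: four alanine codons (GCT twice, GCC and GCA once each).
example = rscu("GCTGCTGCCGCA")
```

An RSCU above 1 marks a codon used more often than expected under equal usage, and a value below 1 marks an under-used codon; a table of such values per sequence is the usual input to a correspondence analysis.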
Rift Valley fever virus (RVFV) is a vector-borne pathogen and is the most widely known virus in the genus Phlebovirus. Since it was first reported, RVFV has spread to western Africa, Egypt and Madagascar from its traditional endemic region, and infections continue to occur in new areas. In this study, we analyzed genomic patterns according to the infection properties of RVFV. Among the four segments of RVFV, the nucleotide composition, overall GC content and the difference of GC composition in the third position of the codons (%GC3) between groups were the largest in the S (NP) segment, showing that more diverse codons were used than in other segments. Furthermore, the results of CAI analysis of the S (NP) segment showed that viruses isolated from regions where no previous infections had been reported had the highest values, indicating greater adaptability to human hosts compared with other viruses. This result suggests that mutations in the S (NP) segment co-evolve with the infected hosts and may lead to expansion of the geographic range. The distinctive codon usage patterns observed in specific genomic regions of a group with similar infection properties may be related to the increasing likelihood of RVFV infections in new areas.
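The composition measures mentioned above can be sketched in a few lines. The example below computes overall GC content and GC content at third codon positions (%GC3) for a coding sequence, assuming the sequence is in frame and starts at the first codon position. The function names and the toy sequence are hypothetical; the CAI calculation is not shown because it requires a host reference codon-usage table that the abstract does not specify.

```python
def gc_percent(seq):
    """Percentage of G and C among all bases in the sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def gc3_percent(cds):
    """Percentage of G and C at third codon positions (%GC3).

    Assumes `cds` is in frame, so bases at indices 2, 5, 8, ... are
    third positions. A trailing partial codon contributes nothing.
    """
    third_positions = cds.upper()[2::3]
    if not third_positions:
        raise ValueError("sequence shorter than one codon")
    return gc_percent(third_positions)


# Toy in-frame sequence: ATG GCC AAA -> third positions G, C, A.
toy_cds = "ATGGCCAAA"
overall = gc_percent(toy_cds)
third = gc3_percent(toy_cds)
```

Comparing these two values across segments and isolate groups is one simple way to see where third-position composition departs from the overall base composition.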