BackgroundProtein secondary structure prediction (SSP) has been an area of intense research interest. Despite advances in recent methods conducted on large datasets, the estimated upper limit accuracy is yet to be reached. Since the predictions of SSP methods are applied as input to higher-level structure prediction pipelines, even small errors may have large perturbations in final models. Previous works relied on cross validation as an estimate of classifier accuracy. However, training on large numbers of protein chains compromises the classifier ability to generalize to new sequences. This prompts a novel approach to training and an investigation into the possible structural factors that lead to poor predictions.Here, a small group of 55 proteins termed the compact model is selected from the CB513 dataset using a heuristics-based approach. In a prior work, all sequences were represented as probability matrices of residues adopting each of Helix, Sheet and Coil states, based on energy calculations using the C-Alpha, C-Beta, Side-chain (CABS) algorithm. The functional relationship between the conformational energies computed with CABS force-field and residue states is approximated using a classifier termed the Fully Complex-valued Relaxation Network (FCRN). The FCRN is trained with the compact model proteins.ResultsThe performance of the compact model is compared with traditional cross-validated accuracies and blind-tested on a dataset of G Switch proteins, obtaining accuracies of ∼81 %. The model demonstrates better results when compared to several techniques in the literature. A comparative case study of the worst performing chain identifies hydrogen bond contacts that lead to Coil ⇔ Sheet misclassifications. Overall, mispredicted Coil residues have a higher propensity to participate in backbone hydrogen bonding than correctly predicted Coils.ConclusionsThe implications of these findings are: (i) the choice of training proteins is important in preserving the generalization of a classifier to predict new sequences accurately and (ii) SSP techniques sensitive in distinguishing between backbone hydrogen bonding and side-chain or water-mediated hydrogen bonding might be needed in the reduction of Coil ⇔ Sheet misclassifications.Electronic supplementary materialThe online version of this article (doi:10.1186/s12859-016-1209-0) contains supplementary material, which is available to authorized users.
Background: Influenza reassortment, a mechanism where influenza viruses exchange their RNA segments by co-infecting a single cell, has been implicated in several major pandemics since 19th century. Owing to the significant impact on public health and social stability, great attention has been received on the identification of influenza reassortment. Methods: We proposed a novel computational method named HopPER (Host-prediction-based Probability Estimation of Reassortment), that sturdily estimates reassortment probabilities through host tropism prediction using 147 new features generated from seven physicochemical properties of amino acids. We conducted the experiments on a range of real and synthetic datasets and compared HopPER with several state-of-the-art methods. Results: It is shown that 280 out of 318 candidate reassortants have been successfully identified. Additionally, not only can HopPER be applied to complete genomes but its effectiveness on incomplete genomes is also demonstrated. The analysis of evolutionary success of avian, human and swine viruses generated through reassortment across different years using HopPER further revealed the reassortment history of the influenza viruses. Conclusions: Our study presents a novel method for the prediction of influenza reassortment. We hope this method could facilitate rapid reassortment detection and provide novel insights into the evolutionary patterns of influenza viruses.
There exist several databases that provide virus-host protein interactions. While most provide curated records of interacting virus-host protein pairs, information on the strain-specific virulence factors or protein domains involved, is lacking. Some databases offer incomplete coverage of Influenza strains because of the need to sift through vast amounts of literature (including those of major viruses including HIV and Dengue, besides others). None have offered complete, strain specific protein-protein interaction records for the Influenza A group of viruses. In this paper, we present a comprehensive network of predicted domain-domain interaction(s) (DDI) between Influenza A virus (IAV) and mouse host proteins, that will allow the systematic study of disease factors by taking the virulence information (lethal dose) into account. From a previously published dataset of lethal dose studies of IAV infection in mice, we constructed an interacting domain network of mouse and viral protein domains as nodes with weighted edges. The edges were scored with the Domain Interaction Statistical Potential (DISPOT) to indicate putative DDI. The virulence network can be easily navigated via a web browser, with the associated virulence information (LD50 values) prominently displayed. The network will aid Influenza A disease modeling by providing strain-specific virulence levels with interacting protein domains. It can possibly contribute to computational methods for uncovering Influenza infection mechanisms mediated through protein domain interactions between viral and host proteins.
Epitope residues located on viral surface proteins are of immense interest in immunology and related applications such as vaccine development, disease diagnosis and drug design. Most tools rely on sequence based statistical comparisons, such as information entropy of residue positions in aligned columns to infer location and properties of epitope sites. To facilitate cross-structural comparisons of epitopes on viral surface proteins, a python-based extraction tool implemented with Jupyter notebook is presented (Jupytope). Given a viral antigen structure of interest, a list of known epitope sites and a reference structure, the corresponding epitope structural properties can quickly be obtained. The tool integrates biopython modules for commonly used software such as NACCESS, DSSP as well as residue depth and outputs a list of structure derived properties such as dihedral angles, solvent accessibility, residue depth and secondary structure that can be saved in several convenient data formats. To ensure correct spatial alignment, Jupytope takes a list of given epitope sites and their corresponding reference structure and aligns them before extracting the desired properties. Examples are demonstrated for epitopes of Influenza and SARS-CoV2 viral strains. The extracted properties assist detection of two Influenza subtypes and show potential in distinguishing between four major clades of SARS-CoV2, as compared with randomized labels. The tool will facilitate analytical and predictive works on viral epitopes through the extracted structural information.Key MessagesJupytope combines existing 3D-structural software to extract the properties of viral epitopes into a convenient text or csv file formatThe structural properties serve as parameters or features that quantitatively capture viral epitopesAssociation of structural properties to viral subtypes (for Influenza) or clades (SARS-CoV2) is demonstrated with a simple XGBoost modelStructure datasets mapped to SARS-CoV2 WHO clades and Pango lineages, as well as chain annotations are available for download
CIL). Their technical support was of great assistance. I thank Mr. Sing Yau in particular for assisting in matters such as card access and problems with my network account, amongst many other things.Finally, I thank my family members and friends for their kind words of encouragement and motivation throughout my studies.The findings of the thesis can be extended in several ways. Firstly, for the use of PSSM based encoding, the proposed models may be improved by incorporating other sequence and structure-based properties of interest, such as the amino acid composition, accessible surface areas of residues and so forth. Secondly, in the case of MP structure prediction, the current three or four state residue classification system could be extended to cover more features of interest such as kinks and re-entrant helices, that remain un-explored. Lastly, the heuristics based procedure to obtain the compact model can be extended in a systematic way by applying automated sample selection strategies, such that the best learning model given any training set is automatically selected.
Epitope residues located on viral surface proteins are of immense interest in immunology and related applications such as vaccine development, disease diagnosis and drug design. Most tools rely on sequence-based statistical comparisons, such as information entropy of residue positions in aligned columns to infer location and properties of epitope sites. To facilitate cross-structural comparisons of epitopes on viral surface proteins, a python-based extraction tool implemented with Jupyter notebook is presented (Jupytope). Given a viral antigen structure of interest, a list of known epitope sites and a reference structure, the corresponding epitope structural properties can quickly be obtained. The tool integrates biopython modules for commonly used software such as NACCESS, DSSP as well as residue depth and outputs a list of structure-derived properties such as dihedral angles, solvent accessibility, residue depth and secondary structure that can be saved in several convenient data formats. To ensure correct spatial alignment, Jupytope takes a list of given epitope sites and their corresponding reference structure and aligns them before extracting the desired properties. Examples are demonstrated for epitopes of Influenza and severe acute respiratory syndrome coronavirus 2 (SARS-CoV2) viral strains. The extracted properties assist detection of two Influenza subtypes and show potential in distinguishing between four major clades of SARS-CoV2, as compared with randomized labels. The tool will facilitate analytical and predictive works on viral epitopes through the extracted structural information. Jupytope and extracted datasets are available at https://github.com/shamimarashid/Jupytope.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.