BackgroundIdentification of unknown fungal species aids to the conservation of fungal diversity. As many fungal species cannot be cultured, morphological identification of those species is almost impossible. But, DNA barcoding technique can be employed for identification of such species. For fungal taxonomy prediction, the ITS (internal transcribed spacer) region of rDNA (ribosomal DNA) is used as barcode. Though the computational prediction of fungal species has become feasible with the availability of huge volume of barcode sequences in public domain, prediction of fungal species is challenging due to high degree of variability among ITS regions within species.ResultsA Random Forest (RF)-based predictor was built for identification of unknown fungal species. The reference and query sequences were mapped onto numeric features based on gapped base pair compositions, and then used as training and test sets respectively for prediction of fungal species using RF. More than 85% accuracy was found when 4 sequences per species in the reference set were utilized; whereas it was seen to be stabilized at ~88% if ≥7 sequence per species in the reference set were used for training of the model. The proposed model achieved comparable accuracy, while evaluated against existing methods through cross-validation procedure. The proposed model also outperformed several existing models used for identification of different species other than fungi.ConclusionsAn online prediction server “funbarRF” is established at http://cabgrid.res.in:8080/funbarrf/ for fungal species identification. Besides, an R-package funbarRF (https://cran.r-project.org/web/packages/funbarRF/) is also available for prediction using high throughput sequence data. The effort put in this work will certainly supplement the future endeavors in the direction of fungal taxonomy assignments based on DNA barcode.
Heat shock proteins (HSPs) play a pivotal role in cell growth and variability. Since conventional approaches are expensive and voluminous protein sequence information is available in the post-genomic era, development of an automated and accurate computational tool is highly desirable for prediction of HSPs, their families and sub-types. Thus, we propose a computational approach for reliable prediction of all these components in a single framework and with higher accuracy as well. The proposed approach achieved an overall accuracy of ~84% in predicting HSPs, ~97% in predicting six different families of HSPs, and ~94% in predicting four types of DnaJ proteins, with bench mark datasets. The developed approach also achieved higher accuracy as compared to most of the existing approaches. For easy prediction of HSPs by experimental scientists, a user friendly web server ir-HSP is made freely accessible at http://cabgrid.res.in:8080/ir-hsp. The ir-HSP was further evaluated for proteome-wide identification of HSPs by using proteome datasets of eight different species, and ~50% of the predicted HSPs in each species were found to be annotated with InterPro HSP families/domains. Thus, the developed computational method is expected to supplement the currently available approaches for prediction of HSPs, to the extent of their families and sub-types.
BackgroundNematodes are the most numerous animals in the soil. Insect parasitic nematodes of the genus Heterorhabditis are capable of selectively seeking, infecting and killing their insect-hosts in the soil. The infective juvenile (IJ) stage of the Heterorhabditis nematodes is analogous to Caenorhabditis elegans dauer juvenile stage, which remains in ‘arrested development’ till it finds and infects a new insect-host in the soil. H. indica is the most prevalent species of Heterorhabditis in India. To understand the genes and molecular processes that govern the biology of the IJ stage, and to create a resource to facilitate functional genomics and genetic exploration, we sequenced the transcriptome of H. indica IJs.ResultsThe de-novo sequence assembly using Velvet-Oases pipeline resulted in 13,593 unique transcripts at N50 of 1,371 bp, of which 53 % were annotated by blastx. H. indica transcripts showed higher orthology with parasitic nematodes as compared to free living nematodes. In-silico expression analysis showed 30 % of transcripts expressing with ≥100 FPKM value. All the four canonical dauer formation pathways like cGMP-PKG, insulin, dafachronic acid and TGF-β were active in the IJ stage. Several other signaling pathways were highly represented in the transcriptome. Twenty-four orthologs of C. elegans RNAi pathway effector genes were discovered in H. indica, including nrde-3 that is reported for the first time in any of the parasitic nematodes. An ortholog of C. elegans tol-1 was also identified. Further, 272 kinases belonging to 137 groups, and several previously unidentified members of important gene classes were identified.ConclusionsWe generated high-quality transcriptome sequence data from H. indica IJs for the first time. The transcripts showed high similarity with the parasitic nematodes, M. hapla, and A. suum as opposed to C. elegans, a species to which H. indica is more closely related. The high representation of transcripts from several signaling pathways in the IJs indicates that despite being a developmentally arrested stage; IJs are a hotbed of signaling and are actively interacting with their environment.Electronic supplementary materialThe online version of this article (doi:10.1186/s12864-016-2510-z) contains supplementary material, which is available to authorized users.
As inorganic nitrogen compounds are essential for basic building blocks of life (e.g., nucleotides and amino acids), the role of biological nitrogen-fixation (BNF) is indispensible. All nitrogen fixing microbes rely on the same nitrogenase enzyme for nitrogen reduction, which is in fact an enzyme complex consists of as many as 20 genes. However, the occurrence of six genes viz., nifB, nifD, nifE, nifH, nifK, and nifN has been proposed to be essential for a functional nitrogenase enzyme. Therefore, identification of these genes is important to understand the mechanism of BNF as well as to explore the possibilities for improving BNF from agricultural sustainability point of view. Further, though the computational tools are available for the annotation and phylogenetic analysis of nifH gene sequences alone, to the best of our knowledge no tool is available for the computational prediction of the above mentioned six categories of nitrogen-fixation (nif) genes or proteins. Thus, we proposed an approach, which is first of its kind for the computational identification of nif proteins encoded by the six categories of nif genes. Sequence-derived features were employed to map the input sequences into vectors of numeric observations that were subsequently fed to the support vector machine as input. Two types of classifier were constructed: (i) a binary classifier for classification of nif and non-nitrogen-fixation (non-nif) proteins, and (ii) a multi-class classifier for classification of six categories of nif proteins. Higher accuracies were observed for the combination of composition-transition-distribution (CTD) feature set and radial kernel, as compared to the other feature-kernel combinations. The overall accuracies were observed >90% in both binary and multi-class classifications. The developed approach further achieved >92% accuracy, while evaluated with blind (independent) test datasets. The developed approach also produced higher accuracy in identifying nif proteins, while evaluated using proteome-wide datasets of several species. Furthermore, we established a prediction server nifPred (http://webapp.cabgrid.res.in/nifPred) to assist the scientific community for proteome-wide identification of six categories of nif proteins. Besides, the source code of nifPred is also available at https://github.com/PrabinaMeher/nifPred. The developed web server is expected to supplement the transcriptional profiling and comparative genomics studies for the identification and functional annotation of genes related to BNF.
Nematodes are responsible for causing severe diseases in plants, humans and other animals. Infection is associated with the release of Excretory/Secretory (ES) proteins into host cytoplasm and interference with the host immune system which make them attractive targets for therapeutic use. The identification of ES proteins through bioinformatics approaches is cost- and time-effective and could be used for screening of potential targets for parasitic diseases for further experimental studies. Here, we identified and functionally annotated 93,949 ES proteins, in the genome of 73 nematodes using integration of various bioinformatics tools. 30.6% of ES proteins were found to be supported at RNA level. The predicted ES proteins, annotated by Gene Ontology terms, domains, metabolic pathways, proteases and enzyme class analysis were enriched in molecular functions of proteases, protease inhibitors, c-type lectin and hydrolases which are strongly associated with typical functions of ES proteins. We identified a total of 452 ES proteins from human and plant parasitic nematodes, homologues to DrugBank-approved targets and C. elegans RNA interference phenotype genes which could represent potential targets for parasite control and provide valuable resource for further experimental studies to understand host-pathogen interactions.
Herbicide resistance (HR) is a major concern for the agricultural producers as well as environmentalists. Resistance to commonly used herbicides are conferred due to mutation(s) in the genes encoding herbicide target sites/proteins (GETS). Identification of these genes through wet-lab experiments is time consuming and expensive. Thus, a supervised learning-based computational model has been proposed in this study, which is first of its kind for the prediction of seven classes of GETS. The cDNA sequences of the genes were initially transformed into numeric features based on the k-mer compositions and then supplied as input to the support vector machine. In the proposed SVM-based model, the prediction occurs in two stages, where a binary classifier in the first stage discriminates the genes involved in conferring the resistance to herbicides from other genes, followed by a multi-class classifier in the second stage that categorizes the predicted herbicide resistant genes in the first stage into any one of the seven resistant classes. Overall classification accuracies were observed to be ~89% and >97% for binary and multi-class classifications respectively. The proposed model confirmed higher accuracy than the homology-based algorithms viz., BLAST and Hidden Markov Model. Besides, the developed computational model achieved ~87% accuracy, while tested with an independent dataset. An online prediction server HRGPred (http://cabgrid.res.in:8080/hrgpred) has also been established to facilitate the prediction of GETS by the scientific community.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.