A complete map of the Arabidopsis (Arabidopsis thaliana) proteome is a major goal for the plant research community in terms of determining the function and regulation of each encoded protein. Developing genome-wide prediction tools, such as those for localizing gene products at the subcellular level, will substantially advance Arabidopsis gene annotation. To this end, we performed a comprehensive study in Arabidopsis and created an integrative support vector machine-based localization predictor called AtSubP (for Arabidopsis subcellular localization predictor) that is based on the combinatorial presence of diverse protein features, such as amino acid composition, sequence-order effects, terminal information, Position-Specific Scoring Matrix, and similarity search-based Position-Specific Iterated-Basic Local Alignment Search Tool information. When used to predict seven subcellular compartments through a 5-fold cross-validation test, our hybrid-based best classifier achieved an overall sensitivity of 91% with high-confidence precision and Matthews correlation coefficient values of 90.9% and 0.89, respectively. Benchmarking AtSubP on two independent data sets, one from Swiss-Prot and another containing green fluorescent protein- and mass spectrometry-determined proteins, showed a significant improvement in the prediction accuracy of species-specific AtSubP over some widely used "general" tools such as TargetP, LOCtree, PA-SUB, MultiLoc, WoLF PSORT, Plant-PLoc, and our newly created All-Plant method.
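The sensitivity, precision, and Matthews correlation coefficient reported above can be illustrated with a minimal sketch. It assumes a one-vs-rest view of a single compartment, so the confusion counts are binary. The function name and the counts in the usage note are hypothetical and are not taken from the AtSubP study.

```python
import math

def binary_metrics(tp, fp, tn, fn):
    """Return (sensitivity, precision, MCC) from binary confusion counts.

    Assumption: one compartment is treated as the positive class and all
    other compartments as the negative class (one-vs-rest).
    """
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # MCC uses all four cells of the confusion matrix.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, precision, mcc
```

For example, with the illustrative counts `binary_metrics(90, 10, 90, 10)` the sensitivity and precision are both 0.9 and the MCC is 0.8. How per-compartment values were aggregated into the overall figures is not stated in the abstract, so any averaging step would be a further assumption.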
Transporters move hydrophilic substrates across hydrophobic biological membranes and play key roles in plant nutrition, metabolism, and signaling and, consequently, in plant growth, development, and responses to the environment. To initiate and support systematic characterization of transporters in the model legume Medicago truncatula, we identified 3,830 transporters and classified 2,673 of these into 113 families and 146 subfamilies. Analysis of gene expression data for 2,611 of these transporters identified 129 that are expressed in an organ-specific manner, including 50 that are nodule specific and 36 specific to mycorrhizal roots. Further analysis uncovered 196 transporters that are induced at least 5-fold during nodule development and 44 in roots during arbuscular mycorrhizal symbiosis. Among the nodule- and mycorrhiza-induced transporter genes are many candidates for known transport activities in these beneficial symbioses. The data presented here are a unique resource for the selection and functional characterization of legume transporters.
The subcellular localization of proteins is very important for characterizing their function in a cell. Accurate computational prediction of subcellular locations has been an active area of interest. Most of the work has focused on single-localization prediction. Only a few studies have addressed multi-target localization, and they have not achieved good accuracy so far; in plant sciences, very limited work has been done. Here we report the development of a novel tool, Plant-mSubP, which is based on integrated machine learning approaches to efficiently predict subcellular localizations in plant proteomes. The proposed approach predicts with high accuracy 11 single localizations and three dual locations of the plant cell. Several features based on the composition and physicochemical properties of a protein, such as amino acid composition, pseudo amino acid composition, auto-correlation descriptors, quasi-sequence-order descriptors and hybrid features, are used to represent the protein. The performance of the proposed method has been assessed on a training set as well as an independent test set. Using the hybrid feature of the pseudo amino acid composition, N-Center-C terminal amino acid composition and the dipeptide composition (PseAAC-NCC-DIPEP), an overall accuracy of 81.97%, 84.75% and 87.88% is achieved on the training data set of proteins containing the single-label, single- and dual-label combined, and dual-label proteins, respectively. When tested on the independent data, an accuracy of 64.36%, 64.84% and 81.08% is achieved on the single-label, single- and dual-label, and dual-label proteins, respectively. The prediction models have been implemented on a web server available at http://bioinfo.usu.edu/Plant-mSubP/. The results indicate that the proposed approach is comparable to the existing methods in single localization prediction and outperforms all other existing tools when compared for dual-label proteins. The prediction tool will be a useful resource for better annotation of various plant proteomes.
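To make the composition-based features named above more concrete, the following is a rough sketch of amino acid composition, dipeptide composition, and an N-Center-C split. It is not the Plant-mSubP implementation. The equal-thirds split for the N, Center, and C parts is an assumption, the pseudo amino acid composition component is omitted, and all function names are hypothetical.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]

def aac(seq):
    """20-dim amino acid composition; non-standard residues are ignored."""
    residues = [c for c in seq.upper() if c in AMINO_ACIDS]
    n = len(residues)
    return [residues.count(a) / n if n else 0.0 for a in AMINO_ACIDS]

def dipeptide_composition(seq):
    """400-dim composition of adjacent residue pairs."""
    seq = seq.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for a, b in zip(seq, seq[1:]):
        pair = a + b
        if pair in counts:  # skip pairs containing non-standard residues
            counts[pair] += 1
            total += 1
    return [counts[d] / total if total else 0.0 for d in DIPEPTIDES]

def ncc_composition(seq):
    """60-dim vector: AAC of the N-terminal, center, and C-terminal parts.

    Assumption: the sequence is cut into three roughly equal parts; the
    split used by the original tool is not given in the abstract.
    """
    third = len(seq) // 3
    n_part = seq[:third]
    center = seq[third:len(seq) - third]
    c_part = seq[len(seq) - third:]
    return aac(n_part) + aac(center) + aac(c_part)

def hybrid_features(seq):
    """Concatenate NCC and dipeptide composition (460 dims, no PseAAC)."""
    return ncc_composition(seq) + dipeptide_composition(seq)
```

A call such as `hybrid_features("MKTAYIAKQR")` yields a fixed-length vector regardless of sequence length, which is what allows proteins of different sizes to be fed to the same classifier.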
Background: Diverse modeling approaches, viz. neural networks and multiple regression, have been followed to date for disease prediction in plant populations. However, due to their inability to predict the value of unknown data points and their longer training times, there is a need to exploit new prediction software for a better understanding of plant-pathogen-environment relationships. Further, there is no online tool available that can help plant researchers or farmers in the timely application of control measures. This paper introduces a new prediction approach based on support vector machines for developing weather-based prediction models of plant diseases.
Background: Every year pathogenic organisms cause billions of dollars' worth of damage to crops and livestock. In agriculture, the study of plant-microbe interactions demands special attention in order to develop management strategies for the destructive pathogen-induced diseases that cause huge crop losses every year worldwide. Pseudomonas syringae is a major bacterial leaf pathogen that causes diseases in a wide range of plant species. Among its various strains, pathovar tomato strain DC3000 (PstDC3000) is reported to infect the plant host Arabidopsis thaliana and has thus been accepted as a model system for experimental characterization of the molecular dynamics of plant-pathogen interactions. Protein-protein interactions (PPIs) play a critical role in initiating pathogenesis and maintaining infection. Understanding the PPI network between a host and pathogen is a critical step for studying the molecular basis of pathogenesis. Large-scale experimental studies of PPIs are scarce, and high-throughput experimental results show a high false-positive rate. Hence, there is a need to develop efficient computational models to predict the interactions between host and pathogen at genome scale, and to find novel candidate effectors and/or their targets. Results: In this study, we used two computational approaches, the interolog-based and the domain-based methods, to predict the interactions between Arabidopsis and PstDC3000 at genome scale. The interolog method relies on protein sequence similarity to conduct the PPI prediction. A Pseudomonas protein and an Arabidopsis protein are predicted to interact with each other if an experimentally verified interaction exists between their respective homologous proteins in another organism. The domain-based method uses domain interaction information, which is derived from known protein 3D structures, to infer the potential PPIs. If a Pseudomonas protein and an Arabidopsis protein contain an interacting domain pair, one can expect the two proteins to interact with each other. The interolog-based method predicts ~0.79M PPIs involving around 7700 Arabidopsis and 1068 Pseudomonas proteins in the full genome. The domain-based method predicts 85650 PPIs comprising 11432 Arabidopsis and 887 Pseudomonas proteins. Further, around 11000 PPIs have been identified as interacting by both methods as a consensus. Conclusion: The present work predicts the protein-protein interaction network between Arabidopsis thaliana and Pseudomonas syringae pv. tomato DC3000 at genome-wide scale with high confidence. Although the predicted PPIs may contain some false positives, the computational methods provide a reasonable number of interactions that can be further validated by high-throughput experiments. This can be a useful resource for the plant community to characterize the host-pathogen interaction in the Arabidopsis-Pseudomonas system. Further, these prediction models can be applied to agriculturally relevant crops.
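The two inference rules described above, and their consensus, can be sketched on toy data as follows. This is only an illustration of the logic, not the authors' pipeline. The homolog and domain mappings are assumed to be precomputed (for example by sequence similarity search and domain annotation), and every identifier and function name here is hypothetical.

```python
def predict_interologs(host_homologs, pathogen_homologs, known_ppis):
    """Predict a host-pathogen pair if any of their homologs in a template
    organism have an experimentally verified interaction."""
    known = {frozenset(pair) for pair in known_ppis}
    predicted = set()
    for host, h_homs in host_homologs.items():
        for pathogen, p_homs in pathogen_homologs.items():
            if any(frozenset((a, b)) in known for a in h_homs for b in p_homs):
                predicted.add((host, pathogen))
    return predicted

def predict_domain_based(host_domains, pathogen_domains, interacting_domains):
    """Predict a host-pathogen pair if the two proteins carry a domain pair
    known to interact (e.g. from 3D structures)."""
    ddi = {frozenset(pair) for pair in interacting_domains}
    predicted = set()
    for host, h_doms in host_domains.items():
        for pathogen, p_doms in pathogen_domains.items():
            if any(frozenset((a, b)) in ddi for a in h_doms for b in p_doms):
                predicted.add((host, pathogen))
    return predicted

# Toy inputs with made-up identifiers (not real proteins or domains).
host_homologs = {"AtA": {"T1"}, "AtB": {"T3"}}
pathogen_homologs = {"PsX": {"T2"}, "PsY": {"T4"}}
known_ppis = [("T1", "T2"), ("T3", "T4")]

host_domains = {"AtA": {"D1"}, "AtB": {"D5"}}
pathogen_domains = {"PsX": {"D2"}, "PsY": {"D6"}}
interacting_domains = [("D1", "D2")]

interolog_ppis = predict_interologs(host_homologs, pathogen_homologs, known_ppis)
domain_ppis = predict_domain_based(host_domains, pathogen_domains, interacting_domains)

# Consensus: pairs supported by both methods.
consensus = interolog_ppis & domain_ppis
```

In this toy case the interolog rule predicts two pairs, the domain rule predicts one, and the consensus keeps only the pair supported by both. The nested loops are quadratic in the number of proteins, so a genome-scale run would likely need an index from template proteins or domains back to host and pathogen proteins.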
Tris(1,3-dichloro-2-propyl)phosphate (TDCIPP) is a high-production volume organophosphate-based plasticizer and flame retardant widely used within the United States. Using zebrafish as a model, the objectives of this study were to determine whether (1) TDCIPP inhibits DNA methyltransferase (DNMT) within embryonic nuclear extracts; (2) uptake of TDCIPP from 0.75 h postfertilization (hpf, 2-cell) to 2 hpf (64-cell) or 6 hpf (shield stage) leads to impacts on the early embryonic DNA methylome; and (3) TDCIPP-induced impacts on cytosine methylation are localized to CpG islands within intergenic regions. Within this study, 5-azacytidine (5-azaC, a DNMT inhibitor) was used as a positive control. Although 5-azaC significantly inhibited zebrafish DNMT, TDCIPP did not affect DNMT activity in vitro at concentrations as high as 500 μM. However, rapid embryonic uptake of 5-azaC and TDCIPP from 0.75 to 2 hpf resulted in chemical- and chromosome-specific alterations in cytosine methylation at 2 hpf. Moreover, TDCIPP exposure predominantly resulted in hypomethylation of positions outside of CpG islands and within intragenic (exon) regions of the zebrafish genome. Overall, these findings provide the foundation for monitoring DNA methylation dynamics within zebrafish as well as identifying potential associations among TDCIPP exposure, adverse health outcomes, and DNA methylation status within human populations.
Background: Dicer, an RNase III enzyme, plays a vital role in the processing of pre-miRNAs to generate miRNAs. The structural and sequence features of pre-miRNAs that influence the position and efficiency of cleavage are not well known. Precise cleavage by Dicer is crucial because inaccurate processing can produce miRNAs with different seed regions, which can alter the repertoire of target genes. Results: In this study, a novel method has been developed to predict Dicer cleavage sites on pre-miRNAs using Support Vector Machine. We used the dataset of experimentally validated human miRNA hairpins from miRBase, and extracted fourteen nucleotides around Dicer cleavage sites. We developed a number of models using various types of features and achieved a maximum accuracy of 66% using the binary profile of the nucleotide sequence taken from the 5p arm of the hairpin. The prediction performance for Dicer cleavage sites improved significantly from 66% to 86% when we integrated secondary structure information. This indicates that secondary structure plays an important role in the selection of the cleavage site. All models were trained and tested on 555 experimentally validated cleavage sites and evaluated using the 5-fold cross-validation technique. In addition, the performance was also evaluated on an independent testing dataset, on which the method achieved an accuracy of ~82%. Conclusion: Based on this study, we developed a webserver, PHDcleav (http://www.imtech.res.in/raghava/phdcleav/), to predict Dicer cleavage sites in pre-miRNA. This tool can be used to investigate the functional consequences of genetic variations/SNPs in miRNA on Dicer cleavage sites and gene silencing. Moreover, it would also be useful in the discovery of miRNAs in the human genome and in the design of Dicer-specific pre-miRNAs for potent gene silencing.
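The binary profile of a fourteen-nucleotide window mentioned above can be sketched as a one-hot encoding. This is an assumption-laden illustration rather than the PHDcleav code. The window is assumed to be seven nucleotides on each side of the cleavage position, the secondary structure features are left out, and the function names are hypothetical.

```python
NUCLEOTIDES = "ACGU"

def extract_window(hairpin, cleavage_index, flank=7):
    """Return 2*flank nucleotides centered on the cleavage position.

    Assumption: the cleavage occurs between positions cleavage_index - 1
    and cleavage_index (0-based), with `flank` nucleotides on each side.
    """
    start = cleavage_index - flank
    end = cleavage_index + flank
    if start < 0 or end > len(hairpin):
        raise ValueError("window extends beyond the hairpin sequence")
    return hairpin[start:end]

def binary_profile(window):
    """One-hot encode each nucleotide as four 0/1 values (A, C, G, U)."""
    window = window.upper().replace("T", "U")  # accept DNA-style input
    vector = []
    for nt in window:
        vector.extend(1 if nt == ref else 0 for ref in NUCLEOTIDES)
    return vector
```

A fourteen-nucleotide window encoded this way gives a 56-dimensional vector, which could then be passed to an SVM together with a label for whether the window is centered on a true cleavage site.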
The attainment of a complete map-based sequence for rice (Oryza sativa) is a major milestone for the research community. Identifying the localization of encoded proteins is key to understanding their functional characteristics and facilitating their purification. Our proposed method, RSLpred, is an effort in this direction for genome-scale subcellular prediction of encoded rice proteins. First, support vector machine (SVM)-based modules were developed using traditional amino acid-, dipeptide- (i+1) and four parts-amino acid composition, achieving an overall accuracy of 81.43, 80.88 and 81.10%, respectively. Second, a similarity search-based module was developed using position-specific iterated-basic local alignment search tool, achieving 68.35% accuracy. Another module, developed using evolutionary information of a protein sequence extracted from the position-specific scoring matrix, achieved an accuracy of 87.10%. In this study, a large number of modules have been developed using various encoding schemes such as higher-order dipeptide composition, N- and C-terminal composition, split amino acid composition and hybrid information. In order to benchmark RSLpred, it was tested on an independent set of rice proteins, where it outperformed widely used prediction methods such as TargetP, Wolf-PSORT, PA-SUB, Plant-Ploc and ESLpred. To assist the plant research community, an online web tool, 'RSLpred', has been developed for subcellular prediction of query rice proteins, which is freely accessible at http://www.imtech.res.in/raghava/rslpred.