Background: Due to the functional importance of intrinsically disordered proteins or protein regions, prediction of intrinsic protein disorder from amino acid sequence has become an area of active research, as witnessed in the 6th experiment on Critical Assessment of Techniques for Protein Structure Prediction (CASP6). Since the initial work by Romero et al. (Identifying disordered regions in proteins from amino acid sequences, IEEE Int. Conf. Neural Netw., 1997), our group has developed several predictors optimized for long disordered regions (>30 residues) with prediction accuracy exceeding 85%. However, these predictors are less successful on short disordered regions (≤30 residues). A probable cause is the length dependence of the amino acid compositions and sequence properties of disordered regions.
Results: We proposed two new predictor models, VSL2-M1 and VSL2-M2, to address this length-dependency problem in prediction of intrinsic protein disorder. These two predictors are similar to the original VSL1 predictor used in the CASP6 experiment. In both models, two specialized predictors were first built and optimized for short (≤30 residues) and long (>30 residues) disordered regions, respectively. A meta predictor was then trained to integrate the specialized predictors into the final predictor model. As the 10-fold cross-validation results showed, the VSL2 predictors achieved well-balanced prediction accuracies of 81% on both short and long disordered regions. Comparisons over the VSL2 training dataset via 10-fold cross-validation and a blind-test set of unrelated recent PDB chains indicated that VSL2 predictors were significantly more accurate than several existing predictors of intrinsic protein disorder.
Conclusion: The VSL2 predictors are applicable to disordered regions of any length and can accurately identify the short disordered regions that are often misclassified by our previous disorder predictors. The success of the VSL2 predictors further confirmed the previously observed differences in amino acid compositions and sequence properties between short and long disordered regions, and justified our approach of modelling short and long disordered regions separately. The VSL2 predictors are freely accessible for non-commercial use at
During the past few years we have investigated methods to improve predictors of intrinsically disordered regions longer than 30 consecutive residues. Experimental evidence, however, showed that these predictors were less successful on short disordered regions, as observed two years ago during the fifth Critical Assessment of Techniques for Protein Structure Prediction (CASP5). To address this shortcoming, we developed a two-level model called VSL1 (CASP6 id: 193-1). At the first level, VSL1 consists of two specialized predictors, one of which was optimized for long disordered regions (>30 residues) and the other for short disordered regions (≤30 residues). At the second level, a meta-predictor was built to assign weights for combining the two first-level predictors. As the results of the CASP6 experiment showed, this new predictor has achieved the highest accuracy yet and significantly improved performance on short disordered regions, while maintaining high performance on long disordered regions.
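The two-level idea described above can be sketched in a few lines. This is only an illustration, not the published VSL1 implementation: it assumes that each first-level predictor emits one disorder score per residue in [0, 1], and that the meta-predictor emits a per-residue weight in [0, 1] expressing how much to trust the long-region predictor. The function name `combine_two_level` and the convex-combination rule are assumptions made for the example; the abstract states only that the meta-predictor assigns weights for combining the two first-level predictors.

```python
from typing import List, Sequence


def combine_two_level(
    short_scores: Sequence[float],
    long_scores: Sequence[float],
    long_weights: Sequence[float],
) -> List[float]:
    """Blend per-residue scores from two specialized predictors.

    short_scores: output of the short-region predictor, one value per residue.
    long_scores:  output of the long-region predictor, one value per residue.
    long_weights: hypothetical meta-predictor output; the weight given to the
                  long-region predictor (the short one gets 1 - weight).
    """
    if not (len(short_scores) == len(long_scores) == len(long_weights)):
        raise ValueError("all inputs must have one value per residue")

    combined = []
    for s, l, w in zip(short_scores, long_scores, long_weights):
        if not 0.0 <= w <= 1.0:
            raise ValueError("weights must lie in [0, 1]")
        # Assumed combination rule: a convex mix of the two first-level scores.
        combined.append(w * l + (1.0 - w) * s)
    return combined


if __name__ == "__main__":
    # Weight 0.0 returns the short predictor's score, 1.0 the long predictor's.
    print(combine_two_level([0.2, 0.8], [0.6, 0.4], [0.0, 1.0]))
```

With a weight of 0 the output equals the short-region score and with a weight of 1 it equals the long-region score, so the meta-predictor in this sketch can switch smoothly between the two specialists along the sequence.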
Protein existing as an ensemble of structures, called intrinsically disordered, has been shown to be responsible for a wide variety of biological functions and to be common in nature. Here we focus on improving sequence-based predictions of long (>30 amino acid residues) regions lacking specific 3-D structure by means of four new neural-network-based Predictors Of Natural Disordered Regions (PONDRs): VL3, VL3H, VL3P, and VL3E. PONDR VL3 used several features from a previously introduced PONDR VL2, but benefitted from optimized predictor models and a slightly larger (152 versus 145) set of disordered proteins that were cleaned of mislabeling errors found in the smaller set. PONDR VL3H utilized homologues of the disordered proteins in the training stage, while PONDR VL3P used attributes derived from sequence profiles obtained by PSI-BLAST searches. The measure of accuracy was the average between accuracies on disordered and ordered protein regions. By this measure, the 30-fold cross-validation accuracies of VL3, VL3H, and VL3P were, respectively, 83.6±1.4%, 85.3±1.4%, and 85.2±1.5%. By combining VL3H and VL3P, the resulting PONDR VL3E achieved an accuracy of 86.7±1.4%. This is a significant improvement over our previous PONDRs VLXT (71.6±1.3%) and VL2 (80.9±1.4%). The new disorder predictors with the corresponding datasets are freely accessible through the web server at www.ist.temple.edu/disprot.
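The accuracy measure used above, the average of the accuracy on disordered residues and the accuracy on ordered residues, can be computed as in the following minimal sketch. The label encoding (1 for disordered, 0 for ordered) and the function name `balanced_accuracy` are assumptions made for illustration; the abstract does not specify either.

```python
from typing import Sequence


def balanced_accuracy(true_labels: Sequence[int], predicted_labels: Sequence[int]) -> float:
    """Average of per-class accuracies; assumes 1 = disordered, 0 = ordered."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")

    correct = {0: 0, 1: 0}  # correctly predicted residues per true class
    total = {0: 0, 1: 0}    # number of residues per true class
    for truth, pred in zip(true_labels, predicted_labels):
        if truth not in (0, 1) or pred not in (0, 1):
            raise ValueError("labels must be 0 (ordered) or 1 (disordered)")
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1

    if total[0] == 0 or total[1] == 0:
        raise ValueError("both ordered and disordered residues are required")

    accuracy_ordered = correct[0] / total[0]
    accuracy_disordered = correct[1] / total[1]
    return (accuracy_ordered + accuracy_disordered) / 2.0


if __name__ == "__main__":
    truth = [1, 1, 0, 0, 0, 0]
    pred = [1, 0, 0, 0, 0, 1]
    # Disordered accuracy is 1/2, ordered accuracy is 3/4.
    print(balanced_accuracy(truth, pred))  # 0.625
```

Averaging the two per-class accuracies keeps the much larger ordered class from dominating the score, which matters when ordered residues far outnumber disordered ones.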
Blind predictions of intrinsic order and disorder were made on 42 proteins subsequently revealed to contain 9,044 ordered residues, 284 disordered residues in 26 segments of length 30 residues or less, and 281 disordered residues in 2 disordered segments of length greater than 30 residues. The accuracies of the six predictors used in this experiment ranged from 77% to 91% for the ordered regions and from 56% to 78% for the disordered segments. The average of the order and disorder predictions ranged from 73% to 77%. The prediction of disorder in the shorter segments was poor, from 25% to 66% correct, while the prediction of disorder in the longer segments was better, from 75% to 95% correct. Four of the predictors were composed of ensembles of neural networks. This enabled them to deal more efficiently with the large asymmetry in the training data through diversified sampling from the significantly larger ordered set and achieve better accuracy on ordered and long disordered regions. The exclusive use of long disordered regions for predictor training likely contributed to the disparity of the predictions on long versus short disordered regions, while averaging the output values over 61-residue windows to eliminate short predictions of order or disorder probably contributed to the even greater disparity for three of the predictors. This experiment supports the predictability of intrinsic disorder from amino acid sequence. Proteins 2003;53:566-572.
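The effect of averaging predictor outputs over 61-residue windows can be illustrated with a small sketch. It assumes a centered moving average over per-residue scores, with windows simply truncated at the sequence termini; the abstract does not describe how the original predictors handled the ends, so that choice and the function name `smooth_scores` are assumptions.

```python
from typing import List, Sequence


def smooth_scores(scores: Sequence[float], window: int = 61) -> List[float]:
    """Centered moving average of per-residue scores.

    Windows are truncated at the sequence ends (an assumption), so residues
    near the termini are averaged over fewer neighbours.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")

    half = window // 2
    n = len(scores)

    # Prefix sums make each window average an O(1) lookup.
    prefix = [0.0]
    for value in scores:
        prefix.append(prefix[-1] + value)

    smoothed = []
    for i in range(n):
        lo = max(0, i - half)
        hi = min(n, i + half + 1)
        smoothed.append((prefix[hi] - prefix[lo]) / (hi - lo))
    return smoothed


if __name__ == "__main__":
    # A 10-residue disordered stretch inside 200 ordered residues.
    raw = [0.0] * 95 + [1.0] * 10 + [0.0] * 95
    smoothed = smooth_scores(raw, window=61)
    # The peak is flattened to 10/61, well below a 0.5 decision threshold.
    print(max(smoothed))
```

In this toy input, a short stretch of high scores is averaged with about fifty low-scoring neighbours and never approaches a 0.5 threshold, which is consistent with the abstract's remark that window averaging removes short predictions of disorder.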
Summary: An integral part of global environment change is an increase in the atmospheric concentration of CO2 ([CO2]) [1]. Increased [CO2] reduces leaf stomatal aperture and stomatal density, which plays out as reductions in evapotranspiration [2–4]. Surprisingly, given the importance of transpiration to the control of terrestrial water fluxes [5] and plant nutrient acquisition [6], we know comparatively little about the molecular components involved in the intracellular signaling pathways by which [CO2] controls stomatal development and function [7]. Here, we report that elevated [CO2]-induced closure and reductions in stomatal density require the generation of reactive oxygen species (ROS), thereby adding a new common element to these signaling pathways. We also show that the PYR/RCAR family of ABA receptors [8, 9] and ABA itself are required in both responses. Using genetic approaches, we show that ABA in guard cells or their precursors is sufficient to mediate the [CO2]-induced stomatal density response. Taken together, our results suggest that stomatal responses to increased [CO2] operate through the intermediacy of ABA. In the case of [CO2]-induced reductions in stomatal aperture, this occurs by accessing the guard cell ABA signaling pathway. In both [CO2]-mediated responses, our data are consistent with a mechanism in which ABA increases the sensitivity of the system to [CO2], but they could also be explained by a requirement for a CO2-induced increase in ABA biosynthesis specifically in the guard cell lineage. Furthermore, the dependency of stomatal [CO2] signaling on ABA suggests that the ABA pathway is, in evolutionary terms, likely to be ancestral.