Gavin C. Cawley scite author profile

Establishing transcriptional regulatory networks by analysis of gene expression data and promoter sequences shows great promise. We developed a novel promoter classification method using a Relevance Vector Machine (RVM) and Bayesian statistical principles to identify discriminatory features in the promoter sequences of genes that can correctly classify transcriptional responses. The method was applied to microarray data obtained from Arabidopsis seedlings treated with glucose or abscisic acid (ABA). Of those genes showing >2.5-fold changes in expression level, ~70% were correctly predicted as being up- or down-regulated (under 10-fold cross-validation), based on the presence or absence of a small set of discriminative promoter motifs. Many of these motifs have known regulatory functions in sugar- and ABA-mediated gene expression. One promoter motif that was not known to be involved in glucose-responsive gene expression was identified as the strongest classifier of glucose-up-regulated gene expression. We show it confers glucose-responsive gene expression in conjunction with another promoter motif, thus validating the classification method. We were able to establish a detailed model of glucose and ABA transcriptional regulatory networks and their interactions, which will help us to understand the mechanisms linking metabolism with growth in Arabidopsis. This study shows that machine learning strategies coupled to Bayesian statistical methods hold significant promise for identifying functionally significant promoter sequences.

Downscaling heavy precipitation over the United Kingdom: a comparison of dynamical and statistical methods and their future scenarios

Haylock

Cawley

Harpham

et al. 2006

Intl Journal of Climatology

313

253

Six statistical and two dynamical downscaling models were compared with regard to their ability to downscale seven seasonal indices of heavy precipitation for two station networks in northwest and southeast England. The skill among the eight downscaling models was high for those indices and seasons that had greater spatial coherence. Generally, winter showed the highest downscaling skill and summer the lowest. The rainfall indices that were indicative of rainfall occurrence were better modelled than those indicative of intensity. Models based on non-linear artificial neural networks were found to be the best at modelling the inter-annual variability of the indices; however, their strong negative biases implied a tendency to underestimate extremes. A novel approach used in one of the neural network models to output the rainfall probability and the gamma distribution scale and shape parameters for each day meant that resampling methods could be used to circumvent the underestimation of extremes. Six of the models were applied to the Hadley Centre global circulation model HadAM3P forced by emissions according to two SRES scenarios. This revealed that the inter-model differences between the future changes in the downscaled precipitation indices were at least as large as the differences between the emission scenarios for a single model. This implies caution when interpreting the output from a single model or a single type of model (e.g. regional climate models) and the advantage of including as many different types of downscaling models, global models and emission scenarios as possible when developing climate-change projections at the local scale.
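The resampling idea mentioned in the abstract above can be illustrated with a minimal sketch. It assumes that a model has already produced, for each day, a rainfall probability plus gamma shape and scale parameters; the function names and the example parameter values are hypothetical and are not taken from the paper.

```python
import random


def sample_daily_rainfall(p_wet, shape, scale, rng):
    """Draw one day's rainfall from a Bernoulli-gamma mixture.

    p_wet is the (assumed) predicted probability of rain; shape and scale
    are the (assumed) predicted gamma parameters for the wet-day amount.
    """
    if rng.random() >= p_wet:
        return 0.0  # dry day
    # random.gammavariate takes shape (alpha) first, then scale (beta)
    return rng.gammavariate(shape, scale)


def simulate_series(daily_params, n_draws, seed=0):
    """Resample n_draws synthetic rainfall series from per-day parameters."""
    rng = random.Random(seed)
    return [
        [sample_daily_rainfall(p, k, theta, rng) for (p, k, theta) in daily_params]
        for _ in range(n_draws)
    ]


# Hypothetical parameters for three days: (p_wet, shape, scale)
params = [(0.2, 0.8, 5.0), (0.9, 1.5, 8.0), (0.5, 1.0, 6.0)]
series = simulate_series(params, n_draws=1000)

# An extreme index (here the wettest day) is computed per synthetic series,
# rather than from the expected rainfall, which smooths out the extremes.
wettest = [max(s) for s in series]
```

Computing an index on each resampled series keeps the spread of the predicted distribution, which is why sampling can help where a mean prediction underestimates extremes.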

Efficient leave-one-out cross-validation of kernel fisher discriminant classifiers

Cawley

Talbot

2003

Pattern Recognition

382

239

Fast exact leave-one-out cross-validation of sparse least-squares support vector machines

2004

Gene selection in cancer classification using sparse logistic regression with Bayesian regularization

2006

Design and analysis of the WCCI 2010 active learning challenge

Guyon

Cawley

Dror

et al. 2010

We organized a data mining challenge on "active learning" for IJCNN/WCCI 2010, addressing machine learning problems where labeling data is expensive, but large amounts of unlabeled data are available at low cost. Examples include handwriting and speech recognition, document classification, vision tasks, drug design using recombinant molecules and protein engineering. Such problems might be tackled from different angles: learning from unlabeled data or active learning. In the former case, the algorithms must satisfy themselves with the limited amount of labeled data and capitalize on the unlabeled data with semi-supervised learning methods. Several challenges have addressed this problem in the past. In the latter case, the algorithms may place a limited number of queries to get new sample labels. The goal in that case is to optimize the queries and the problem is referred to as active learning. While the problem of active learning is of great importance, organizing a challenge in that area is non-trivial. This is the problem we have addressed, and we describe our approach in this paper. The "active learning" challenge is part of the WCCI 2010 competition program (http://www.wcci2010.org/competition-program). The website of the challenge remains open for submission of new methods beyond the termination of the challenge as a resource for students and researchers (http://clopinet.com/al).
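As a toy illustration of the query-budget setting described in the abstract above, the sketch below learns a one-dimensional decision threshold by always querying the unlabeled point in the middle of the still-unresolved range. This is a textbook example chosen for brevity, not the protocol or any method from the challenge; the function name, the oracle, and the assumption that labels are monotone in the input are all hypothetical.

```python
def query_labels(pool, oracle, budget):
    """Pool-based active learning of a 1-D threshold under a query budget.

    Assumes (for illustration only) that the true label is 0 below some
    unknown threshold and 1 at or above it. Returns the labels bought from
    the oracle and the sorted-pool indices bracketing the boundary.
    """
    points = sorted(pool)
    lo, hi = -1, len(points)  # boundary lies between points[lo] and points[hi]
    labelled = {}
    while budget > 0 and hi - lo > 1:
        mid = (lo + hi) // 2  # most informative query: middle of the unresolved range
        y = oracle(points[mid])  # one expensive label request
        labelled[points[mid]] = y
        budget -= 1
        if y:
            hi = mid
        else:
            lo = mid
    return labelled, lo, hi


# Hypothetical pool of 100 unlabeled points and an oracle with threshold 37
labelled, lo, hi = query_labels(range(100), lambda x: int(x >= 37), budget=10)
```

In this toy case the boundary among 100 points is located with far fewer than 100 label requests, which is the kind of saving active learning aims for when labels are costly.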

Design of the 2015 ChaLearn AutoML challenge

Guyon

Bennett

Cawley

et al. 2015

ChaLearn is organizing the Automatic Machine Learning (AutoML) contest for IJCNN 2015, which challenges participants to solve classification and regression problems without any human intervention. Participants' code is automatically run on the contest servers to train and test learning machines. However, there is no obligation to submit code; half of the prizes can be won by submitting prediction results only. Datasets of progressively increasing difficulty are introduced throughout the six rounds of the challenge. (Participants can enter the competition in any round.) The rounds alternate phases in which learners are tested on datasets participants have not seen, and phases in which participants have limited time to tweak their algorithms on those datasets to improve performance. This challenge will push the state of the art in fully automatic machine learning on a wide range of real-world problems. The platform will remain available beyond the termination of the challenge.

Nested cross-validation when selecting classifiers is overzealous for most practical applications

Wainer

Cawley

2021

Expert Systems with Applications

106

Gavin C. Cawley

Establishing glucose- and ABA-regulated transcription networks in Arabidopsis by microarray analysis and promoter classification using a Relevance Vector Machine
