Alexander Statnikov scite author profile

Background: Cancer diagnosis and clinical outcome prediction are among the most important emerging applications of gene expression microarray technology, with several molecular signatures on their way toward clinical deployment. Using the most accurate classification algorithms available for microarray gene expression data is critical to developing the best possible molecular signatures for patient care. As suggested by a large body of literature to date, support vector machines can be considered "best of class" algorithms for classification of such data. Recent work, however, suggests that random forest classifiers may outperform support vector machines in this domain. Results: In the present paper we identify methodological biases of prior work comparing random forests and support vector machines and conduct a new rigorous evaluation of the two algorithms that corrects these limitations. Our experiments use 22 diagnostic and prognostic datasets and show that support vector machines outperform random forests, often by a large margin. Our data also underline the importance of sound research design in benchmarking and comparison of bioinformatics algorithms. Conclusion: We found that, both on average and in the majority of microarray datasets, random forests are outperformed by support vector machines, both when no gene selection is performed and when several popular gene selection methods are used.


The Notch/Hes1 Pathway Sustains NF-κB Activation through CYLD Repression in T Cell Leukemia

Espinosa

et al. 2010


SUMMARY It was previously shown that the NF-κB pathway is downstream of oncogenic Notch1 in T cell acute lymphoblastic leukemia (T-ALL). Here we visualize Notch-induced NF-κB activation using both human T-ALL cell lines and animal models. We demonstrate that Hes1, a canonical Notch target and transcriptional repressor, is responsible for sustaining IKK activation in T-ALL. Hes1 exerts its effects by repressing the deubiquitinase CYLD, a negative IKK complex regulator. CYLD expression was found to be significantly suppressed in primary T-ALL. Finally, we demonstrate that IKK inhibition is a promising option for the targeted therapy of T-ALL as specific suppression of IKK expression and function affected both the survival of human T-ALL cells and the maintenance of the disease in vivo.


Time and sample efficient discovery of Markov blankets and direct causal relations

2003


Data Mining with Bayesian Network learning has two important characteristics: first, under broad conditions learned edges between variables correspond to causal influences, and second, for every variable T in the network a special subset (Markov Blanket) identifiable by the network is the minimal variable set required to predict T. However, all known algorithms learning a complete BN do not scale up beyond a few hundred variables. On the other hand, all known sound algorithms learning a local region of the network require a number of training instances exponential in the size of the learned region. The contribution of this paper is two-fold. We introduce a novel local algorithm that returns all variables with direct edges to and from a target variable T, as well as a local algorithm that returns the Markov Blanket of T. Both algorithms (i) are sound, (ii) can be run efficiently in datasets with thousands of variables, and (iii) significantly outperform previous state-of-the-art algorithms in approximating the true neighborhood, using only a fraction of the training size required by the existing methods. A fundamental difference between our approach and existing ones is that the required sample depends on the generating graph connectivity and not the size of the local region; this yields up to exponential savings in sample relative to previously known algorithms. The results presented here are promising not only for discovery of local causal structure and variable selection for classification, but also for the induction of complete BNs.
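To make the two target sets in this abstract concrete, here is a minimal sketch that reads them off a graph that is already known. It is not the paper's algorithms, which learn these sets from data; it only assumes the standard result that, in a faithful Bayesian network, the Markov Blanket of T consists of T's parents, its children, and the other parents of its children. The edge list and the function names are hypothetical.

```python
from collections import defaultdict


def _index(edges):
    """Build parent and child lookups from (parent, child) pairs."""
    parents = defaultdict(set)
    children = defaultdict(set)
    for parent, child in edges:
        parents[child].add(parent)
        children[parent].add(child)
    return parents, children


def direct_neighbors(edges, target):
    """Variables with a direct edge to or from `target`."""
    parents, children = _index(edges)
    return set(parents[target]) | set(children[target])


def markov_blanket(edges, target):
    """Parents, children, and co-parents (spouses) of `target`."""
    parents, children = _index(edges)
    blanket = set(parents[target]) | set(children[target])
    for child in children[target]:
        blanket |= parents[child]  # other parents of a shared child
    blanket.discard(target)  # target is a parent of its own children
    return blanket


# Hypothetical toy DAG: F -> A -> T <- B, T -> C <- D, C -> E
EXAMPLE_EDGES = [
    ("F", "A"), ("A", "T"), ("B", "T"),
    ("T", "C"), ("D", "C"), ("C", "E"),
]

if __name__ == "__main__":
    print(sorted(direct_neighbors(EXAMPLE_EDGES, "T")))
    print(sorted(markov_blanket(EXAMPLE_EDGES, "T")))
```

In the toy graph, D enters the Markov Blanket of T only because it shares the child C with T, while F and E lie outside the blanket even though they are connected to T through A and C.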


A comprehensive evaluation of multicategory classification methods for microbiomic data

et al. 2013


Background: Recent advances in next-generation DNA sequencing enable rapid high-throughput quantitation of microbial community composition in human samples, opening up a new field of microbiomics. One of the promises of this field is linking abundances of microbial taxa to phenotypic and physiological states, which can inform development of new diagnostic, personalized medicine, and forensic modalities. Prior research has demonstrated the feasibility of applying machine learning methods to perform body site and subject classification with microbiomic data. However, it is currently unknown which classifiers perform best among the many available alternatives for classification with microbiomic data. Results: In this work, we performed a systematic comparison of 18 major classification methods, 5 feature selection methods, and 2 accuracy metrics using 8 datasets spanning 1,802 human samples and various classification tasks: body site and subject classification and diagnosis. Conclusions: We found that random forests, support vector machines, kernel ridge regression, and Bayesian logistic regression with Laplace priors are the most effective machine learning techniques for performing accurate classification from these microbiomic data.


Quantitative forecasting of PTSD from early trauma responses: A Machine Learning application

Galatzer‐Levy

Karstoft

Statnikov

et al. 2014

Journal of Psychiatric Research

173

128


There is broad interest in predicting the clinical course of mental disorders from early, multimodal clinical and biological information. Current computational models, however, constitute a significant barrier to realizing this goal. The early identification of trauma survivors at risk of post-traumatic stress disorder (PTSD) is plausible given the disorder’s salient onset and the abundance of putative biological and clinical risk indicators. This work evaluates the ability of Machine Learning (ML) forecasting approaches to identify and integrate a panel of unique predictive characteristics and determine their accuracy in forecasting non-remitting PTSD from information collected within 10 days of a traumatic event. Data on event characteristics, emergency department observations, and early symptoms were collected in 957 trauma survivors, followed for fifteen months. An ML feature selection algorithm identified a set of predictors that rendered all others redundant. Support Vector Machines (SVMs) as well as other ML classification algorithms were used to evaluate the forecasting accuracy of i) ML selected features, ii) all available features without selection, and iii) Acute Stress Disorder (ASD) symptoms alone. SVM also compared the prediction of a) PTSD diagnostic status at 15 months to b) posterior probability of membership in an empirically derived non-remitting PTSD symptom trajectory. Results are expressed as mean Area Under Receiver Operating Characteristics Curve (AUC). The feature selection algorithm identified 16 predictors, present in ≥95% cross-validation trials. The accuracy of predicting non-remitting PTSD from that set (AUC=.77) did not differ from predicting from all available information (AUC=.78). Predicting from ASD symptoms was not better than chance (AUC=.60). The prediction of PTSD status was less accurate than that of membership in a non-remitting trajectory (AUC=.71). ML methods may fill a critical gap in forecasting PTSD. The ability to identify and integrate unique risk indicators makes this a promising approach for developing algorithms that infer probabilistic risk of chronic posttraumatic stress psychopathology based on complex sources of biological, psychological, and social information.
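Since the results in this abstract are reported as AUC, a short sketch of how that metric can be computed may help when reading the numbers. It uses the pairwise (Mann-Whitney) formulation: AUC is the fraction of positive/negative pairs in which the positive case receives the higher score, with ties counted as half. The function name and the toy scores are illustrative and are not taken from the study.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: predicted risk scores, higher means more likely positive.
    labels: 1 for positive cases, 0 for negative cases.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0  # positive ranked above negative
            elif p == n:
                wins += 0.5  # tie counts as half
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Hypothetical toy data: two positive and two negative cases.
    print(auc([0.9, 0.4, 0.6, 0.2], [1, 1, 0, 0]))
```

Under this definition a value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the scale on which the reported values of .60, .71, .77, and .78 should be read.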


Text Categorization Models for High-Quality Article Retrieval in Internal Medicine

Aphinyanaphongs

Tsamardinos

Statnikov

et al. 2004

Journal of the American Medical Informatics Association

133

105


OBJECTIVE Finding the best scientific evidence that applies to a patient problem is becoming exceedingly difficult due to the exponential growth of medical publications. The objective of this study was to apply machine learning techniques to automatically identify high-quality, content-specific articles for one time period in internal medicine and compare their performance with previous Boolean-based PubMed clinical query filters of Haynes et al. DESIGN The selection criteria of the ACP Journal Club for articles in internal medicine were the basis for identifying high-quality articles in the areas of etiology, prognosis, diagnosis, and treatment. Naive Bayes, a specialized AdaBoost algorithm, and linear and polynomial support vector machines were applied to identify these articles. MEASUREMENTS The machine learning models were compared in each category with each other and with the clinical query filters using area under the receiver operating characteristic curve, 11-point average recall precision, and a sensitivity/specificity match method. RESULTS In most categories, the data-induced models have sensitivity, specificity, and precision better than or comparable to those of the clinical query filters. The polynomial support vector machine models perform the best among all learning methods in ranking the articles as evaluated by area under the receiver operating characteristic curve and 11-point average recall precision. CONCLUSION This research shows that, using machine learning methods, it is possible to automatically build models for retrieving high-quality, content-specific articles using inclusion or citation by the ACP Journal Club as a gold standard in a given time period in internal medicine that perform better than the 1994 PubMed clinical query filters.
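The ranking metric named in this abstract can be illustrated with a small sketch. It assumes that "11-point average recall precision" refers to the standard 11-point interpolated average precision from information retrieval: precision is interpolated at recall levels 0.0, 0.1, ..., 1.0 as the highest precision observed at any recall at or above that level, and the eleven values are averaged. The abstract does not spell out the computation, so the function name and the input convention (a ranked list of 0/1 relevance flags that contains every relevant article) are assumptions.

```python
def eleven_point_average_precision(ranked_relevance):
    """11-point interpolated average precision for one ranked list.

    ranked_relevance: 0/1 flags in ranked order (best-scored article first);
    assumed to contain every relevant article, so recall reaches 1.0.
    """
    total_relevant = sum(ranked_relevance)
    if total_relevant == 0:
        raise ValueError("ranking contains no relevant articles")

    # (recall, precision) measured at each rank where a relevant article appears.
    points = []
    hits = 0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            points.append((hits / total_relevant, hits / rank))

    total = 0.0
    for i in range(11):
        level = i / 10
        # Interpolated precision: best precision at any recall >= level.
        candidates = [p for r, p in points if r >= level - 1e-12]
        total += max(candidates) if candidates else 0.0
    return total / 11


if __name__ == "__main__":
    # Hypothetical ranking: relevant, irrelevant, relevant.
    print(round(eleven_point_average_precision([1, 0, 1]), 4))
```

A ranking that places all relevant articles first scores 1.0, and each relevant article pushed down the list lowers the interpolated precision at the higher recall levels.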


Regression of Atherosclerosis Is Characterized by Broad Changes in the Plaque Macrophage Transcriptome

et al. 2012


We have developed a mouse model of atherosclerotic plaque regression in which an atherosclerotic aortic arch from a hyperlipidemic donor is transplanted into a normolipidemic recipient, resulting in rapid elimination of cholesterol and monocyte-derived macrophage cells (CD68+) from transplanted vessel walls. To gain a comprehensive view of the differences in gene expression patterns in macrophages associated with regressing compared with progressing atherosclerotic plaque, we compared mRNA expression patterns in CD68+ macrophages extracted from plaque in aortic arches transplanted into normolipidemic or into hyperlipidemic recipients. In CD68+ cells from regressing plaque we observed that genes associated with the contractile apparatus responsible for cellular movement (e.g. actin and myosin) were up-regulated whereas genes related to cell adhesion (e.g. cadherins, vinculin) were down-regulated. In addition, CD68+ cells from regressing plaque were characterized by enhanced expression of genes associated with an anti-inflammatory M2 macrophage phenotype, including arginase I, CD163 and the C-lectin receptor. Our analysis suggests that in regressing plaque CD68+ cells preferentially express genes that reduce cellular adhesion, enhance cellular motility, and overall act to suppress inflammation.


