David J. Dittman scite author profile

David J. Dittman


41 Publications

293 Citation Statements Received

372 Citation Statements Given


Affiliations

Northwestern University, Florida Atlantic University

Publications


A review of the stability of feature selection techniques for bioinformatics data

et al. 2012

Using Random Undersampling to Alleviate Class Imbalance on Tweet Sentiment Data

et al. 2015

The Effect of Data Sampling When Using Random Forest on Imbalanced Bioinformatics Data

2015

An extensive comparison of feature ranking aggregation techniques in bioinformatics

et al. 2012

Univariate feature rankers are frequently used to order genes (features) by their importance to a given bioinformatics challenge. Unfortunately, the resulting feature subsets tend to differ when the rankers are applied to related (but distinct) datasets, or to datasets which have been varied or corrupted in some fashion. As a result, recent research has focused on methods to measure or improve the stability of these feature subsets. One such method is rank aggregation: the process of combining the information from several ranked lists (in this case, ordered gene lists) into a single, more stable list. While there has been work on creating these methods, very little work has gone into comparing the lists they generate. Such a comparison allows the techniques to be grouped into families, both for understanding how the families affect rank aggregation and for substituting less computationally expensive members of a given family. This paper is an extensive study of nine rank aggregation techniques across twenty-six bioinformatics datasets. Our results show that certain aggregation techniques are very similar to each other, while others are dissimilar to all of the other techniques. Additionally, we found that as the size of the feature subset increases, the similarity between the techniques increases. To our knowledge, this is the first study to examine this many rank aggregation techniques within the domain of bioinformatics.
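As a rough illustration of the idea in this abstract, the sketch below combines several ordered gene lists by averaging each gene's position, then compares two lists by the overlap of their top-k entries. The abstract does not name the nine techniques or the similarity measure used, so mean-rank aggregation, the top-k overlap measure, the function names, and the toy gene IDs are all assumptions made for illustration only.

```python
from collections import defaultdict


def mean_rank_aggregate(ranked_lists):
    """Combine ranked gene lists into one list ordered by mean position.

    Each input list runs from most to least important. A gene missing from
    a list is given that list's worst rank plus one (an assumption).
    """
    genes = set().union(*ranked_lists)
    totals = defaultdict(float)
    for ranking in ranked_lists:
        position = {gene: i + 1 for i, gene in enumerate(ranking)}
        worst = len(ranking) + 1
        for gene in genes:
            totals[gene] += position.get(gene, worst)
    n = len(ranked_lists)
    # Sort by mean rank; ties are broken by gene name for determinism.
    return sorted(genes, key=lambda g: (totals[g] / n, g))


def top_k_overlap(list_a, list_b, k):
    """Fraction of genes shared by the top-k entries of two ranked lists."""
    return len(set(list_a[:k]) & set(list_b[:k])) / k


# Hypothetical rankings of four genes produced by three different rankers.
rankings = [
    ["g1", "g2", "g3", "g4"],
    ["g2", "g1", "g4", "g3"],
    ["g1", "g3", "g2", "g4"],
]
aggregated = mean_rank_aggregate(rankings)
print(aggregated)
print(top_k_overlap(rankings[0], rankings[2], 2))
```

In this toy data the overlap between the first and third lists is higher at k = 4 than at k = 2, simply because larger subsets of the same small gene pool must share more members; it is not a reproduction of the paper's experiments.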

Random forest: A reliable tool for patient response prediction

et al. 2011

First Order Statistics Based Feature Selection: A Diverse and Powerful Family of Feature Selection Techniques

et al. 2012

Dimensionality reduction techniques have become a required step when working with bioinformatics datasets. Techniques such as feature selection are known not only to improve computation time, but also to improve the results of experiments by removing redundant and irrelevant features (genes) from consideration in subsequent analysis. Univariate feature selection techniques in particular are well suited to the high dimensionality inherent in bioinformatics datasets (for example, DNA microarray datasets) because of their intuitive output (a ranked list of features or genes) and their relatively small computational time compared to other techniques. This paper presents seven univariate feature selection techniques and collects them into a single family entitled First Order Statistics (FOS) based feature selection. All seven use first order statistical measures such as mean and standard deviation, yet this is the first work to relate them to one another and compare their performance. To examine the properties of these seven techniques, we performed a series of similarity and classification experiments on eleven DNA microarray datasets. Our results show that, in general, each feature selection technique creates feature subsets that are diverse when compared to those of the other members of the family. However, when we look at classification, we find that, with one exception, the techniques produce good classification results and perform similarly to each other. Our recommendation is to use the rankers Signal-to-Noise and SAM for the best classification results and to avoid Fold Change Ratio, as it is consistently the worst performer of the seven rankers.
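To make the notion of a first order statistics ranker concrete, here is a minimal sketch of two of the rankers named in the abstract. The abstract does not give their formulas, so the definitions below are assumptions: Signal-to-Noise is taken as the commonly used (mean difference) / (sum of standard deviations), and Fold Change Ratio is scored by the log2 of the ratio of class means. The function names and the toy expression values are hypothetical.

```python
from math import log2
from statistics import mean, stdev


def signal_to_noise(pos, neg):
    """Assumed S2N form: (mean_pos - mean_neg) / (sd_pos + sd_neg)."""
    return (mean(pos) - mean(neg)) / (stdev(pos) + stdev(neg))


def fold_change_log2(pos, neg):
    """Assumed fold change score: log2 of the ratio of class means."""
    return log2(mean(pos) / mean(neg))


def rank_features(data, labels, score):
    """Order genes by the absolute value of a two-class score.

    data maps gene name -> expression values; labels holds 0/1 per sample.
    """
    scores = {}
    for gene, values in data.items():
        pos = [v for v, y in zip(values, labels) if y == 1]
        neg = [v for v, y in zip(values, labels) if y == 0]
        scores[gene] = abs(score(pos, neg))
    # Highest score first; ties are broken by gene name for determinism.
    return sorted(scores, key=lambda g: (-scores[g], g))


# Hypothetical expression values: three samples per class.
labels = [1, 1, 1, 0, 0, 0]
data = {
    "geneA": [5.0, 6.0, 7.0, 1.0, 2.0, 3.0],
    "geneB": [10.0, 20.0, 30.0, 2.0, 4.0, 6.0],
    "geneC": [4.0, 5.0, 6.0, 4.0, 5.0, 6.0],
}
s2n_order = rank_features(data, labels, signal_to_noise)
fc_order = rank_features(data, labels, fold_change_log2)
print(s2n_order)
print(fc_order)
```

On this toy data the two rankers disagree about the top gene, because the fold change score ignores within-class spread while Signal-to-Noise penalizes it. This only illustrates how members of the family can yield different orderings; it says nothing about their classification performance.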

Mean Aggregation versus Robust Rank Aggregation for Ensemble Gene Selection

2012

Stability Analysis of Feature Ranking Techniques on Biological Datasets

et al. 2011
