Univariate feature rankers are frequently used to order genes (features) by their importance to a given bioinformatics problem. Unfortunately, the resulting feature subsets tend to differ when the rankers are applied to related (but distinct) datasets, or to datasets which have been perturbed or corrupted in some fashion. As a result, recent research has focused on methods to measure or improve the stability of these feature subsets. One such method is rank aggregation: the process of combining the information from several ranked lists (in this case, ordered gene lists) into a single, more stable list. While there has been work on creating these methods, very little work has compared the lists they generate. Such a comparison allows the techniques to be grouped into families, both for understanding how the families affect rank aggregation and for substituting less computationally expensive members of a given family. This paper presents an extensive study of nine rank aggregation techniques across twenty-six bioinformatics datasets. Our results show that certain aggregation techniques are very similar to each other, while others are unique in that they resemble none of the other techniques. Additionally, we found that as the size of the feature subset increases, the similarity between the techniques increases. To our knowledge, this is the first study to examine this many rank aggregation techniques within the domain of bioinformatics.
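As a concrete illustration of rank aggregation, the sketch below shows one simple aggregator, mean-rank aggregation, which orders genes by their average rank across the input lists. The gene names and ranked lists are invented for the example, and the nine techniques studied in the paper are not necessarily of this form.

```python
from statistics import mean

def mean_rank_aggregate(ranked_lists):
    """Combine several ranked gene lists into one list ordered by mean rank.

    Each input list orders the same set of genes from most to least
    important; position 0 corresponds to rank 1 (best).
    """
    genes = ranked_lists[0]
    mean_ranks = {
        gene: mean(lst.index(gene) + 1 for lst in ranked_lists)
        for gene in genes
    }
    # Sort genes by their average rank across all lists (lower is better).
    return sorted(genes, key=lambda g: mean_ranks[g])

# Invented example: three rankers disagree slightly on four genes.
lists = [
    ["g1", "g2", "g3", "g4"],
    ["g2", "g1", "g4", "g3"],
    ["g1", "g3", "g2", "g4"],
]
print(mean_rank_aggregate(lists))  # → ['g1', 'g2', 'g3', 'g4']
```

The aggregate list is typically more stable than any single input list, because disagreements between individual rankers average out.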
Dimensionality-reduction techniques such as gene selection have become commonplace for taming the high dimensionality found in bioinformatics datasets such as DNA microarray data. Dimensionality is reduced by identifying and removing redundant and irrelevant features (genes), leaving only an optimal subset of features for subsequent analysis. However, a number of feature selection techniques show poor stability (resistance to change in the underlying data). One approach for increasing the stability of feature subsets is ensemble feature selection, performed by first generating multiple ranked gene lists and then combining the results with an aggregation function. While research has been performed on ensemble feature selection and its effect on gene list stability, there has been little research on an important choice made in the process: the number of iterations (repetitions) of feature selection. The computation time of ensemble feature selection is driven largely by the number of ranked lists generated: the more iterations, the more computation time is required. To study this, we evaluate the similarity among feature subsets generated by two different approaches to ensemble feature selection (data diversity and a hybrid approach). We calculate the similarity between the final ranked lists generated using 10, 20, and 50 iterations, using the mean aggregation function. Our results show that the similarity between 20 and 50 iterations is high enough for us to recommend using 20 iterations instead of 50, saving the large amount of computation time required for 50 iterations.
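The data-diversity approach and the iteration-count comparison can be sketched as follows, under several stated assumptions: the per-iteration ranker is a toy scorer (mean feature value on a bootstrap sample), the data are simulated, and Jaccard similarity of the top-k genes stands in for whatever subset-similarity measure the study actually uses.

```python
import random
from statistics import mean

def ensemble_rank(data_rows, n_features, n_iterations, seed=0):
    """Data-diversity ensemble feature selection: rank features on bootstrap
    resamples of the data, then aggregate the ranked lists by mean rank.
    The per-sample ranker here is a toy that scores each feature by its
    mean value in the bootstrap sample."""
    rng = random.Random(seed)
    rank_sums = [0.0] * n_features
    for _ in range(n_iterations):
        sample = [rng.choice(data_rows) for _ in data_rows]  # bootstrap
        scores = [mean(row[j] for row in sample) for j in range(n_features)]
        order = sorted(range(n_features), key=lambda j: -scores[j])
        for rank, j in enumerate(order):
            rank_sums[j] += rank
    # Final list: features sorted by accumulated (i.e., mean) rank.
    return sorted(range(n_features), key=lambda j: rank_sums[j])

def jaccard_top_k(list_a, list_b, k):
    """Similarity of the top-k feature subsets from two ranked lists."""
    a, b = set(list_a[:k]), set(list_b[:k])
    return len(a & b) / len(a | b)

# Simulated dataset: 30 instances, 50 features with slightly shifted means.
rng = random.Random(42)
data = [[rng.gauss(j * 0.1, 1.0) for j in range(50)] for _ in range(30)]

r20 = ensemble_rank(data, 50, 20)
r50 = ensemble_rank(data, 50, 50)
print(jaccard_top_k(r20, r50, 10))
```

If the similarity between the 20-iteration and 50-iteration lists is close to 1, the extra 30 iterations buy little additional stability, which is the trade-off the paper quantifies on real datasets.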