Application of high-dimensional feature selection: evaluation for genomic prediction in man

Bermingham, Mairead Lesley; Pong‐Wong, Ricardo; Spiliopoulou, Athina; Hayward, Caroline; Rudan, Igor; Campbell, Harry; Wright, Alan F.; Wilson, Jim; Agakov, Felix; Navarro, Pau; Haley, Chris

doi:10.1038/srep10312

Cited by 269 publications

(144 citation statements)

References 56 publications



“…Feature selection, including the removal of noisy features and elimination of ineffective vocabulary, makes training and applying a classifier more effective [67]. The existing approaches to finding an adequate subset of features fall into two groups: feature filters and feature wrappers [68].…”

Section: Increase Accuracy By Using Feature Selection (mentioning)

confidence: 99%

Big Data: Deep Learning for financial sentiment analysis

et al. 2018


“…Therefore, dimensionality reduction methods have been extensively studied in the literature to reduce the number of dimensions. The known benefits include (a) to simplify the outputs models for easier interpretation by users [10], (b) to save computational resources and reduce time, and (c) to reduce over-fitting [11].…”

Section: Related Work (mentioning)

confidence: 99%

Feature Construction and Calibration for Clustering Daily Load Curves from Smart-Meter Data

Alotaibi, Jin, Wilcox et al. 2016

IEEE Trans. Ind. Inf.


This paper proposes and compares feature construction and calibration methods for clustering daily electricity load curves. Such load curves describe electricity demand over a period of time. A rich body of the literature has studied clustering of load curves, usually using temporal features. This limits the potential to discover new knowledge which may not be best represented as models consisting of all time points on load curves. This paper presents three new methods to construct features: conditional filters on time-resolution based features, calibration and normalization, and using profile errors. These new features extend the potential of clustering load curves. Moreover, smart metering is now generating high-resolution time series, and so the dimensionality reduction offered by these features is welcome. The clustering results using the proposed new features are compared with clusterings obtained from temporal features as well as clusterings with Fourier features, using household electricity consumption time series as test data. The experimental results suggest that the proposed feature construction methods offer new means for gaining insight in energy consumption patterns.


“…First, as the dimension of the data increases, the number of observations needed for model training and consequently the study costs increase too. Second, if we even ignored the need for more training observations, we would also encounter other probable problems, such as the curse of dimensionality [31], or complicated models in need of longer training time [32]. Feature reduction is one solution for dealing with this problem in a way that tries to exclude redundant or uninformative features [33].…”

Section: Methods (mentioning)

confidence: 99%

Application of Taxonomic Modeling to Microbiota Data Mining for Detection of Helminth Infection in Global Populations

Torbati, Mitreva, Gopalakrishnan 2016

Preprint


Human microbiome data from genomic sequencing technologies is fast accumulating, giving us insights into bacterial taxa that contribute to health and disease. The predictive modeling of such microbiota count data for the classification of human infection from parasitic worms, such as helminths, can help in the detection and management across global populations. Real-world datasets of microbiome experiments are typically sparse, containing hundreds of measurements for bacterial species, of which only a few are detected in the bio-specimens that are analyzed. This feature of microbiome data produces the challenge of needing more observations for accurate predictive modeling and has been dealt with previously, using different methods of feature reduction. To our knowledge, integrative methods, such as transfer learning, have not yet been explored in the microbiome domain as a way to deal with data sparsity by incorporating knowledge of different but related datasets. One way of incorporating this knowledge is by using a meaningful mapping among features of these datasets. In this paper, we claim that this mapping would exist among members of each individual cluster, grouped based on phylogenetic dependency among taxa and their association to the phenotype. We validate our claim by showing that models incorporating associations in such a grouped feature space result in no performance deterioration for the given classification task. In this paper, we test our hypothesis by using classification models that detect helminth infection in microbiota of human fecal samples obtained from Indonesia and Liberia countries. In our experiments, we first learn binary classifiers for helminth infection detection by using Naive Bayes, Support Vector Machines, Multilayer Perceptrons, and Random Forest methods. In the next step, we add taxonomic modeling by using the SMART-scan module to group the data, and learn classifiers using the same four methods, to test the validity of the achieved groupings. We observed a 6% to 23% and 7% to 26% performance improvement based on the Area Under the receiver operating characteristic (ROC) Curve (AUC) and Balanced Accuracy (Bacc) measures, respectively, over 10 runs of 10-fold cross-validation. These results show that using phylogenetic dependency for grouping our microbio...

Author Contributions: M.E.T. contributed to the design and execution of the experiments in this paper, wrote the scripts, prepared the supplementary information and produced the first draft of the paper. V.G. and M.M. conceived the idea and guided the study. M.M. helped in acquisition of the microbiota data along with collaborators mentioned in the acknowledgments. V.G. contributed to computational experimental design. All authors read, revised and approved the paper for submission.

Conflicts of Interest: The authors declare no conflict of interest.

HHS Public Access author manuscript. Data (Basel). Author manuscript; available in PMC 2017 February 24.

