Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy

Peng, Hanchuan; Long, Fuhui; Ding, Chris

doi:10.1109/tpami.2005.159

Cited by 8,151 publications (1,892 citation statements; 1,883 mentioning)

References 19 publications
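The criteria named in the title can be illustrated with a minimal sketch of incremental max-relevance, min-redundancy selection in the difference form, where each step adds the candidate feature x_j maximizing I(x_j; c) − (1/|S|) Σ_{x_i ∈ S} I(x_j; x_i) over the already selected set S. The sketch assumes discrete-valued features and a plug-in (histogram) mutual-information estimate. The function names `mutual_info_discrete` and `mrmr_select` are hypothetical; this is not the authors' released implementation.

```python
import numpy as np

def mutual_info_discrete(x, y):
    """Plug-in estimate of I(X;Y) in bits for two 1-D arrays of discrete labels."""
    _, xi = np.unique(np.asarray(x), return_inverse=True)
    _, yi = np.unique(np.asarray(y), return_inverse=True)
    joint = np.zeros((xi.max() + 1, yi.max() + 1))
    np.add.at(joint, (xi, yi), 1)          # contingency table of (x, y) pairs
    joint /= joint.sum()                    # joint probabilities p(x, y)
    px = joint.sum(axis=1, keepdims=True)   # marginal p(x), shape (nx, 1)
    py = joint.sum(axis=0, keepdims=True)   # marginal p(y), shape (1, ny)
    nz = joint > 0                          # skip empty cells (0 log 0 = 0)
    return float(np.sum(joint[nz] * np.log2(joint[nz] / (px @ py)[nz])))

def mrmr_select(X, c, k):
    """Greedy mRMR (difference form): return indices of k columns of X."""
    n_features = X.shape[1]
    k = min(k, n_features)
    # Relevance I(x_j; c) of every feature to the class variable.
    relevance = np.array([mutual_info_discrete(X[:, j], c) for j in range(n_features)])
    selected = [int(np.argmax(relevance))]  # first pick: max relevance only
    redundancy = np.zeros(n_features)       # running sum of I(x_j; x_i), i in S
    while len(selected) < k:
        last = selected[-1]
        for j in range(n_features):
            if j not in selected:
                redundancy[j] += mutual_info_discrete(X[:, j], X[:, last])
        # Relevance minus mean redundancy with the current selected set.
        scores = relevance - redundancy / len(selected)
        scores[selected] = -np.inf          # never re-select a feature
        selected.append(int(np.argmax(scores)))
    return selected
```

Ties are broken toward the lowest column index by `np.argmax`. With a duplicated feature, the redundancy term is what pushes the second pick toward a complementary feature instead of the copy.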

“…It can be used for analyzing behavioral data, where many of the properties we have highlighted could be useful (e.g., CMI, interactions). It has been used for feature selection in general classification problems [Lefakis and Fleuret, 2014; Peng et al, 2005; Torkkola, 2003] and we hope GCMI would provide practical advantages in many such applications. We further suggest that the copula normalization could be used as a general preprocessing step that would convert any covariance‐based statistic or algorithm into a robust rank‐based version (e.g., common spatial patterns, canonical correlation analysis, linear/quadratic discriminant analysis).…”

Section: Discussion (mentioning, confidence: 99%)

A statistical framework for neuroimaging data analysis based on mutual information estimated via a gaussian copula

Ince, Giordano, Kayser et al. 2016

Human Brain Mapping


We begin by reviewing the statistical framework of information theory as applicable to neuroimaging data analysis. A major factor hindering wider adoption of this framework in neuroimaging is the difficulty of estimating information theoretic quantities in practice. We present a novel estimation technique that combines the statistical theory of copulas with the closed form solution for the entropy of Gaussian variables. This results in a general, computationally efficient, flexible, and robust multivariate statistical framework that provides effect sizes on a common meaningful scale, allows for unified treatment of discrete, continuous, unidimensional and multidimensional variables, and enables direct comparisons of representations from behavioral and brain responses across any recording modality. We validate the use of this estimate as a statistical test within a neuroimaging context, considering both discrete stimulus classes and continuous stimulus features. We also present examples of analyses facilitated by these developments, including application of multivariate analyses to MEG planar magnetic field gradients, and pairwise temporal interactions in evoked EEG responses. We show the benefit of considering the instantaneous temporal derivative together with the raw values of M/EEG signals as a multivariate response, how we can separately quantify modulations of amplitude and direction for vector quantities, and how we can measure the emergence of novel information over time in evoked responses. Open‐source Matlab and Python code implementing the new methods accompanies this article. Hum Brain Mapp 38:1541–1573, 2017. © 2016 Wiley Periodicals, Inc.
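The estimator described in this abstract can be sketched in a few lines, assuming the copula step is a per-variable rank transform mapped through the inverse standard normal CDF, after which mutual information follows from the closed-form entropy of Gaussian variables, I(X;Y) = ½ log₂(det Σ_X · det Σ_Y / det Σ_XY). The names `copnorm`, `gaussian_mi`, and `gcmi_cc` are illustrative; the accompanying open-source code may differ in details (for example, it may apply a bias correction, which this sketch omits).

```python
import numpy as np
from scipy.stats import norm, rankdata

def copnorm(x):
    """Gaussian-copula normalization: per-column ranks -> (0, 1) -> standard normal."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]                      # shape (n_samples, n_dims)
    n = x.shape[0]
    u = np.column_stack([rankdata(col) for col in x.T]) / (n + 1)
    return norm.ppf(u)

def gaussian_mi(x, y):
    """MI in bits between jointly Gaussian x, y from sample covariances (no bias correction)."""
    dx = x.shape[1]
    cov = np.cov(np.hstack([x, y]), rowvar=False)
    logdet_x = np.linalg.slogdet(cov[:dx, :dx])[1]
    logdet_y = np.linalg.slogdet(cov[dx:, dx:])[1]
    logdet_xy = np.linalg.slogdet(cov)[1]
    return 0.5 * (logdet_x + logdet_y - logdet_xy) / np.log(2)

def gcmi_cc(x, y):
    """Copula-based MI estimate between two continuous (possibly multivariate) variables."""
    return gaussian_mi(copnorm(x), copnorm(y))
```

Because only ranks enter `copnorm`, the estimate is unchanged by any strictly increasing transform of either variable; the same normalization could in principle precede any covariance-based method to make it rank-based.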


“…The proposed ensemble of TQ²I summary statistics, specifically, CVPAI2(R: A ⇒ B ⊆ A × B), OA(OAMTRX = FrequencyCount(A × B)) and class-conditional probabilities (OAMTRX), is an original minimally dependent and maximally informative (mDMI) set (Si Liu, Hairong Liu, Latecki, Xu, & Lu, 2011; Peng, Long, & Ding, 2005) of outcome Q²Is (O-Q²Is), to be jointly maximized according to the Pareto formal analysis of multi-objective optimization problems (Boschetti, Flasse, & Brivio, 2004); refer to the Part 1, Chapter 1.…”

Section: Methods (mentioning, confidence: 99%)

“…According to the GEO-CEOS Val guidelines (GEO-CEOS, 2010; GEO-CEOS WGCV, 2015), Val is the process of assessing, by independent means, the quality of an information processing system by means of an mDMI set (Si Liu et al, 2011; Peng et al, 2005) of community-agreed outcome and process (OP) Q²Is (OP-Q²Is), each one provided with a degree of uncertainty in measurement, ± δ, with δ ≥ 0%.…”

Section: Validation Session (mentioning, confidence: 99%)

“…An mDMI set of O-Q²Is (Si Liu et al, 2011; Peng et al, 2005), comprising OA(OAMTRX = FrequencyCount(A × B)), CVPAI2(R: A ⇒ B ⊆ A × B), class-conditional probabilities p(r | t) of reference class r = 1, …, RC = |B|, given test class t = 1, …, TC = |A|, and class-conditional probabilities p(t | r), with r = 1, …, RC = |B|, t = 1, …, TC = |A|, was estimated in the four test cases described in previous Chapter 4.3 to Chapter 4.5.…”

Section: Validation Session (mentioning, confidence: 99%)
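The class-conditional probabilities p(r | t) and p(t | r) in the passage above can be sketched as row and column normalizations of a TC × RC frequency count matrix, assuming entry [t, r] counts the pixels labeled test class t in the test map and reference class r in the reference map. The function name and the example counts are hypothetical, made up for illustration only.

```python
import numpy as np

def class_conditional_probs(counts):
    """counts[t, r] = pixels with test class t and reference class r (TC x RC)."""
    counts = np.asarray(counts, dtype=float)
    row = counts.sum(axis=1, keepdims=True)   # pixels per test class t
    col = counts.sum(axis=0, keepdims=True)   # pixels per reference class r
    # p(r | t): each row sums to 1; empty test classes are left as all zeros.
    p_r_given_t = np.divide(counts, row, out=np.zeros_like(counts), where=row > 0)
    # p(t | r): each column sums to 1; empty reference classes are left as all zeros.
    p_t_given_r = np.divide(counts, col, out=np.zeros_like(counts), where=col > 0)
    return p_r_given_t, p_t_given_r

# Hypothetical counts: TC = 2 test classes, RC = 3 reference classes.
p_rt, p_tr = class_conditional_probs([[8, 2, 0],
                                      [1, 3, 6]])
```

The matrix need not be square, which matches the case in which test and reference legends differ (TC ≠ RC).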


GEO-CEOS stage 4 validation of the Satellite Image Automatic Mapper lightweight computer program for ESA Earth observation level 2 product generation – Part 2: Validation

et al. 2018


ESA defines as Earth Observation (EO) Level 2 information product a multi-spectral (MS) image corrected for atmospheric, adjacency, and topographic effects, stacked with its data-derived scene classification map (SCM), whose legend includes quality layers cloud and cloud-shadow. No ESA EO Level 2 product has ever been systematically generated at the ground segment. To fill the information gap from EO big data to ESA EO Level 2 product in compliance with the GEO-CEOS stage 4 validation (Val) guidelines, an off-the-shelf Satellite Image Automatic Mapper (SIAM) lightweight computer program was selected to be validated by independent means on an annual 30 m resolution Web-Enabled Landsat Data (WELD) image composite time-series of the conterminous U.S. (CONUS) for the years 2006 to 2009. The SIAM core is a prior knowledge-based decision tree for MS reflectance space hyperpolyhedralization into static (non-adaptive to data) color names. For the sake of readability, this paper was split into two parts. The present Part 2—Validation—accomplishes a GEO-CEOS stage 4 Val of the test SIAM-WELD annual map time-series in comparison with a reference 30 m resolution 16-class USGS National Land Cover Data (NLCD) 2006 map. These test and reference map pairs feature the same spatial resolution and spatial extent, but their legends differ and must be harmonized, in agreement with the previous Part 1 - Theory. Conclusions are that SIAM systematically delivers an ESA EO Level 2 SCM product instantiation whose legend complies with the standard 2-level 4-class FAO Land Cover Classification System (LCCS) Dichotomous Phase (DP) taxonomy.


“…Gene selection mainly has two merits (Peng et al, 2005; Saeys et al, 2007). First, it can reduce dramatically the number of genes used in classifying the disease and overcome the problem of the "curse of dimensionality".…”

Section: Introduction (mentioning, confidence: 99%)

Applying the Fisher score to identify Alzheimer's disease-related genes

Yang, Yl, Cs et al. 2016

Genet. Mol. Res.


ABSTRACT. Biologists and scientists can use the data from Alzheimer's disease (AD) gene expression microarrays to mine AD disease-related genes. Because of disadvantages such as small sample sizes, high dimensionality, and a high level of noise, it is difficult to obtain accurate and meaningful biological information from gene expression profiles. In this paper, we present a novel approach for utilizing AD microarray data to identify the morbigenous genes. The Fisher score, a classical feature selection method, is utilized to evaluate the importance of each gene. Genes with a large between-class variance and small within-class variance are selected as candidate morbigenous genes. The results using an AD dataset show that the proposed approach is effective for gene selection. Satisfactory accuracy can be achieved by using only a small number of selected genes.
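The ranking criterion in this abstract can be sketched with one common form of the Fisher score, F_j = Σ_k n_k (μ_kj − μ_j)² / Σ_k n_k σ_kj², where n_k is the number of samples in class k and μ_kj, σ_kj² are the class-k mean and variance of gene j. This exact normalization is an assumption; the paper's formula may differ in constants. `fisher_score` and the toy expression values are hypothetical.

```python
import numpy as np

def fisher_score(X, y):
    """Fisher score per column (gene) of X (samples x genes) for class labels y."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    mu = X.mean(axis=0)                        # overall mean of each gene
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for k in np.unique(y):
        Xk = X[y == k]
        nk = Xk.shape[0]
        between += nk * (Xk.mean(axis=0) - mu) ** 2   # between-class scatter
        within += nk * Xk.var(axis=0)                 # within-class scatter
    # Genes with zero within-class variance get +inf (perfectly separated or constant).
    return np.divide(between, within, out=np.full_like(between, np.inf), where=within > 0)

# Hypothetical toy data: gene 0 separates the two classes, gene 1 does not.
X = [[1.0, 5.0], [1.2, 3.0], [3.0, 4.0], [3.2, 6.0]]
y = [0, 0, 1, 1]
scores = fisher_score(X, y)
ranking = np.argsort(-scores)                  # best-scoring genes first
```

Selecting candidate genes then amounts to keeping the top entries of `ranking`; each gene is scored independently, so redundancy between selected genes is not taken into account.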

