Battiti's mutual information feature selector (MIFS) and its variant algorithms are used for many classification applications. Because they ignore feature synergy, MIFS and its variants may introduce a large bias when features cooperate in combination. Moreover, MIFS and its variants estimate feature redundancy without regard to the corresponding classification task. In this paper, we propose an automated greedy feature selection algorithm called conditional mutual information-based feature selection (CMIFS). Based on the link between interaction information and conditional mutual information, CMIFS accounts for both redundancy and synergy interactions among features and identifies discriminative features. In addition, CMIFS couples the evaluation of feature redundancy with the classification task. This decreases the probability of mistaking important features for redundant ones during the search process. The experimental results show that CMIFS can achieve higher best classification accuracy than MIFS and its variants, with the same or a smaller (by nearly 50%) number of features.

Keywords: Classification, feature selection, conditional mutual information, redundancy, interaction.

Manuscript received Apr. 20, 2010; revised June 13, 2010; accepted June 28, 2010.
I. Introduction

Feature selection plays an important role in improving the accuracy, efficiency, and scalability of the classification process. Since the relevant features are often unknown a priori in the real world, irrelevant and redundant features are introduced to represent the domain. However, more features significantly slow down the learning process and can lead to over-fitting of the classifier. With a limited amount of sample data, irrelevant features may obscure the distributions of the small set of truly relevant features and confuse the learning algorithms. It has been shown, both theoretically and empirically, that reducing the number of irrelevant or redundant features drastically increases the learning efficiency of algorithms and yields more general concepts, providing better insight into the classification task.

In supervised classification learning, one is given a training set of labeled instances. An instance is typically described as an assignment of attribute values to a set of features F, and each instance is associated with one of l possible classes in C = {c_1, …, c_l}. Feature selection can be formalized as selecting a minimum subset S from the original feature set F such that P(C|S) is as close as possible to P(C|F), where P(C|S) and P(C|F) are the conditional probability distributions approximated from the training set [1]. The minimum subset S is called an optimal subset. To find the best subset, the order of the search space is O(2^n), where n is the original number of features [2]. In practice, it is hard to search the feature subspace exhaustively because this number is huge even for medium-sized n. Many problems related to feature selection have been shown to be NP-hard [3]. Alternatively, many sequential-search-based approximation scheme...
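To make the greedy, sequential-search idea concrete, the following is a minimal sketch of a forward selection loop driven by a mutual-information criterion of the MIFS type, which at each step picks the feature f maximizing I(f; C) - beta * sum over selected s of I(f; s). The function and variable names and the value of beta are illustrative assumptions, features and the class label are assumed to be discrete, and this is not the CMIFS procedure proposed later in this paper.

import numpy as np

def mutual_information(x, y):
    """Estimate I(X; Y) in bits for two discrete 1-D arrays of equal length."""
    x, y = np.asarray(x), np.asarray(y)
    mi = 0.0
    for xv in np.unique(x):
        for yv in np.unique(y):
            p_xy = np.mean((x == xv) & (y == yv))   # joint probability estimate
            p_x = np.mean(x == xv)                  # marginal of X
            p_y = np.mean(y == yv)                  # marginal of Y
            if p_xy > 0:
                mi += p_xy * np.log2(p_xy / (p_x * p_y))
    return mi

def greedy_mi_selection(X, c, k, beta=0.5):
    """Select k column indices of X (n_samples x n_features) for class labels c."""
    n_features = X.shape[1]
    selected, remaining = [], set(range(n_features))
    # Relevance I(f; C) of each candidate feature, computed once.
    relevance = [mutual_information(X[:, f], c) for f in range(n_features)]
    while len(selected) < k and remaining:
        def score(f):
            # Redundancy is measured against already selected features only.
            redundancy = sum(mutual_information(X[:, f], X[:, s]) for s in selected)
            return relevance[f] - beta * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

With a score of this form, a feature that is individually weak but synergistic with already selected features is only ever penalized for redundancy and never rewarded for its interaction with them, which is precisely the limitation, noted in the abstract, that motivates conditioning on the class as in CMIFS.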