Zhanhuai Li scite author profile

Background: The classification of cancer subtypes is of great importance to cancer diagnosis and therapy. Many supervised learning approaches have been applied to cancer subtype classification in the past few years, especially deep learning based approaches. Recently, the deep forest model has been proposed as an alternative to deep neural networks; it learns hyper-representations by using cascaded ensembles of decision trees. The deep forest model has been shown to perform competitively with, or even better than, deep neural networks to some extent. However, the standard deep forest model may face overfitting and ensemble diversity challenges when dealing with small-sample-size, high-dimensional biology data.

Results: In this paper, we propose a deep learning model, called BCDForest, to address cancer subtype classification on small-scale biology datasets. It can be viewed as a modification of the standard deep forest model. BCDForest differs from the standard deep forest model in two main contributions. First, a method named multi-class-grained scanning is proposed to train multiple binary classifiers to encourage ensemble diversity; the fitting quality of each classifier is also considered in representation learning. Second, we propose a boosting strategy that emphasizes the more important features in the cascade forests, propagating the benefits of discriminative features across cascade layers to improve classification performance. Systematic comparison experiments on both microarray and RNA-Seq gene expression datasets demonstrate that our method consistently outperforms state-of-the-art methods in cancer subtype classification.

Conclusions: The multi-class-grained scanning and boosting strategy in our model provide an effective solution to ease the overfitting challenge and improve the robustness of the deep forest model on small-scale data. Our model provides a useful approach to the classification of cancer subtypes by using deep learning on high-dimensional, small-scale biology data.

Identification of cancer subtypes by integrating multiple types of transcriptomics data with deep learning in breast cancer

Guo, Shang. 2019. Neurocomputing.

Aspect-level sentiment analysis based on gradual machine learning

Wang, Chen, Shen, et al. 2021. Knowledge-Based Systems.

Constructing domain-dependent sentiment dictionary for sentiment analysis

Ahmed, Chen. 2020. Neural Comput & Applic.

Parallelizing maximal clique and k-plex enumeration over graph data

Wang, Chen, Hou, et al. 2017. Journal of Parallel and Distributed Computing.

Enabling Quality Control for Entity Resolution: A Human and Machine Cooperation Framework

Chen, Fan, et al. 2018.

Even though many machine algorithms have been proposed for entity resolution, it remains very challenging to find a solution with quality guarantees. In this paper, we propose a novel HUman and Machine cOoperation (HUMO) framework for entity resolution (ER), which divides an ER workload between the machine and the human. HUMO enables a mechanism for quality control that can flexibly enforce both precision and recall levels. We introduce the optimization problem of HUMO, minimizing human cost given a quality requirement, and then present three optimization approaches: a conservative baseline based purely on the monotonicity assumption of precision, a more aggressive one based on sampling, and a hybrid one that takes advantage of the strengths of both. Finally, we demonstrate by extensive experiments on real and synthetic datasets that HUMO can achieve high-quality results with reasonable return on investment (ROI) in terms of human cost, and that it performs considerably better than the state-of-the-art alternatives in quality control.

Study on Cloud Storage System Based on Distributed Storage Systems

Qian, Zhang. 2010.

A Human-and-Machine Cooperative Framework for Entity Resolution with Quality Guarantees

Chen, Chen, Li. 2017.

Zhanhuai Li

BCDForest: a boosting cascade deep forest model towards the classification of cancer subtypes based on gene expression data
