Driver Missense Mutation Identification Using Feature Selection and Model Fusion

Soliman, Ahmed; Meng, Tao; Chen, Shu‐Ching; Iyengar, S. S.; Iyengar, Puneeth; Yordy, John S.; Shyu, Mei Ling

doi:10.1089/cmb.2015.0110

Cited by 4 publications

(10 citation statements)

References 21 publications


“…Medical big and high-dimensional data may cause inefficiency and low accuracy. To overcome this issue, many researchers utilize feature extraction algorithms in healthcare informatics [Soliman et al 2015].…”

Section: Analyzing (mentioning)

confidence: 99%

Computational Health Informatics in the Big Data Age

et al. 2016

Self Cite


The explosive growth and widespread accessibility of digital health data have led to a surge of research activity in the healthcare and data sciences fields. The conventional approaches for health data management have achieved limited success as they are incapable of handling the huge amount of complex data with high volume, high velocity, and high variety. This article presents a comprehensive overview of the existing challenges, techniques, and future directions for computational health informatics in the big data age, with a structured analysis of the historical and state-of-the-art methods. We have summarized the challenges into four Vs (i.e., volume, velocity, variety, and veracity) and proposed a systematic data-processing pipeline for generic big data in health informatics, covering data capturing, storing, sharing, analyzing, searching, and decision support. Specifically, numerous techniques and algorithms in machine learning are categorized and compared. On the basis of this material, we identify and discuss the essential prospects lying ahead for computational health informatics in this big data age.


“…Regression-based methods appeared in 11 selected papers, most of which adopted logistic regression [26,29,31,32,37,56,57]. We also found papers using regularized regressions, including Ridge [49] and Lasso regression [23].…”

Section: Methods Based On Supervised Learning (mentioning)

confidence: 95%

“…The first proposals by Carter et al [19] and Capriotti et al [20] were based on these algorithms. Among the SVM-based approaches, whereas most papers adopted the traditional SVM algorithm [20,22,24,27,31,32,39,55,56,57,58], we observed three papers using OneClass SVM [45,49,59] and one paper using Sequential Minimal Optimization (SMO) [28]. SVM is a popular and consolidated technique in the field, as it continues to be largely applied throughout the years since 2011.…”

Section: Methods Based On Supervised Learning (mentioning)

confidence: 99%

“…Among these, six papers [23,29,36,53,54,57] aimed to distinguish oncogene and tumor suppressor gene (TSG) (i.e., the two subclasses of CDGs), whereas the others focused on classifying a given gene as CDG or not. Seven papers targeted predictions on mutation level [24,27,31,37,42,43,58], most of which restricted the analysis for missense mutations. We also found one paper aiming at identifying cancer modules to discover cancer driver genes [25] and other focusing on the prediction of false positive CDGs [55].…”

Section: Overview Of Selected Papers (mentioning)

confidence: 99%

“…Finally, amino acids substitution scores were employed by several studies [19,22,24,27,28], most of which integrated distinct substitution scoring matrices. Tan et al [22], for instance, defined 51 features by integrating dozens of substitution scoring matrices from the AAIndex database, which was explored in other studies [28,31]. The evolution-based subcategory was employed in 14 studies, most of which computed evolutionary conservation scores using distinct strategies or tools [19,20,24,27,28,34,37,42,43,54,57].…”

Section: Functional Impact (mentioning)

confidence: 99%


Machine learning methods for prediction of cancer driver genes: a survey paper

Andrades, Recamonde-Mendoza 2021

Preprint


Identifying the genes and mutations that drive the emergence of tumors is a major step to improve understanding of cancer and identify new directions for disease diagnosis and treatment. Despite the large volume of genomics data, the precise detection of driver mutations and their carrying genes, known as cancer driver genes, from the millions of possible somatic mutations remains a challenge. Computational methods play an increasingly important role in identifying genomic patterns associated with cancer drivers and developing models to predict driver events. Machine learning (ML) has been the engine behind many of these efforts and provides excellent opportunities for tackling remaining gaps in the field. Thus, this survey aims to perform a comprehensive analysis of ML-based computational approaches to identify cancer driver mutations and genes, providing an integrated, panoramic view of the broad data and algorithmic landscape within this scientific problem. We discuss how the interactions among data types and ML algorithms have been explored in previous solutions and outline current analytical limitations that deserve further attention from the scientific community. We hope that by helping readers become more familiar with significant developments in the field brought by ML, we may inspire new researchers to address open problems and advance our knowledge towards cancer driver discovery.


Machine learning methods for prediction of cancer driver genes: a survey paper

Andrades, Recamonde‐Mendoza 2022

Briefings in Bioinformatics


Identifying the genes and mutations that drive the emergence of tumors is a critical step to improving our understanding of cancer and identifying new directions for disease diagnosis and treatment. Despite the large volume of genomics data, the precise detection of driver mutations and their carrying genes, known as cancer driver genes, from the millions of possible somatic mutations remains a challenge. Computational methods play an increasingly important role in discovering genomic patterns associated with cancer drivers and developing predictive models to identify these elements. Machine learning (ML), including deep learning, has been the engine behind many of these efforts and provides excellent opportunities for tackling remaining gaps in the field. Thus, this survey aims to perform a comprehensive analysis of ML-based computational approaches to identify cancer driver mutations and genes, providing an integrated, panoramic view of the broad data and algorithmic landscape within this scientific problem. We discuss how the interactions among data types and ML algorithms have been explored in previous solutions and outline current analytical limitations that deserve further attention from the scientific community. We hope that by helping readers become more familiar with significant developments in the field brought by ML, we may inspire new researchers to address open problems and advance our knowledge towards cancer driver discovery.
