A Review of Matched-pairs Feature Selection Methods for Gene Expression Data Analysis

Liang, Shuhua; Ma, Anjun; Yang, Sen; Wang, Yan; Ma, Qin

doi:10.1016/j.csbj.2018.02.005

Cited by 56 publications

(35 citation statements)

References 88 publications


“…Data normalization is a method to standardize the range of features without reducing the dimension of the data [19][20][21][22][41]. Data normalization process is important since it is important to select the best features without eliminating useful information from the preprocessed data [19][20][21][22]. Conventional single stage feature selection having the drawback of possibly selecting data after eliminating useful data during feature extraction stage.…”

Section: Stage 1: Data Normalization Methods and Data Dimension Reduction (mentioning)

confidence: 99%

“…Thus, for this work, raw data samples are normalized using ten different data normalization methods. Based on the comprehensive review done on the previous researches, five data normalization methods are chosen from the commonly used methods, namely, Decimal Scaling (DS), Z-score (ZS), Linear Scaling (LS), Min-Max (MM) and Mean & Standard Deviation (MSD) methods [19][20][21][22]. The other five data normalization methods are newly introduced in early breast cancer detection application, namely, Relative Logarithmic Sum Squared Voltage (RLSSV), Relative Logarithmic Voltage (RLV), Relative Voltage (RV), Fractional Voltage Change (FVC) and Relative Sum Squared Voltage (RSSV) [8][9].…”

Section: Stage 1: Data Normalization Methods and Data Dimension Reduction (mentioning)

confidence: 99%
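As a rough illustration of three of the conventional methods named in the statement above (Min-Max, Z-score and Decimal Scaling), the following Python sketch uses their textbook definitions. The citing paper's own formulas are not reproduced on this page, so the function names, the use of the population standard deviation, and the handling of constant inputs are all assumptions.

```python
import statistics


def min_max(values):
    """Rescale values linearly to the range [0, 1] (textbook Min-Max)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant feature maps to all zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values):
    """Centre on the mean and divide by the population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # assumption: population, not sample, std
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


def decimal_scaling(values):
    """Divide by the smallest power of ten that brings every |value| below 1."""
    peak = max(abs(v) for v in values)
    j = 0
    while peak / 10 ** j >= 1:
        j += 1
    return [v / 10 ** j for v in values]


if __name__ == "__main__":
    sample = [120, -45, 7]  # made-up numbers, not data from the cited study
    print(min_max(sample))
    print(z_score(sample))
    print(decimal_scaling(sample))
```

None of the three changes the number of features, which matches the statement's point that normalization standardizes ranges without reducing dimension.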

“…Once the data is normalized, the normalized data is dimensionally reduced to remove redundant and statistically insignificant data [21]. The dimension of data is reduced as follows:…”

Section: Stage 1: Data Normalization Methods and Data Dimension Reduction (mentioning)

confidence: 99%

“…Each fold will take turn to be the testing fold, until the training-testing process completed. Confusion matrices are generated for each iteration, and the accuracy, sensitivity and specificity are calculated for each iteration using equations (21) to (23). The average classification accuracy, sensitivity and specificity of all folds are considered as the performance of the classifier [8].…”

Section: Classification Of Breast Cancer Size (mentioning)

confidence: 99%
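The evaluation loop described in the statement above can be sketched in Python as follows. Equations (21) to (23) of the citing paper are not shown on this page, so the standard binary definitions of accuracy, sensitivity and specificity are assumed here; the helper names and the toy threshold classifier are hypothetical and stand in for the classifiers used in that work.

```python
def k_fold_indices(n_samples, k):
    """Split indices 0..n_samples-1 into k contiguous folds of near-equal size."""
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity and specificity from a 2x2 confusion matrix."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return accuracy, sensitivity, specificity


def cross_validate(xs, ys, k, fit, predict):
    """Let each fold be the test fold once and average the three metrics."""
    per_fold = []
    for test_idx in k_fold_indices(len(xs), k):
        held_out = set(test_idx)
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        model = fit(train_x, train_y)
        preds = [predict(model, xs[i]) for i in test_idx]
        per_fold.append(binary_metrics([ys[i] for i in test_idx], preds))
    return tuple(sum(m[j] for m in per_fold) / k for j in range(3))


# Hypothetical toy classifier: threshold halfway between the two class means.
def fit_threshold(train_x, train_y):
    zeros = [x for x, y in zip(train_x, train_y) if y == 0]
    ones = [x for x, y in zip(train_x, train_y) if y == 1]
    return (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2


def predict_threshold(threshold, x):
    return 1 if x > threshold else 0


if __name__ == "__main__":
    xs = [1, 11, 2, 12, 3, 13, 4, 14]  # made-up, interleaved by class
    ys = [0, 1, 0, 1, 0, 1, 0, 1]
    print(cross_validate(xs, ys, 4, fit_threshold, predict_threshold))
```

The folds here are contiguous and unshuffled for simplicity; with real data a shuffled or stratified split would usually be preferable so that every training set contains all classes.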


Multi-Stage Feature Selection (MSFS) Algorithm for UWB-Based Early Breast Cancer Size Prediction

Vijayasarveswari, Andrew, Jusoh, et al. 2020. Preprint.


Breast cancer is the most common cancer among women and one of the main causes of death for women worldwide. Early detection is crucial for optimal treatment. This paper proposes a multi-stage feature selection method that extracts statistically significant features for breast cancer size detection using the proposed data normalization techniques. Ultra-wideband (UWB) signals, controlled using a microcontroller, are transmitted via an antenna from one end of the breast phantom and received at the other end. These analogue signals are represented in both the time and frequency domains. The preprocessed digital data is passed to the proposed multi-stage feature selection algorithm, which has four stages: data normalization, feature extraction, data dimension reduction and feature fusion. The output data is fused to form the proposed datasets, namely the 8-HybridFeature, 9-HybridFeature and 10-HybridFeature datasets. The classification performance of these datasets is tested using Support Vector Machine, Probabilistic Neural Network and Naïve Bayes classifiers for breast cancer size classification. The findings indicate that the 8-HybridFeature dataset performs better than the other two datasets. For the 8-HybridFeature dataset, the Naïve Bayes classifier (91.98%) outperformed the Support Vector Machine (90.44%) and Probabilistic Neural Network (80.05%) classifiers in terms of classification accuracy. The finalized method is tested and visualized in a MATLAB-based 2D and 3D environment.


“…There are currently four methods available in literature to identify 4mC sites, including iDNA4mC (Chen et al, 2017), 4mCPred (Su et al, 2018), 4mcPred-SVM (Wei et al, 2018a), and 4mcPred-IFL (Wei et al, 2019a). iDNA4mC, as the first machine learning predictor, encodes sequences by nucleotide chemical properties and nucleotide frequency to features and trains support vector machine (SVM) models for prediction (Liang et al, 2018). Although this method has the ability to distinguish between 4mC and non-4mC sites, the prediction accuracy is relatively low overall.…”

Section: Introduction (mentioning)

confidence: 99%
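To make the feature encoding mentioned in the statement above more concrete, here is a minimal Python sketch of one widely described scheme that combines nucleotide chemical properties with a running nucleotide frequency. The statement does not give the exact encoding used by iDNA4mC, so the property table (ring structure, functional group, hydrogen bonding), the density definition and the function name are assumptions, and the SVM training step is omitted.

```python
# Assumed chemical-property codes: (ring structure, functional group, H-bonding)
# purine = 1, amino = 1, weak hydrogen bonding = 1.
NCP = {
    "A": (1, 1, 1),
    "C": (0, 1, 0),
    "G": (1, 0, 0),
    "T": (0, 0, 1),
}


def encode_ncp_nd(sequence):
    """Return a flat feature vector with four values per nucleotide.

    For position i (1-based) the values are the three property bits of the
    nucleotide followed by its density: the number of times that nucleotide
    has occurred in the first i positions, divided by i.
    """
    counts = {}
    features = []
    for position, base in enumerate(sequence.upper(), start=1):
        if base not in NCP:
            raise ValueError(f"unsupported nucleotide: {base!r}")
        counts[base] = counts.get(base, 0) + 1
        features.extend(NCP[base])
        features.append(counts[base] / position)
    return features


if __name__ == "__main__":
    print(encode_ncp_nd("AAC"))  # made-up sequence for illustration
```

A fixed-length sequence window therefore yields a fixed-length numeric vector, which is the form a classifier such as an SVM expects as input.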

Developing a Multi-Layer Deep Learning Based Predictive Model to Identify DNA N4-Methylcytosine Modifications

Zeng, Liao. 2020. Front. Bioeng. Biotechnol.


DNA N4-methylcytosine modification (4mC) plays an essential role in a variety of biological processes. Accurate identification of the 4mC distribution at genome scale is therefore important for systematically understanding its biological functions. In this study, we present Deep4mcPred, a multi-layer deep learning based predictive model to identify DNA N4-methylcytosine modifications. In this predictor, we integrate, for the first time, a residual network and a recurrent neural network to build a multi-layer deep learning predictive system. Compared with existing predictors that use traditional machine learning, our proposed method has two advantages. First, our deep learning framework does not need the features to be specified when training the predictive model. It can automatically learn high-level features and capture the characteristic specificity of 4mC sites, which helps distinguish true 4mC sites from non-4mC sites. Second, our deep learning method outperforms the traditional machine learning predictors in benchmarking comparisons, demonstrating that the proposed Deep4mcPred is more effective for DNA 4mC site prediction. Moreover, via experimental comparison, we found that the attention mechanism introduced into the deep learning framework is useful for capturing the critical features. Additionally, we developed a webserver implementing the proposed method for academic use by the research community, which is now available at http://server.malab.cn/Deep4mcPred.


Building and Interpreting Artificial Neural Network Models for Biological Systems

Nair. 2020. Methods in Molecular Biology.
