BackgroundRecently, rapid improvements in technology and decrease in sequencing costs have made RNA-Seq a widely used technique to quantify gene expression levels. Various normalization approaches have been proposed, owing to the importance of normalization in the analysis of RNA-Seq data. A comparison of recently proposed normalization methods is required to generate suitable guidelines for the selection of the most appropriate approach for future experiments.ResultsIn this paper, we compared eight non-abundance (RC, UQ, Med, TMM, DESeq, Q, RPKM, and ERPKM) and two abundance estimation normalization methods (RSEM and Sailfish). The experiments were based on real Illumina high-throughput RNA-Seq of 35- and 76-nucleotide sequences produced in the MAQC project and simulation reads. Reads were mapped with human genome obtained from UCSC Genome Browser Database. For precise evaluation, we investigated Spearman correlation between the normalization results from RNA-Seq and MAQC qRT-PCR values for 996 genes. Based on this work, we showed that out of the eight non-abundance estimation normalization methods, RC, UQ, Med, TMM, DESeq, and Q gave similar normalization results for all data sets. For RNA-Seq of a 35-nucleotide sequence, RPKM showed the highest correlation results, but for RNA-Seq of a 76-nucleotide sequence, least correlation was observed than the other methods. ERPKM did not improve results than RPKM. Between two abundance estimation normalization methods, for RNA-Seq of a 35-nucleotide sequence, higher correlation was obtained with Sailfish than that with RSEM, which was better than without using abundance estimation methods. However, for RNA-Seq of a 76-nucleotide sequence, the results achieved by RSEM were similar to without applying abundance estimation methods, and were much better than with Sailfish. Furthermore, we found that adding a poly-A tail increased alignment numbers, but did not improve normalization results.ConclusionSpearman correlation analysis revealed that RC, UQ, Med, TMM, DESeq, and Q did not noticeably improve gene expression normalization, regardless of read length. Other normalization methods were more efficient when alignment accuracy was low; Sailfish with RPKM gave the best normalization results. When alignment accuracy was high, RC was sufficient for gene expression calculation. And we suggest ignoring poly-A tail during differential gene expression analysis.Electronic supplementary materialThe online version of this article (doi:10.1186/s12859-015-0778-7) contains supplementary material, which is available to authorized users.
Background Little is known about epigenetic silencing of genes by promoter hypermethylation in renal cell carcinoma (RCC). The aim of this study was to identify prognostic methylation markers in surgically treated clear cell RCC (ccRCC). Methods Methylation patterns were assayed using the Infinium HumanMethylation450 BeadChip array on pairs of ccRCC and normal tissue from 12 patients. Using quantitative PSQ analysis, tumor-specific hypermethylated genes were validated in 25 independent cohorts and their clinical relevance was also verified in 152 independent cohorts. Results Using genome-wide methylation array, Zinc finger protein 278 ( ZNF278 ), Family with sequence similarity 155 member A ( FAM155A ) and Dipeptidyl peptidase 6 ( DPP6 ) were selected for tumor-specific hypermethylated genes in primary ccRCC. The promoter methylation of these genes occurred more frequently in ccRCC than normal kidney in independent validation cohort. The hypermethylation of three genes were associated with advanced tumor stage and high grade tumor in ccRCC. During median follow-up of 39.2 (interquartile range, 15.4–79.1) months, 22 (14.5%) patients experienced distant metastasis. Multivariate analysis identified the methylation status of these three genes, either alone, or in a combined risk score as an independent predictor of distant metastasis. Conclusion The promoter methylation of ZNF278 , FAM155A and DPP6 genes are associated with aggressive tumor phenotype and early development of distant metastasis in patients with surgically treated ccRCC. These potential methylation markers, either alone, or in combination, could provide novel targets for development of individualized therapeutic and prevention regimens.
The incidence of stomach cancer has been found to be gradually decreasing; however, it remains one of the most frequently occurring malignant cancers in Korea. According to statistics of 2017, stomach cancer is the top cancer in men and the fourth most important cancer in women, necessitating methods for its early detection and treatment. Considerable research in the field of bioinformatics has been conducted in cancer studies, and bioinformatics approaches might help develop methods and models for its early prediction. We aimed to develop a classification method based on deep learning and demonstrate its application to gene expression data obtained from patients with stomach cancer. Data of 60,483 genes from 334 patients with stomach cancer in The Cancer Genome Atlas were evaluated by principal component analysis, heatmaps, and the convolutional neural network (CNN) algorithm. We combined the RNA-seq gene expression data with clinical data, searched candidate genes, and analyzed them using the CNN deep learning algorithm. We performed learning using the sample type and vital status of patients with stomach cancer and verified the results. We obtained an accuracy of 95.96% for sample type and 50.51% for vital status. Despite overfitting owing to the limited number of patients, relatively accurate results for sample type were obtained. This approach can be used to predict the prognosis of stomach cancer, which has many types and underlying causes.
Ensemble data mining methods, also known as classifier combination, are often used to improve the performance of classification. Various classifier combination methods such as bagging, boosting, and random forest have been devised and have received considerable attention in the past. However, data dimensionality increases rapidly day by day. Such a trend poses various challenges as these methods are not suitable to directly apply to high-dimensional datasets. In this paper, we propose an ensemble method for classification of high-dimensional data, with each classifier constructed from a different set of features determined by partitioning of redundant features. In our method, the redundancy of features is considered to divide the original feature space. Then, each generated feature subset is trained by a support vector machine, and the results of each classifier are combined by majority voting. The efficiency and effectiveness of our method are demonstrated through comparisons with other ensemble techniques, and the results show that our method outperforms other methods.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.