Alberto Ferrer (scite author profile)

Differential expression in RNA-seq: A matter of depth

Next-generation sequencing (NGS) technologies are revolutionizing genome research; in particular, their application to transcriptomics (RNA-seq) is increasingly used for gene expression profiling as a replacement for microarrays. However, the properties of RNA-seq data have not yet been fully established, and further research is needed to understand how these data respond to differential expression analysis. In this work, we set out to gain insight into the characteristics of RNA-seq data analysis by studying an important parameter of this technology: sequencing depth. We analyzed how sequencing depth affects the detection of transcripts and their identification as differentially expressed, looking at transcript biotype, length, expression level, and fold-change. We evaluated different algorithms available for the analysis of RNA-seq data and proposed NOISeq, a novel approach that differs from existing methods in being data-adaptive and nonparametric. Our results reveal that most existing methodologies suffer from a strong dependency on sequencing depth in their differential expression calls, which produces a considerable number of false positives that grows as the number of reads increases. In contrast, our proposed method models the noise distribution from the actual data, can therefore better adapt to the size of the data set, and is more effective in controlling the rate of false discoveries. This work discusses the true potential of RNA-seq for studying regulation at low expression ranges, the noise within RNA-seq data, and the issue of replication.
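
The noise-modelling idea described above can be illustrated with a simplified sketch in the spirit of NOISeq: for each feature, compute M (log2 ratio) and D (absolute difference) between the two conditions, build a noise distribution from the same statistics computed between replicates within each condition, and score each feature by the fraction of noise points that are less extreme in both |M| and D. This is not the NOISeq package code. The replicate averaging, the zero-replacement constant k = 0.5, and the function names are assumptions for illustration; the sketch also assumes counts already normalized for sequencing depth and at least two replicates per condition.

```python
import itertools

import numpy as np


def md_stats(a, b, k=0.5):
    """Per-feature M (log2 ratio) and D (absolute difference) between two count vectors.

    Zero counts are replaced by the constant k (an assumed value) so that log2 is defined.
    """
    a = np.where(a == 0, k, a).astype(float)
    b = np.where(b == 0, k, b).astype(float)
    return np.log2(a / b), np.abs(a - b)


def noise_probability(cond1, cond2, k=0.5):
    """Score each feature against a noise distribution built from within-condition replicates.

    cond1, cond2: arrays of shape (n_features, n_replicates), already normalized,
    with at least two replicates per condition.
    Returns, per feature, the fraction of noise points whose |M| and D are both smaller.
    """
    # Signal: compare the replicate means of the two conditions.
    m, d = md_stats(cond1.mean(axis=1), cond2.mean(axis=1), k)

    # Noise: compare every pair of replicates inside each condition and pool the results.
    noise_m, noise_d = [], []
    for cond in (cond1, cond2):
        for i, j in itertools.combinations(range(cond.shape[1]), 2):
            nm, nd = md_stats(cond[:, i], cond[:, j], k)
            noise_m.append(np.abs(nm))
            noise_d.append(nd)
    noise_m = np.concatenate(noise_m)
    noise_d = np.concatenate(noise_d)

    # High values mean most noise points are less extreme than the feature in both statistics.
    return np.array([np.mean((noise_m < abs(mi)) & (noise_d < di)) for mi, di in zip(m, d)])
```

For example, with `cond1` and `cond2` holding normalized counts for three replicates each, `noise_probability(cond1, cond2) > 0.8` would flag candidate features; the 0.8 cut-off here is only an illustrative choice. Because the noise comes from the data themselves, the reference distribution grows or shrinks with the counts rather than following a fixed parametric model.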

Data quality aware analysis of differential expression in RNA-seq with NOISeq R/Bioc package

Tarazona et al. (2015)

As RNA-seq has become widespread, there is growing awareness of the importance of experimental design, bias removal, accurate quantification, and control of false positives for proper data analysis. We introduce the NOISeq R package for quality control and analysis of count data. We show how its diagnostic tools can be used to monitor quality issues, make pre-processing decisions, and improve analysis. We demonstrate that the nonparametric NOISeqBIO method efficiently controls false discoveries in experiments with biological replication and outperforms state-of-the-art methods. NOISeq is a comprehensive resource that meets current needs for robust, data-aware analysis of RNA-seq differential expression.

maSigPro: a method to identify significantly differential expression profiles in time-course microarray experiments

et al. (2006)

In this work, we propose a statistical procedure to identify genes that show different expression profiles across analytical groups in time-course experiments. The method is a two-step regression approach in which the experimental groups are encoded by dummy variables. The procedure first fits a global regression model with all the defined variables to identify differentially expressed genes, and then applies a variable selection strategy to study differences between groups and find statistically significant differences in profiles. The methodology is illustrated on both a real and a simulated microarray dataset.
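
A minimal sketch of the first regression step described above, assuming two groups, a quadratic time trend, and normalized expression values: the design matrix holds an intercept, a 0/1 dummy for the second group, powers of time, and dummy-by-time interaction terms, and each gene is scored by the global F statistic of the full model against an intercept-only model. This is not the maSigPro package. The polynomial degree, the two-group layout, the F threshold, and the function names are assumptions for illustration, p-values are omitted, and the second (variable-selection) step is not shown.

```python
import numpy as np


def design_matrix(time, group, degree=2):
    """Design matrix for two groups: intercept, group dummy, time^p and dummy*time^p (p = 1..degree).

    Group 0 is the reference; group 1 is encoded by the 0/1 dummy variable.
    """
    time = np.asarray(time, dtype=float)
    dummy = (np.asarray(group) == 1).astype(float)
    cols = [np.ones_like(time), dummy]
    for p in range(1, degree + 1):
        cols.append(time ** p)          # common time trend
        cols.append(dummy * time ** p)  # group-specific deviation from that trend
    return np.column_stack(cols)


def global_f(y, X):
    """F statistic of the full regression model against an intercept-only model for one gene."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss_full = np.sum((y - X @ beta) ** 2)
    rss_null = np.sum((y - y.mean()) ** 2)
    return ((rss_null - rss_full) / (k - 1)) / (rss_full / (n - k))


def select_genes(Y, X, f_threshold):
    """Indices of genes (rows of Y) whose global F statistic exceeds a chosen threshold."""
    f = np.array([global_f(y, X) for y in Y])
    return np.flatnonzero(f > f_threshold), f
```

Genes passing this global test would then go on to the second step, where the individual dummy and interaction coefficients indicate which groups differ and how their profiles diverge over time.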

Dealing with missing data in MSPC: several methods, different interpretations, some examples

Arteaga, Ferrer (2002). Journal of Chemometrics.

This paper addresses the problem of using future multivariate observations with missing data to estimate latent variable scores from an existing principal component analysis (PCA) model. This is a critical issue in multivariate statistical process control (MSPC) schemes, where the process is continuously interrogated based on an underlying PCA model. We present several methods for estimating the scores of new individuals with missing data: a trimmed score method (TRI), a single-component projection method (SCP), a method of projection to the model plane (PMP), a method based on iterative imputation of the missing data, a method based on minimization of the squared prediction error (SPE), a conditional mean replacement method (CMR), and two least squares-based methods, one based on regression on known data (KDR) and the other based on regression on trimmed scores (TSR). The basis of each method and the expressions for the score estimators, their covariance matrices, and the estimation errors are developed. Some of the methods discussed have already been proposed in the literature (SCP, PMP, and CMR), some are original (TRI and TSR), and others are shown to be equivalent to methods developed by other authors: the iterative imputation and SPE methods are equivalent to PMP, and KDR is equivalent to CMR. All of these methods can be seen as different ways of imputing values for the missing variables. Their efficiency is studied through simulations based on an industrial data set. The KDR method is shown to be statistically superior to the other methods except TSR, which performs comparably while requiring the inversion of a much smaller matrix.
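
Three of the estimators named above can be sketched in a few lines, assuming mean-centred data and a loading matrix P with orthonormal columns obtained by SVD. As we understand them, TRI projects only the observed variables (equivalent to setting the missing centred values to zero), PMP solves a least-squares problem on the observed rows of P, and CMR replaces the missing variables by their conditional mean given the observed ones, using the training covariance, before projecting. The function names and the boolean mask `obs` are hypothetical; SCP, KDR, and TSR are not shown, and the paper's own notation and covariance expressions are not reproduced here.

```python
import numpy as np


def fit_pca(X, n_comp):
    """Mean-centre X; return the mean, loadings P (n_vars x n_comp) and the covariance matrix S."""
    mu = X.mean(axis=0)
    Xc = X - mu
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    S = Xc.T @ Xc / (X.shape[0] - 1)
    return mu, vt[:n_comp].T, S


def scores_tri(x, obs, mu, P):
    """Trimmed scores: project only the observed variables (missing centred values count as zero)."""
    return P[obs].T @ (x[obs] - mu[obs])


def scores_pmp(x, obs, mu, P):
    """Projection to the model plane: least-squares fit of the observed part onto the observed loadings."""
    Po = P[obs]
    return np.linalg.solve(Po.T @ Po, Po.T @ (x[obs] - mu[obs]))


def scores_cmr(x, obs, mu, P, S):
    """Conditional mean replacement: impute the missing variables from the covariance, then project."""
    mis = ~obs
    xc = np.zeros(x.shape, dtype=float)
    xc[obs] = x[obs] - mu[obs]
    # E[x_mis | x_obs] = S_mis,obs S_obs,obs^-1 x_obs for centred, jointly Gaussian data.
    xc[mis] = S[np.ix_(mis, obs)] @ np.linalg.solve(S[np.ix_(obs, obs)], xc[obs])
    return P.T @ xc
```

With no missing values, all three functions reduce to the ordinary projection `P.T @ (x - mu)`; they differ only in how the contribution of the missing variables is replaced, which is the sense in which the abstract describes them as different imputation schemes.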

Multivariate image analysis: A review with applications

Prats-Montalbán, Juan, Ferrer (2011). Chemometrics and Intelligent Laboratory Systems.

NOIseq: a RNA-seq differential expression method robust for sequencing depth biases

Tarazona, García, Ferrer et al. (2012). EMBnet j.

Discovering gene expression patterns in time course microarray experiments by ANOVA–SCA

Nueda, Conesa, Westerhuis et al. (2007)

PCA model building with missing data: New proposals and a comparative study

Folch-Fortuny, Arteaga, Ferrer (2015). Chemometrics and Intelligent Laboratory Systems.
