High-throughput sequencing technologies, such as the Illumina HiSeq, are powerful new tools for investigating a wide range of biological and medical problems. The massive and complex data sets produced by the sequencers create a need for the development of statistical and computational methods that can tackle the analysis and management of the data. Data normalization is one of the most crucial steps of data processing, and it must be carefully considered as it has a profound effect on the results of the analysis. In this work, we focus on a comprehensive comparison of five normalization methods related to sequencing depth that are widely used for transcriptome sequencing (RNA-seq) data, and on their impact on the results of gene expression analysis. Based on this study, we suggest a universal workflow that can be applied to select the optimal normalization procedure for any particular data set. The described workflow includes the calculation of bias and variance values for the control genes, the sensitivity and specificity of the methods, and classification errors, as well as the generation of diagnostic plots. Combining this information facilitates the selection of the most appropriate normalization method for the studied data sets and determines which methods can be used interchangeably.
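The bias/variance step of such a workflow can be sketched as follows. This is a minimal illustration, not the paper's actual code: the function name and the assumption that control (e.g. spike-in) genes have a known expected log-fold-change are hypothetical.

```python
import numpy as np

def control_gene_bias_variance(normalized, control_idx, expected=0.0):
    """Bias and variance of a normalization method, judged on control genes.

    normalized  -- 2-D array (genes x samples) of normalized log-expression
    control_idx -- row indices of control genes with a known true value
    expected    -- the controls' expected log-fold-change (0 = unchanged)

    Bias is the mean deviation of the controls from their expected value;
    variance is the mean per-gene sample variance of the controls. Lower
    values of both indicate a better normalization for this data set.
    """
    controls = np.asarray(normalized, dtype=float)[control_idx]
    bias = float(np.mean(controls - expected))
    variance = float(np.mean(np.var(controls, axis=1, ddof=1)))
    return bias, variance
```

Computing this pair for each candidate normalization method on the same data set gives one axis of the comparison; the sensitivity/specificity and classification-error analyses supply the others.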
Two-color DNA microarrays are commonly used for the analysis of global gene expression. They provide information on the relative abundance of thousands of mRNAs. However, the generated data need to be normalized to minimize systematic variations so that biologically significant differences can be more easily identified. A large number of normalization procedures have been proposed, and many software packages for microarray data analysis are available. Here, we have applied two normalization methods (median and loess) from two microarray data analysis software packages. They were examined using a sample data set. We found that the number of genes identified as differentially expressed varied significantly depending on the method applied. The obtained results, i.e. the lists of differentially expressed genes, were consistent only when we used the median normalization methods. Loess normalization as implemented in the two software packages provided less coherent and, for some probes, even contradictory results. In general, our results provide an additional piece of evidence that the normalization method can profoundly influence the final results of a DNA microarray-based analysis. The impact of the normalization method depends greatly on the algorithm employed. Consequently, the normalization procedure must be carefully considered and optimized for each individual data set.
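As a rough illustration of the two methods compared above (not the packages' actual implementations): median normalization centres the log-ratios M, while loess normalization subtracts an intensity-dependent fit of M on the average log-intensity A. The minimal local-linear smoother below is a hypothetical stand-in for a full loess routine and assumes distinct A values.

```python
import numpy as np

def median_normalize(M):
    """Median normalization: centre the log-ratios so their median is 0."""
    M = np.asarray(M, dtype=float)
    return M - np.median(M)

def lowess_fit(x, y, frac=0.3):
    """Minimal locally weighted linear (lowess-style) smoother.

    For each point, fits a weighted line through the nearest
    ceil(frac * n) neighbours using tricube weights, as in classic
    lowess. Assumes the x values are distinct."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n = len(x)
    r = max(3, int(np.ceil(frac * n)))
    fitted = np.empty(n)
    for i in range(n):
        d = np.abs(x - x[i])
        idx = np.argsort(d)[:r]                        # r nearest neighbours
        w = (1.0 - (d[idx] / d[idx].max()) ** 3) ** 3  # tricube weights
        sw = np.sqrt(w)                                # WLS via row scaling
        X = np.column_stack([np.ones(r), x[idx]])
        beta, *_ = np.linalg.lstsq(sw[:, None] * X, sw * y[idx], rcond=None)
        fitted[i] = beta[0] + beta[1] * x[i]
    return fitted

def loess_normalize(M, A, frac=0.3):
    """Intensity-dependent normalization on the MA plot: subtract the
    smoothed trend of M versus A to remove dye bias."""
    return np.asarray(M, dtype=float) - lowess_fit(A, M, frac=frac)
```

On an exactly linear dye bias the loess step removes the trend completely; on real data the residual scatter remains, and different smoother settings (the source of the between-package discrepancies discussed above) yield different gene lists.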
Herein we present the applicability of single-molecule (PacBio RS) and second-generation (Illumina) sequencing technologies to the characterization of large genomic deletions. By testing samples previously characterized using a Sanger approach, we determined that both next-generation sequencing platforms were able to identify the positions of deletion breakpoints. Our results highlight various advantages of next-generation sequencing platforms when characterizing genomic deletions; however, special attention must be paid to identical sequences flanking the breakpoints, such as poly(N) motifs.
Abstract. DNA microarrays, which are among the most popular genomic tools, are widely applied in biology and medicine. Boutique arrays, which are small, spotted, dedicated microarrays, constitute an inexpensive alternative to whole-genome screening methods. The data extracted from each microarray-based experiment must be transformed and processed prior to further analysis to eliminate any technical bias. Normalization of the data is the most crucial step of microarray data pre-processing, and this process must be carefully considered as it has a profound effect on the results of the analysis. Several normalization algorithms have been developed and implemented in data analysis software packages. However, most of these methods were designed for whole-genome analysis. In this study, we tested 13 normalization strategies (ten for double-channel data and three for single-channel data) available in R/Bioconductor and compared their effectiveness in the normalization of four boutique array datasets. The results revealed that boutique arrays can be successfully normalized using standard methods, but not every method is suitable for each dataset. We also suggest a universal seven-step workflow that can be applied for the selection of the optimal normalization procedure for any boutique array dataset. The described workflow enables the evaluation of the investigated normalization methods based on the bias and variance values for the control probes, a differential expression analysis and a receiver operating characteristic curve analysis. The analysis of each component results in a separate ranking of the normalization methods. A combination of the ranks obtained from all the normalization procedures facilitates the selection of the most appropriate normalization method for the studied dataset and determines which methods can be used interchangeably.
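The final rank-combination step of such a workflow can be sketched as a rank-sum aggregation. This is an illustrative sketch, not the paper's implementation; the criterion names are hypothetical, and lower scores are assumed to be better for every criterion (e.g. bias, variance, 1 - AUC).

```python
def combine_ranks(scores):
    """Rank-sum aggregation of normalization methods across criteria.

    scores maps criterion name -> {method: score}, with lower scores
    better for every criterion. Each criterion ranks the methods 1..k;
    the method with the smallest rank sum is the overall winner.
    Ties are broken alphabetically for reproducibility.
    """
    methods = sorted(next(iter(scores.values())))
    totals = {m: 0 for m in methods}
    for vals in scores.values():
        for rank, m in enumerate(sorted(methods, key=lambda m: vals[m]), 1):
            totals[m] += rank
    return sorted(methods, key=lambda m: (totals[m], m))
```

Methods with nearly equal rank sums are the ones that, in the terms used above, can be used interchangeably for the dataset at hand.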
Introduction
Despite the dynamic development of deep sequencing technologies, microarrays are still commonly used in genomic research (1-5). Currently, DNA microarrays are mainly used for genotyping (6-9), gene expression profiling (10-12) and microRNA screening (13-15). In medicine, microarrays are used to determine the complexity and heterogeneity of diseases, to facilitate disease classification and to predict therapeutic outcomes (8,16-22). Microarrays provide a large amount of useful information, but are accompanied by inherent noise and systematic errors (23-26). No microarray experiment is free from variation introduced during sample preparation, hybridization, washing and scanning (24,27,28). Spotted arrays are burdened with technical defects that occur during their printing; these defects manifest as differences in spot size and shape and/or shifts of spots, rows or whole print-tips (24,28). In two-color assays, additional bias is introduced by uneven dye incorporation and by differences in the signal dynamic range and the sensitivity of the dyes to photobleaching (23,24,29). Therefore, the major challenge in microarray analysis is data pre-processing.
SUMMARY
Cassava brown streak viruses (CBSVs) are responsible for significant cassava yield losses in eastern sub-Saharan Africa. In the present work, we inoculated CBSV-susceptible and -resistant cassava varieties with a mixed infection of CBSVs using top-cleft grafting. Virus titres in grafted scions were monitored in a time-course experiment in both varieties. We performed RNA-seq of the two cassava varieties at the earliest time point of full infection in the susceptible scions. Genes encoding proteins in the RNA silencing and salicylic acid pathways were regulated in the susceptible cassava variety, but transcriptional changes were limited in the resistant variety. After infection, genes related to callose deposition at plasmodesmata were regulated, and callose deposition was significantly reduced in the susceptible cassava variety. We also show that β-1,3-glucanase enzymatic activity is differentially regulated in the susceptible and resistant varieties. The differences in transcriptional responses to CBSV infection indicate that resistance involves callose deposition at plasmodesmata but does not trigger typical anti-viral defence responses. A meta-analysis of the current RNA-seq dataset and selected, previously reported, host-potyvirus and virus-cassava RNA-seq datasets revealed comparable host responses across pathosystems only at similar time points after infection or infection of a common host.
HIGHLIGHT
Our results suggest that resistance to CBSV in cassava involves callose deposition at the plasmodesmata, and our meta-analysis of multiple virus-crop RNA-seq studies suggests that conserved responses across different host-virus systems are limited and depend greatly on the time after infection.
The informational content of RNA sequencing is currently far from being fully explored. Most analyses focus on processing tables of counts or on isoform deconvolution via exon junctions. This article presents a comparison of several techniques that can be used to estimate differential expression of exons or small genomic regions of expression based on the shapes of their coverage functions. The problem is defined as finding the differentially expressed exons between two samples using local expression profile normalization and statistical measures to spot the differences between two profile shapes. Initial experiments were performed using synthetic data and real data modified with synthetically created differential patterns. Then, 160 pipelines (5 types of generator × 4 normalizations × 8 difference measures) were compared. As a result, the best analysis pipelines were selected based on the linearity of the differential expression estimation and the area under the ROC curve. These platform-independent techniques have been implemented in the Bioconductor package rnaSeqMap. They point out exons with differential expression or internal splicing even when read counts alone do not reveal them. The areas of application include searches for significant differences, splicing identification algorithms and the selection of suitable regions for qPCR primers.
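The core idea of comparing coverage-function shapes rather than counts can be sketched as follows. This is one plausible normalization/difference-measure pair in the spirit of those compared above, not necessarily one of the eight measures from the paper: each profile is scaled to unit area (so total read depth cancels), then compared with a Kolmogorov-Smirnov-style statistic on the cumulative profiles.

```python
import numpy as np

def profile_difference(cov_a, cov_b):
    """Shape difference between two per-base exon coverage profiles.

    Each profile is normalized to unit area (a form of local expression
    profile normalization), so a uniform depth change gives a difference
    of 0. The statistic is the maximum absolute difference between the
    cumulative normalized coverages (KS-style), ranging from 0 to 1.
    Assumes both profiles have the same length and non-zero total.
    """
    a = np.asarray(cov_a, dtype=float)
    b = np.asarray(cov_b, dtype=float)
    a = a / a.sum()
    b = b / b.sum()
    return float(np.max(np.abs(np.cumsum(a) - np.cumsum(b))))
```

A count-based test sees no difference between two exons with equal total reads, whereas this measure flags a shifted coverage shape, which is exactly the internal-splicing signal described above.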