Over the past 3 years, a number of approaches for the normalization of RNA sequencing data have emerged in the literature, differing both in the type of bias adjustment and in the statistical strategy adopted. However, as data continue to accumulate, there is still no clear consensus on the appropriate normalization method to use or on the impact of a chosen method on the downstream analysis. In this work, we present a comprehensive comparison of seven recently proposed normalization methods for the differential analysis of RNA-seq data, with an emphasis on varied real and simulated datasets involving different species and experimental designs, chosen to represent data characteristics commonly observed in practice. Based on this comparison study, we provide practical recommendations on the appropriate normalization method to use and describe its impact on the differential analysis of RNA-seq data.
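The abstract does not name the seven methods it compares. Purely as an illustration of what a normalization step computes, the sketch below implements one widely used scheme, the median-of-ratios size factors popularized by DESeq; it is not presented as the method recommended by the study. Each sample's size factor is the exponentiated median, over genes with no zero counts, of the log ratio between the gene's count and its geometric mean across samples. The function names and the gene-by-sample list layout are hypothetical.

```python
import math
from statistics import median


def median_of_ratios_size_factors(counts):
    """Estimate one size factor per sample with the median-of-ratios approach.

    counts: list of genes, each a list of raw counts (one entry per sample).
    Genes with a zero count in any sample are skipped, because their
    geometric mean (the pseudo-reference) would be zero.
    """
    n_samples = len(counts[0])
    usable, log_ref = [], []
    for row in counts:
        if all(c > 0 for c in row):
            usable.append(row)
            # Log geometric mean of the gene across samples
            log_ref.append(sum(math.log(c) for c in row) / n_samples)
    if not usable:
        raise ValueError("no gene has non-zero counts in every sample")
    factors = []
    for j in range(n_samples):
        # Median of log(count / reference) over usable genes, then back-transform
        log_ratios = [math.log(row[j]) - ref for row, ref in zip(usable, log_ref)]
        factors.append(math.exp(median(log_ratios)))
    return factors


def normalize(counts, factors):
    """Divide each sample's counts by its size factor."""
    return [[c / s for c, s in zip(row, factors)] for row in counts]


# Sample 2 is sequenced twice as deeply as sample 1
counts = [[10, 20], [100, 200], [5, 10], [0, 3]]
factors = median_of_ratios_size_factors(counts)  # approximately [0.7071, 1.4142]
```

Working on the log scale keeps the factors multiplicative and centred so that their product is close to 1; the last gene is ignored because of its zero count.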
Highlights
- Molecular classes of PitNETs are identified by integrated pangenomic analyses
- PitNET molecular classification mainly reflects pituitary lineage, driven by PIT1
- Gonadotroph signatures are found in some corticotroph and somatotroph PitNETs
- USP8-mutated corticotroph PitNETs correspond to a group with limited aggressiveness
Background: The recent settlement of cattle in West Africa, after several waves of migration from remote centres of domestication, has imposed dramatic changes in their environmental conditions, in particular through exposure to new pathogens. West African cattle populations thus represent an appealing model to unravel the genome response to adaptation to tropical conditions. The purpose of this study was to identify footprints of adaptive selection at the whole-genome level in a newly collected data set comprising 36,320 SNPs genotyped in 9 West African cattle populations. Results: After a detailed analysis of population structure, we performed a scan for SNP differentiation using a previously proposed Bayesian procedure, including extensions to improve the detection of loci under selection. Based on these results, we identified 53 genomic regions and 42 strong candidate genes. Their physiological functions were mainly related to immune response (the MHC region, which was found to be under strong balancing selection, CD79A, CXCR4, DLK1, RFX3, SEMA4A, TICAM1 and TRIM21), the nervous system (NEUROD6, OLFM2, MAGI1, SEMA4A and HTR4) and skin and hair properties (EDNRB, TRSP1 and KRTAP8-1). Conclusion: The main underlying selective pressures may be related to climatic conditions, but also to the host response to pathogens such as Trypanosoma spp. Overall, these results may open the way towards the identification of important variants involved in adaptation to tropical conditions, and in particular in resistance to tropical infectious diseases.
Over the last fifty years, dairy cattle breeds have been subjected to intense artificial selection for improved milk production traits. In this study, we performed a whole-genome scan for differentiation using 42,486 SNPs in the three major French dairy cattle breeds (Holstein, Normande and Montbéliarde) to identify the main physiological pathways and regions affected by this selection. After analyzing the population structure, we estimated FST within and across the three breeds for each SNP under a pure drift model. We then considered two different strategies to evaluate the effect of selection at the genome level. First, smoothing FST values over each chromosome with a local variable bandwidth kernel estimator allowed us to identify 13 highly significant regions subjected to strong and/or recent positive selection. Some of them contained genes in which causal variants with a strong effect on milk production traits (GHR) or coloration (MC1R) have already been reported. To further interpret the observed signatures of selection, we subsequently focused on the annotation of differentiated genes, defined according to the FST values of SNPs located within or close to them. To that end, we performed a comprehensive network analysis, which suggested a central role of the somatotropic and gonadotropic axes in the response to selection. Altogether, these observations shed light on the antagonism, at the genome level, between milk production and reproduction traits in high-producing dairy cows.
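The two steps described above, a per-SNP FST followed by smoothing along a chromosome, can be sketched as follows. This is a simplified illustration under stated assumptions, not the study's procedure: FST is computed with a basic moment formula, Var(p) / (pbar * (1 - pbar)) where pbar is the mean allele frequency across populations, without sample-size corrections or the pure drift model, and the smoother is a Nadaraya-Watson estimator with a Gaussian kernel and a fixed bandwidth instead of a local variable bandwidth. Function names and the example data are hypothetical.

```python
import math


def fst_simple(freqs):
    """Moment-style FST for one biallelic SNP from per-population allele frequencies.

    FST = Var(p) / (pbar * (1 - pbar)); no sample-size correction is applied.
    """
    k = len(freqs)
    pbar = sum(freqs) / k
    if pbar in (0.0, 1.0):
        return 0.0  # monomorphic SNP: no differentiation
    var_p = sum((p - pbar) ** 2 for p in freqs) / k
    return var_p / (pbar * (1.0 - pbar))


def kernel_smooth(positions, values, grid, bandwidth):
    """Nadaraya-Watson smoothing with a Gaussian kernel and a FIXED bandwidth.

    Returns NaN at grid points too far from every SNP for any weight to survive.
    """
    smoothed = []
    for x in grid:
        weights = [math.exp(-0.5 * ((x - pos) / bandwidth) ** 2) for pos in positions]
        total = sum(weights)
        if total == 0.0:
            smoothed.append(float("nan"))
        else:
            smoothed.append(sum(w * v for w, v in zip(weights, values)) / total)
    return smoothed


# Hypothetical allele frequencies for three SNPs in three breeds
snp_freqs = [[0.20, 0.25, 0.22], [0.10, 0.80, 0.15], [0.50, 0.52, 0.49]]
positions = [1.0, 1.5, 2.0]  # hypothetical positions in Mb
fst = [fst_simple(f) for f in snp_freqs]
smoothed = kernel_smooth(positions, fst, grid=[1.0, 1.5, 2.0], bandwidth=0.5)
```

Smoothing borrows strength from neighbouring SNPs, so a region stands out only when several nearby markers are differentiated; a variable bandwidth, as used in the study, additionally adapts to uneven marker density.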
Several functions were used to model the fixed part of the lactation curve, and genetic parameters of milk test-day records were estimated using French Holstein data. Parametric curves (Legendre polynomials, Ali-Schaeffer curve, Wilmink curve), fixed class curves (5-d classes), and regression splines were tested. Regression splines were appealing because they fitted the data well, were relatively insensitive to outliers, were flexible, and produced smooth curves without requiring the estimation of a large number of parameters. Genetic parameters were estimated with an Average Information REML algorithm in which the average information matrix and the first derivatives of the likelihood function were pooled over 10 samples. This approach made it possible to handle larger data sets. The residual variance was modeled as a quadratic function of days in milk. Quartic Legendre polynomials were used to estimate (co)variances of random effects. The estimates were within the range of most other studies. The greatest genetic variance was in the middle of the lactation, while residual and permanent environmental variances mostly decreased during the lactation. Heritability estimates ranged from 0.15 to 0.40. The genetic correlation between the extreme parts of the lactation was 0.35, but genetic correlations exceeded 0.90 over a large part of the lactation. The pooling approach yielded smaller standard errors for the genetic parameters than those obtained with a single sample.
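To make the curve families above concrete, the sketch below computes Legendre polynomial covariates of days in milk (DIM) up to quartic order, as used in random regression models, and evaluates a Wilmink curve. Several settings are assumptions rather than the study's: DIM is standardized onto [-1, 1] over a hypothetical 5 to 305 day range, polynomials are scaled by sqrt((2n + 1) / 2) (a common normalization in animal breeding that the abstract does not state), and the Wilmink decay constant k is fixed at an illustrative 0.05. All function names are hypothetical.

```python
import math


def standardize_dim(dim, dim_min=5, dim_max=305):
    """Map days in milk onto [-1, 1], the domain of Legendre polynomials.

    dim_min and dim_max are illustrative assumptions, not the study's values.
    """
    return -1.0 + 2.0 * (dim - dim_min) / (dim_max - dim_min)


def legendre_covariates(x, order=4, normalized=True):
    """Legendre polynomials P_0..P_order evaluated at x via Bonnet's recursion.

    (n + 1) P_{n+1}(x) = (2n + 1) x P_n(x) - n P_{n-1}(x)
    If normalized, each P_n is multiplied by sqrt((2n + 1) / 2).
    """
    p = [1.0, x]
    for n in range(1, order):
        p.append(((2 * n + 1) * x * p[n] - n * p[n - 1]) / (n + 1))
    p = p[: order + 1]  # handles order 0, where only P_0 is wanted
    if normalized:
        p = [math.sqrt((2 * n + 1) / 2.0) * pn for n, pn in enumerate(p)]
    return p


def wilmink(dim, a, b, c, k=0.05):
    """Wilmink lactation curve a + b * exp(-k * dim) + c * dim (k is an assumption)."""
    return a + b * math.exp(-k * dim) + c * dim


# Quartic covariates for a test day at mid-lactation (DIM 155 maps to x = 0)
covariates = legendre_covariates(standardize_dim(155))
```

In a random regression model, each animal's genetic and permanent environmental effects are regressed on these five covariates, so the resulting (co)variance matrices describe how genetic variance and correlations change across the lactation.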
Motivation: RNA sequencing is now widely performed to study differential expression among experimental conditions. As tests are performed on a large number of genes, stringent false-discovery rate control is required at the expense of detection power. Ad hoc filtering techniques are regularly used to moderate this correction by removing genes with low signal, with little attention paid to their impact on downstream analyses. Results: We propose a data-driven method based on the Jaccard similarity index to calculate a filtering threshold for replicated RNA sequencing data. In comparisons with alternative data filters regularly used in practice, we demonstrate the effectiveness of our proposed method in correctly filtering lowly expressed genes, leading to increased detection power for moderately to highly expressed genes. Interestingly, this data-driven threshold varies among experiments, highlighting the value of the proposed method. Availability: The proposed filtering method is implemented in the package available on Bioconductor. Contact: andrea.rau@jouy.inra.fr Supplementary information: Supplementary data are available at Bioinformatics online.
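The idea of a Jaccard-based threshold can be illustrated with a simplified sketch; it is not the Bioconductor package's implementation, and several details are assumptions. For each candidate threshold s, a gene is treated as "expressed" in a sample when its normalized count exceeds s; the Jaccard index |A ∩ B| / |A ∪ B| is computed between the expressed sets of every pair of replicates within a condition; and the threshold with the highest mean index is retained. The final rule (dropping genes whose mean normalized count across samples falls below the threshold) and all function names are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index |A & B| / |A | B|; defined here as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def global_jaccard(norm_counts, conditions, s):
    """Mean Jaccard index over all pairs of replicates within the same condition.

    norm_counts: list of genes, each a list of normalized counts (one per sample).
    A gene counts as expressed in a sample when its normalized count exceeds s.
    """
    n_samples = len(conditions)
    expressed = [
        {g for g, row in enumerate(norm_counts) if row[j] > s}
        for j in range(n_samples)
    ]
    scores = []
    for cond in set(conditions):
        idx = [j for j, c in enumerate(conditions) if c == cond]
        for i, j in combinations(idx, 2):
            scores.append(jaccard(expressed[i], expressed[j]))
    if not scores:
        raise ValueError("at least one condition needs two or more replicates")
    return sum(scores) / len(scores)


def choose_threshold(norm_counts, conditions, candidates):
    """Pick the candidate threshold with the highest mean Jaccard index.

    Ties go to the first candidate in the given order. Genes whose mean
    normalized count across all samples is below the threshold are filtered out.
    """
    best = max(candidates, key=lambda s: global_jaccard(norm_counts, conditions, s))
    kept = [g for g, row in enumerate(norm_counts) if sum(row) / len(row) >= best]
    return best, kept


# Hypothetical normalized counts: 5 genes, 2 conditions with 2 replicates each
counts = [[0, 3, 1, 0], [2, 0, 0, 4], [50, 60, 55, 45], [100, 90, 120, 110], [30, 35, 5, 8]]
best, kept = choose_threshold(counts, ["A", "A", "B", "B"], [0, 5, 10])
# best == 10, kept == [2, 3, 4]
```

Low thresholds let sporadic low counts enter one replicate's set but not the other's, which lowers the index; the retained threshold is the one at which replicates agree most on which genes carry signal, so it adapts to each experiment.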
An R package, metaMA, is available on CRAN.
Background: High-throughput sequencing is now regularly used for studies of the transcriptome (RNA-seq), particularly for comparisons among experimental conditions. For the time being, a limited number of biological replicates is typically considered in such experiments, leading to low detection power for differential expression. As sequencing costs continue to decrease, it is likely that additional follow-up studies will be conducted to re-address the same biological question. Results: We demonstrate how p-value combination techniques previously used for microarray meta-analyses can be used for the differential analysis of RNA-seq data from multiple related studies. These techniques are compared to a negative binomial generalized linear model (GLM) including a fixed study effect, on simulated data and on real data from human melanoma cell lines. The GLM with a fixed study effect performed well for low inter-study variation and small numbers of studies, but was outperformed by the meta-analysis methods for moderate to large inter-study variability and larger numbers of studies. Conclusions: The p-value combination techniques illustrated here are a valuable tool for performing differential meta-analyses of RNA-seq data, as they appropriately account for biological and technical variability within studies as well as for additional study-specific effects. An R package is available on CRAN (http://cran.r-project.org/web/packages/metaRNASeq).
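The abstract does not spell out which combination rules are used; the sketch below shows two classical ones for a single gene, as an illustration only: Fisher's method, where X = -2 * sum(log p_s) follows a chi-square distribution with 2S degrees of freedom under the null hypothesis for S independent studies, and a weighted inverse normal (Stouffer) combination. The weights sqrt(n_s), with n_s the number of replicates in study s, the clipping constants, and the function names are assumptions, and the handling of conflicting effect directions across studies is deliberately left out.

```python
import math
from statistics import NormalDist

EPS = 1e-15  # assumed clipping bound so that log() and inv_cdf() stay finite


def fisher_combination(pvalues):
    """Fisher's method for S independent p-values.

    X = -2 * sum(log p) ~ chi-square with 2S degrees of freedom under H0.
    For even degrees of freedom the survival function has the closed form
    P(X > x) = exp(-x/2) * sum_{k=0}^{S-1} (x/2)^k / k!.
    """
    s = len(pvalues)
    x = -2.0 * sum(math.log(max(p, 1e-300)) for p in pvalues)
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, s):
        term *= half / k  # builds (x/2)^k / k! incrementally
        total += term
    return math.exp(-half) * total


def inverse_normal_combination(pvalues, n_replicates):
    """Weighted inverse normal combination with assumed weights sqrt(n_s)."""
    std = NormalDist()
    weights = [math.sqrt(n) for n in n_replicates]
    # Phi^{-1}(1 - p) written as -Phi^{-1}(p) to avoid cancellation near p = 0
    z = [-std.inv_cdf(min(max(p, EPS), 1 - EPS)) for p in pvalues]
    z_comb = sum(w * zi for w, zi in zip(weights, z)) / math.sqrt(sum(w * w for w in weights))
    return std.cdf(-z_comb)  # equals 1 - Phi(z_comb)


# Hypothetical p-values for one gene in three studies
p_fisher = fisher_combination([0.04, 0.10, 0.03])
p_invnorm = inverse_normal_combination([0.04, 0.10, 0.03], n_replicates=[3, 2, 4])
```

With a single study both functions return the original p-value, which is a quick sanity check; as more studies point in the same direction, the combined evidence grows even when no individual p-value is small.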