2018
DOI: 10.1111/1755-0998.12779

Batch effects in a multiyear sequencing study: False biological trends due to changes in read lengths

Abstract: High-throughput sequencing is a powerful tool, but suffers biases and errors that must be accounted for to prevent false biological conclusions. Such errors include batch effects; technical errors only present in subsets of data due to procedural changes within a study. If overlooked and multiple batches of data are combined, spurious biological signals can arise, particularly if batches of data are correlated with biological variables. Batch effects can be minimized through randomization of sample groups acro…

Cited by 41 publications (48 citation statements)
References 62 publications
“…Differences in sequencing strategies (e.g. read length) and bioinformatic processing have been shown to generate batch effects and dramatically affect downstream analyses (Leek et al., 2010; Leigh et al., 2018; Shafer et al., 2016; Mafessoni et al., 2018). Another well known bias in population genetics is ascertainment bias which arises when the studied variants were ascertained in selected populations only, and can substantially impact measurements of heterozygosity and related methods (Albrechtsen et al., 2010).…”
Section: Discussion (mentioning)
confidence: 99%
“…In particular, low variation in chromosome size, small sample size, low heritability, a skewed effect size distribution and clustering of loci can lead to both loss of power and bias and consequently altered inference from chromosome‐partitioning analyses. Another important cautionary tale is presented by Leigh, Lischer, Grossen, and Keller (2018), concerning a false signal of selection and environmental association which arose as a consequence of batch effects in the sequencing of individuals sampled from a number of populations of Alpine ibex (Capra ibex). They show that the incremental addition of populations to the study meant that many populations were sequenced at different times and using different sequencing technologies, with corresponding changes to read length.…”
Section: Summary Of Special Issue “Wild GWAS” (mentioning)
confidence: 99%
“…Leigh et al. (2018), however, outline a number of bioinformatic approaches to test and account for batch effects, particularly when randomization of sample groups across batches may not be possible. Given the incremental addition of samples for long‐term wild studies in particular, careful consideration of the potential for batch effects is important in validating signals of association in natural populations.…”
Section: Summary Of Special Issue “Wild GWAS” (mentioning)
confidence: 99%
“…We compared the amount of parsimony informative sites and missing data in samples with their age. Because we sequenced samples over multiple years, batch effects, or biases attributable to differences among sequencing run could also bias our results (Leigh et al 2018). To provide a qualitative assessment of batch effects, we provide plots of trees where tips have been colored according to one of three plates that they were sequenced on.…”
Section: Summarizing Phylogenetic Signal In Modern and Historical Sam… (mentioning)
confidence: 99%