2021
DOI: 10.1101/2021.01.15.426838
Preprint

Investigating the impact of reference assembly choice on genomic analyses in a cattle breed

Abstract: Background: Reference-guided read alignment and variant genotyping are prone to reference allele bias, particularly for samples that are greatly divergent from the reference genome. A Hereford-based assembly is the widely accepted bovine reference genome. Haplotype-resolved genomes that exceed the current bovine reference genome in quality and continuity have been assembled for different breeds of cattle. Using whole genome sequencing data of 161 Brown Swiss cattle, we compared the accuracy of read mapping and s…

Cited by 4 publications (7 citation statements)
References: 69 publications

“…More non-reference sequence was identified in the lower-quality Angus-backed pangenome (+13%), while the more complete Simmental-backed pangenomes had less (−6%). Reference-bias propagates through minigraph’s pangenomes, such as the missing sequence in Angus chromosome 28 (Lloret-Villas et al, 2021) resulting in 25% fewer bubbles compared to using ARS-UCD1.2 (Figure 4d).…”
Section: Results (citation type: mentioning)
Confidence: 99%

“…Multi-sample variant calling was performed with the GATK HaplotypeCaller, GenomicsDBImport and GenotypeGVCFs modules according to the best practice guidelines (39,40). We applied the VariantFiltration module for site-level filtration with thresholds indicated in (30) to retain high-quality SNP and INDELs.…”
Section: Comparison of Variant Callers (citation type: mentioning)
Confidence: 99%

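The joint-genotyping workflow named in this statement (per-sample HaplotypeCaller, GenomicsDBImport, GenotypeGVCFs, then VariantFiltration) can be sketched as a small Python helper that only assembles GATK4 command lines and does not run them. This is an illustration under stated assumptions, not the cited authors' pipeline: the file names, the single interval "25", the workspace directory and the filter expression are hypothetical, and the actual filtering thresholds (given in reference 30 of the citing work) are not reproduced here.

```python
import shlex
from pathlib import PurePosixPath


def joint_genotyping_commands(reference, bams, interval, workdir="joint_calls"):
    """Build GATK4 argument lists: per-sample GVCF calling, then joint genotyping.

    Hypothetical sketch: file names, the single interval and the workspace
    layout are illustrative assumptions, not the cited study's settings.
    """
    work = PurePosixPath(workdir)
    commands, gvcfs = [], []
    # Step 1: call each sample separately in GVCF mode (-ERC GVCF).
    for bam in bams:
        gvcf = str(work / f"{PurePosixPath(bam).stem}.g.vcf.gz")
        gvcfs.append(gvcf)
        commands.append(["gatk", "HaplotypeCaller", "-R", reference, "-I", bam,
                         "-O", gvcf, "-ERC", "GVCF", "-L", interval])
    # Step 2: consolidate all per-sample GVCFs for the interval into one GenomicsDB workspace.
    workspace = str(work / "genomicsdb")
    import_cmd = ["gatk", "GenomicsDBImport",
                  "--genomicsdb-workspace-path", workspace, "-L", interval]
    for gvcf in gvcfs:
        import_cmd += ["-V", gvcf]
    commands.append(import_cmd)
    # Step 3: genotype all samples jointly, reading straight from the workspace.
    commands.append(["gatk", "GenotypeGVCFs", "-R", reference,
                     "-V", f"gendb://{workspace}", "-O", str(work / "cohort.raw.vcf.gz")])
    return commands


def hard_filter_command(reference, vcf_in, vcf_out, filters):
    """Build a VariantFiltration call; `filters` maps filter names to JEXL expressions.

    Sites matching an expression are labelled in the FILTER column, not removed.
    """
    cmd = ["gatk", "VariantFiltration", "-R", reference, "-V", vcf_in, "-O", vcf_out]
    for name, expression in filters.items():
        cmd += ["--filter-name", name, "--filter-expression", expression]
    return cmd


if __name__ == "__main__":
    for command in joint_genotyping_commands("ARS-UCD1.2.fa", ["BSW_001.bam", "BSW_002.bam"], "25"):
        print(shlex.join(command))
    # "QD < 2.0" is a placeholder, not a threshold taken from the cited work.
    print(shlex.join(hard_filter_command("ARS-UCD1.2.fa", "joint_calls/cohort.raw.vcf.gz",
                                         "joint_calls/cohort.filtered.vcf.gz",
                                         {"placeholder_QD": "QD < 2.0"})))
```

Working per interval reflects the fact that GenomicsDBImport requires at least one interval (-L); for a whole genome, the same three steps would typically be repeated per chromosome and the outputs merged afterwards.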
“…Reads were split per read groups with gdc-fastq-splitter (27) (version 1.0.) and subsequently aligned with bwa-mem2 (28) using the -M and -R flags to a manually curated version of the current bovine Hereford-based reference genome (ARS-UCD1.2) (29) that included a Y chromosome as described in (30). Samblaster (31) (version 0.1.26), Sambamba (32), samtools (33,34) (version 1.12), and Picard tools (35) (version 2.25.7) were used to deduplicate and sort the BAM files.…”
Section: Alignment, Mapping Quality and Depth of Coverage (citation type: mentioning)
Confidence: 99%

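The per-read-group alignment quoted above (bwa-mem2 with the -M and -R flags) can be illustrated with a minimal Python sketch that assembles the commands as strings without executing them. The read-group fields (ID, SM, LB, PL), the output file names and the "align, then samblaster, then samtools sort" ordering are assumptions for illustration only: the statement lists Samblaster, Sambamba, samtools and Picard for deduplication and sorting but not their exact order or options, and Sambamba and Picard are not modelled here.

```python
import shlex


def read_group_line(rg_id, sample, library, platform="ILLUMINA"):
    """Return an @RG header line for the -R option of bwa / bwa-mem2.

    Fields are joined with a literal backslash-t; the aligner itself
    expands these escapes into real tab characters.
    """
    fields = ["@RG", f"ID:{rg_id}", f"SM:{sample}", f"LB:{library}", f"PL:{platform}"]
    return "\\t".join(fields)


def align_command(reference, fq1, fq2, rg_id, sample, library, threads=8):
    """bwa-mem2 paired-end alignment of one read group with -M and -R."""
    return ["bwa-mem2", "mem", "-t", str(threads),
            "-M",  # mark shorter split hits as secondary (Picard compatibility)
            "-R", read_group_line(rg_id, sample, library),
            reference, fq1, fq2]


def align_pipeline(reference, fq1, fq2, rg_id, sample, library, threads=8):
    """Shell pipeline: align, mark duplicates with samblaster, coordinate-sort with samtools.

    Assumed arrangement for illustration; the cited work does not state its exact commands.
    """
    stages = [
        align_command(reference, fq1, fq2, rg_id, sample, library, threads),
        ["samblaster", "-M"],  # -M: compatibility mode for bwa mem -M output
        ["samtools", "sort", "-o", f"{rg_id}.sorted.bam", "-"],  # "-" reads from stdin
    ]
    # shlex.join quotes each argument so the read-group string survives the shell.
    return " | ".join(shlex.join(stage) for stage in stages)


if __name__ == "__main__":
    print(align_pipeline("ARS-UCD1.2.fa", "BSW1_L001_R1.fq.gz", "BSW1_L001_R2.fq.gz",
                         "BSW1.L001", "BSW1", "lib1"))
```

Building one pipeline per read group matches the preceding read-group splitting step; the per-read-group BAMs would then still need to be merged per sample, which this sketch leaves out.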
“…While this shift has merits based on the fact that many population genomics analyses produce more robust results with more samples at low coverage than with fewer samples at high coverage (Alex Buerkle & Gompert, 2013), Lou et al. (2021) remind us that low‐coverage whole‐genome sequencing is more sensitive to artefacts due to DNA degradation, depth heterogeneity or DNA quality. That being said, some artefacts such as reference bias and alignment errors are equally problematic with high‐coverage data (Gage et al, 2019; Lloret‐Villas et al, 2021), and more importantly, Lou and Therkildsen (2022) show that appropriate bioinformatic procedures are key to control and correct for the impact of multiple factors. Most of those mitigation methods include more stringent filtering that may reduce the fraction of genome actually analysed or the number of polymorphic markers.…”
Section: Figure (citation type: mentioning)
Confidence: 99%