2021
DOI: 10.1101/2021.01.08.425845
Preprint

Novel functional sequences uncovered through a bovine multi-assembly graph

Abstract: Linear reference genomes are typically assembled from single individuals. They are unable to reflect the genetic diversity of populations and lack millions of bases. To overcome such limitations and make non-reference sequences amenable to genetic investigations, we build a multi-assembly graph from six reference-quality assemblies from taurine cattle and their close relatives. We uncover 70,329,827 bases that are missing in the bovine linear reference genome. The missing sequences encode novel transcripts tha…
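As a rough illustration of how non-reference bases can be tallied from a multi-assembly graph, the sketch below assumes (this is not stated in the report) that the graph is stored in rGFA format. In rGFA, each segment ("S") line carries an SR:i rank tag, with rank 0 marking sequence taken from the linear reference and higher ranks marking sequence added from other assemblies. The function name `count_nonreference_bases` and the toy segments are hypothetical and only illustrate the idea.

```python
from typing import Iterable, List


def count_nonreference_bases(gfa_lines: Iterable[str]) -> int:
    """Sum the lengths of rGFA segments whose rank (SR tag) is greater than 0.

    Assumption: the input is rGFA, where SR:i:0 marks reference-backbone
    segments and SR:i:>0 marks segments contributed by other assemblies.
    """
    total = 0
    for line in gfa_lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] != "S":
            continue  # only segment lines carry sequence; skip H, L, blank lines
        seq = fields[2]
        # Parse optional TAG:TYPE:VALUE fields into a name -> value map.
        tags = {}
        for tag in fields[3:]:
            name, _type, value = tag.split(":", 2)
            tags[name] = value
        if int(tags.get("SR", "0")) == 0:
            continue  # segment belongs to the linear reference
        # Use the stored sequence, or the LN tag when the sequence is omitted ("*").
        total += len(seq) if seq != "*" else int(tags["LN"])
    return total


# Hypothetical toy graph: one reference segment and two non-reference segments.
example: List[str] = [
    "H\tVN:Z:1.0",
    "S\ts1\tACGTACGT\tLN:i:8\tSN:Z:chr1\tSO:i:0\tSR:i:0",
    "S\ts2\tGGGCC\tLN:i:5\tSN:Z:asm2_chr1\tSO:i:100\tSR:i:1",
    "S\ts3\t*\tLN:i:12\tSN:Z:asm3_chr1\tSO:i:40\tSR:i:2",
    "L\ts1\t+\ts2\t+\t0M",
]

if __name__ == "__main__":
    print(count_nonreference_bases(example))
```

On the toy graph this counts the 5 bases of s2 and the 12 bases of s3 (taken from its LN tag), skipping the reference segment s1. On a real graph the same filter would be applied to every segment line of the file.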

Cited by 6 publications (4 citation statements)
References 72 publications
“…By aligning the five cattle assemblies, we illustrate that a substantial portion of the cattle pan-genome is likely missing from the Hereford reference. The amount of non-reference sequence identified by our approach broadly matches that from another study using a different but overlapping set of genomes and graph assembly approach [33]. This has important implications for cattle research as it suggests significant amounts of the bovine genome is inaccessible in most current analyses.…”
Section: Discussion; citation type: supporting; confidence: 68%
“…However, there are currently few high-quality graph genomes available. In livestock, the use of graph genomes has so far been restricted to studies simply incorporating variants from short read sequencing data into the Hereford reference [16,17] or to only very large differences between the assemblies themselves [18]. Although not able to capture wider cattle diversity, these studies illustrated that the variant calls using the graph genome were more consistent between sire-son pairs than those obtained using the linear Hereford reference, with the current standard variant calling algorithms GATK HaplotypeCaller [19] and FreeBayes [20].…”
Section: Introduction; citation type: mentioning; confidence: 99%
“…Data supporting this study, including the multiassembly graph, nonreference sequences, nonreference genes, transcript abundances, and sequence variants detected from nonreference sequences are available via Zenodo (https://zenodo.org/record/4385983#.YHQwER8zbIU) (74).…”
Citation type: mentioning; confidence: 99%