2018
DOI: 10.1101/457101
Preprint

Personalized and graph genomes reveal missing signal in epigenomic data

Abstract: Background: Epigenomic studies that use next-generation sequencing experiments typically rely on the alignment of reads to a reference sequence. However, because of genetic diversity and the diploid nature of the human genome, we hypothesized that using a generic reference could lead to incorrectly mapped reads and bias downstream results. Results: We show that accounting for genetic variation using a modified reference genome (MPG) or a de novo assembled genome (DPG) can alter histone H3K4me1 and H3K27ac ChIP-s…

Cited by 13 publications (11 citation statements)
References 43 publications
“…We constructed bovine variation-aware reference graphs using a Hereford-based linear reference sequence as backbone and variants that were filtered for allele frequency in four cattle breeds. Using both simulated and real short read data, our findings corroborate that a variation-aware reference facilitates accurate read mapping and unbiased sequence variant genotyping [22, 23, 32–34].…”
Section: Discussion (supporting)
confidence: 65%
“…We have leveraged the high positional stability of DHS summits across cell types and states (median 38 bp across >700 cell contexts) to define and annotate archetypal DHS-encoding elements within the human genome, and to encompass these under a common coordinate system and nomenclature. DHS identifiers are robust to genome builds, and transferable to personal genomes and emerging graph-based approaches to genome analysis 30,31. They also enable straightforward incorporation of cell-selectivity properties (e.g., regulatory components) and structural/functional features such as DNase footprints.…”
Section: Discussion (mentioning)
confidence: 99%
“…Together these features create a powerful new framework for analyses at the intersection of gene regulation and the genetics of human diseases and quantitative traits. Archetypal DHS identifiers are robust to genome builds, transferable to personal genomes and emerging graph-based genome analysis 44,45, and enable facile incorporation of functional properties such as cell-selectivity, or finer structural annotations such as DNase I footprints. Common reference coordinates will further greatly facilitate comparisons between large experimental data sets, and between human and mouse DHSs.…”
Section: Discussion (mentioning)
confidence: 99%
“…But linearity leads to reference bias: a tendency to miss alignments or report incorrect alignments for reads containing non-reference alleles. This can ultimately lead to confounding of scientific results, especially for analyses concerned with hypervariable regions [2], allele-specific effects [3–6], ancient DNA analysis [7, 8], or epigenomic signals [9]. These problems can be more or less adverse depending on the individual under study, e.g., African-ancestry genomes contain more ALT alleles, and so can be more severely affected by reference bias [10].…”
Section: Introduction (mentioning)
confidence: 99%