2021
DOI: 10.1101/2021.05.26.445798
Preprint

The complete sequence of a human genome

Abstract: In 2001, Celera Genomics and the International Human Genome Sequencing Consortium published their initial drafts of the human genome, which revolutionized the field of genomics. While these drafts and the updates that followed effectively covered the euchromatic fraction of the genome, the heterochromatin and many other complex regions were left unfinished or erroneous. Addressing this remaining 8% of the genome, the Telomere-to-Telomere (T2T) Consortium has finished the first truly complete 3.055 billion base…

Cited by 138 publications (173 citation statements)
References 88 publications (65 reference statements)

“…In addition, there were systematic anomalies in the SV calls in highly repetitive regions such as the centromere and satellite repeats and an overall excess of variants that are found in all samples. There has recently been work to improve the reference genome to more accurately reflect these regions (Nurk et al 2021) , and as tools for aligning to and calling variants in these regions continue to mature, we expect the accuracy of these calls to even further improve. Finally, while we have detected a large number of SVs in the 31 samples we studied, there is still much to be discovered.…”
Section: Discussion (mentioning)
confidence: 99%
“…To test DiMeLo-seq's ability to measure protein occupancy in heterochromatic, repetitive regions of the genome we targeted H3K9me3, which is abundant in pericentric heterochromatin. We chose to target H3K9me3 in HG002 cells because the chromosome X centromere has been completely assembled for this male-derived lymphoblast line (Nurk et al 2021), and many different sequencing data types are available for it (Gershman et al 2021). To validate the specificity of targeted methylation, we calculated the fraction of adenines methylated within HG002 CUT&RUN H3K9me3 peaks (Altemose et al 2021) compared to the fraction of adenines methylated outside of broadly defined peaks (Methods).…”
Section: Mapping Histone Modifications In Heterochromatin With DiMeLo-seq (mentioning)
confidence: 99%
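The specificity check described in the DiMeLo-seq excerpt above compares the fraction of methylated adenines inside peaks with the fraction outside broadly defined peaks. The following is a minimal Python sketch of that comparison, not the DiMeLo-seq authors' Methods. It assumes per-adenine calls are available as (position, is_methylated) pairs on a single chromosome and that peaks are half-open [start, end) intervals. The input format, the function names, and the `pad` margin used here to stand in for "broadly defined peaks" are all hypothetical.

```python
from bisect import bisect_right


def merge_intervals(intervals):
    """Sort half-open [start, end) intervals and merge any that overlap or touch."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def make_lookup(intervals):
    """Return a function that tests whether a position falls in any of the intervals."""
    merged = merge_intervals(intervals)
    starts = [s for s, _ in merged]
    ends = [e for _, e in merged]

    def contains(pos):
        # Last merged interval starting at or before pos; end is exclusive.
        i = bisect_right(starts, pos) - 1
        return i >= 0 and pos < ends[i]

    return contains


def methylated_fraction_in_vs_out(calls, peaks, pad=0):
    """Fraction of adenines called methylated inside peaks vs. outside padded peaks.

    calls: iterable of (position, is_methylated) pairs (hypothetical input format)
    peaks: list of (start, end) half-open intervals
    pad:   hypothetical margin added to each side of every peak; adenines in the
           margin (inside the padded peak but outside the peak) are counted
           in neither group
    """
    in_peak = make_lookup(peaks)
    in_broad = make_lookup([(s - pad, e + pad) for s, e in peaks])
    n_in = m_in = n_out = m_out = 0
    for pos, is_methylated in calls:
        if in_peak(pos):
            n_in += 1
            m_in += int(is_methylated)
        elif not in_broad(pos):
            n_out += 1
            m_out += int(is_methylated)
    frac_in = m_in / n_in if n_in else None
    frac_out = m_out / n_out if n_out else None
    return frac_in, frac_out


if __name__ == "__main__":
    # Toy data: one peak at [100, 200), padded by 50 bp to [50, 250).
    toy_calls = [(120, True), (150, True), (180, False), (220, True),
                 (300, False), (400, True), (10, False), (500, False)]
    print(methylated_fraction_in_vs_out(toy_calls, [(100, 200)], pad=50))
    # → (0.6666666666666666, 0.25)
```

In the toy example, the adenine at position 220 lies in the padded margin and is therefore excluded from both counts. This mirrors the idea of comparing in-peak signal against background taken well away from peak boundaries. A higher in-peak fraction than out-of-peak fraction would indicate that methylation is targeted rather than uniform.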
“…Modern draft eukaryotic genome assembly graphs are typically built from a subset of four Whole Genome Shotgun (WGS) sequencing data types: Illumina short reads 7,8 , Oxford Nanopore Technologies (ONT) long reads 9 , PacBio Continuous Long Reads (CLR), and PacBio High-Fidelity (HiFi) long reads 9,10 , all of which have been extensively described [7][8][9][10] . However, we note that even the high-accuracy technologies produce sequencing data with some noise caused by platform-specific technical biases that require careful validation and polishing 11,12,1,10,13 .…”
Section: Introduction (mentioning)
confidence: 98%
“…Genome assembly is a foundational practice of quantitative biological research with increasing utility. By representing the genomic sequence of a sample of interest, genome assemblies enable researchers to annotate important features, quantify functional data, and discover/genotype genetic variants in a population [1][2][3][4][5][6] . Modern draft eukaryotic genome assembly graphs are typically built from a subset of four Whole Genome Shotgun (WGS) sequencing data types: Illumina short reads 7,8 , Oxford Nanopore Technologies (ONT) long reads 9 , PacBio Continuous Long Reads (CLR), and PacBio High-Fidelity (HiFi) long reads 9,10 , all of which have been extensively described [7][8][9][10] .…”
Section: Introduction (mentioning)
confidence: 99%