Biocomputing 2022 (2021)
DOI: 10.1142/9789811250477_0029

A Method for Localizing Non-Reference Sequences to the Human Genome

Cited by 6 publications (8 citation statements)
References 0 publications
“…In order to better understand patterns of contamination in human whole genome sequencing, we analyzed sequences from the iHART dataset [37]. Originally curated to study genetic determinants of autism, the iHART dataset contains whole genome sequences from blood samples from children with autism, their siblings, and their parents, but also stands as an invaluable genomics resource due to its unique family structure [38, 39, 40]. iHART was sequenced at the New York Genome Sequencing Center, a common site for large sequencing studies, using commonly followed storage, prep, and sequencing protocols [37], making it a good model dataset to understand common sequencing issues.…”
Section: Introduction (mentioning)
confidence: 99%
“…We previously developed and validated a proof-of-concept algorithm to localize 100-bp k-mers extracted from 150-bp reads [7]. We review the mathematics of this maximum likelihood model and discuss the modifications that we added in order to allow localization of tandem repeats and of sequence originating from the sex chromosomes.…”
Section: Methods (mentioning)
confidence: 99%
“…Originally, we aimed to develop an algorithm to localize unmapped reads to coarse regions of the genome, with the hope that these regions could function as “bins” and that from these bins, we could ultimately assemble longer contigs that represented alternative haplotypes or sections of the genome missing from GRCh38. We presented a proof-of-concept version of such a localization algorithm for non-repetitive sequences in autosomes, with discussion about what a final de novo step would look like [7].…”
Section: Introduction (mentioning)
confidence: 99%
“…This means that population-specific rare variants, haplotypes and structural variants cannot be captured well for certain populations. This can have implications for the development of individualized therapies based on those markers (Chrisman et al., 2022). The recent Human Pangenome Reference Consortium (HPRC) is working to establish a human genome reference that reflects the existing worldwide human diversity (https://humanpangenome.org/).…”
Section: Resources For Including Diverse and Admixed Populations (mentioning)
confidence: 99%