2022
DOI: 10.1101/2022.03.29.486262
Preprint

Boquila: NGS read simulator to eliminate read nucleotide bias in sequence analysis

Abstract: Sequence content is heterogeneous throughout genomes. Therefore, genome-wide NGS reads biased towards specific nucleotide profiles are affected by the genome-wide heterogeneous nucleotide distribution. Boquila generates sequences that mimic the nucleotide profile of true reads, which can be used to correct the nucleotide-based bias of genome-wide distribution of NGS reads. Boquila can be configured to generate reads from only specified regions of the reference genome. It also allows the use of input DNA sequen…
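
The idea in the abstract, drawing sequences from a reference so that they mimic the nucleotide profile of real reads, can be illustrated with a small Python sketch. This is not Boquila's actual algorithm or interface: the function names (`base_freqs`, `profile_distance`, `simulate_reads`), the best-of-N candidate strategy, and the L1 distance on whole-read base frequencies are all assumptions chosen for brevity.

```python
import random
from collections import Counter

BASES = "ACGT"


def base_freqs(seq):
    """Return the fractions of A, C, G, T in seq as a 4-tuple."""
    counts = Counter(seq.upper())
    total = sum(counts[b] for b in BASES) or 1  # avoid division by zero
    return tuple(counts[b] / total for b in BASES)


def profile_distance(p, q):
    """L1 distance between two base-frequency tuples."""
    return sum(abs(a - b) for a, b in zip(p, q))


def simulate_reads(reference, reads, candidates=50, seed=0):
    """For each real read, pick a reference window with a similar base profile.

    Hypothetical strategy: draw `candidates` random windows of the read's
    length and keep the one whose base frequencies are closest to the read's.
    """
    rng = random.Random(seed)
    simulated = []
    for read in reads:
        n = len(read)
        if n > len(reference):
            raise ValueError("read is longer than the reference")
        target = base_freqs(read)
        best, best_d = None, float("inf")
        for _ in range(candidates):
            start = rng.randrange(len(reference) - n + 1)
            window = reference[start:start + n]
            d = profile_distance(base_freqs(window), target)
            if d < best_d:
                best, best_d = window, d
        simulated.append(best)
    return simulated
```

Each simulated read is a genuine substring of the reference, so its genomic position can be used as a background against which the distribution of the real reads is compared. A position-specific or k-mer profile would be a natural refinement of the whole-read frequencies used here.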

Citations: cited by 1 publication (2 citation statements)
References: 33 publications

“…We first aimed to analyze the distribution of the simulated reads only, which reflects the nucleotide content bias of the genome in the 3D organization. Our simulation tool, Boquila, randomly selects genomic regions from the reference genome or input DNA sequencing data in a way that selected pseudo-reads will have a similar nucleotide frequency to the given NGS dataset ( 22 ). It takes two inputs: (i) reference genome or preexisting sequencing read data and (ii) actual NGS data (XR-seq or Damage-seq in this study).…”
Section: Results (citation type: mentioning)
confidence: 99%
“…Simulated datasets are generated using boquila ( 22 ) (v0.6). For HeLa cells, we used input DNA sequencing data accessed from SRA PRJNA608124.…”
Section: Methods (citation type: mentioning)
confidence: 99%