2016
DOI: 10.1093/bioinformatics/btw602

LongISLND: in silico sequencing of lengthy and noisy datatypes

Abstract: Summary: LongISLND is a software package designed to simulate sequencing data according to the characteristics of third generation, single-molecule sequencing technologies. The general software architecture is easily extendable, as demonstrated by the emulation of Pacific Biosciences (PacBio) multi-pass sequencing with P5 and P6 chemistries, producing data in FASTQ, H5, and the latest PacBio BAM format. We demonstrate its utility by downstream processing with consensus building and variant calling. Availability…

Cited by 18 publications (13 citation statements)
References 17 publications (27 reference statements)

“…To validate our simulation results we generated read profiles for Nanosim [6], PBSim [7], LONGIslnd [8], and the two simulators we have proposed. Nanopore sequencing experiments typically yield between 10,000 and 20,000 reads [6], with measured statistics being approximately identical at even 20% of this size as shown in Figure 1(C).…”
Section: Simulation Results (mentioning)
confidence: 99%
“…• We propose an algorithm for simulating reads, a variant of the Hidden Markov Model employed by Nanosim [6], with modifications to apply observed k-mer bias. We demonstrate that it models the k-mer error distribution better than other popular read simulators [6]- [8] (section IV-C).…”
Section: Introduction (mentioning)
confidence: 94%
“…We built the long-reads error profile using the CHM1 dataset [29] (SRA accession SRX533609). We then simulated a 100× pure normal sample using the VarSim simulation framework [30] in combination with the LongISLND in silico long-reads sequencer [31]. Using a set of random somatic mutations, we also simulated a 100× pure tumor sample with the same error profile.…”
Section: Pacbio Data (mentioning)
confidence: 99%
“…More recent technologies can produce very long reads, but at the expense of having higher costs and much higher error rates [11]. However, longer reads have been found to be more appropriate or better compared to short reads in certain studies.…”
Section: Introduction (mentioning)
confidence: 99%