2016
DOI: 10.1093/bioinformatics/btw038

ParDRe: faster parallel duplicated reads removal tool for sequencing studies

Contact: jgonzalezd@udc.es

Cited by 38 publications (43 citation statements)
References 6 publications
“…Because we observed high rates of likely PCR duplicates among the reads for most samples, the raw reads were de-duplicated using ParDRe (Gonzalez-Dominguez and Schmidt, 2016), allowing one mismatch and using an 18 bp prefix. Testing on internal controls using the ERCC spike-in mix showed that de-duplication improved the correlation of transcript abundances with known relative values (data not shown).…”
Section: Methods (mentioning)
confidence: 99%
“…(ii) For the removal of duplicate, many strategies are being followed. One of such tools, named ParDRe [5], employs clustering technique for eliminating duplicated reads. It forms groups of reads having similar prefix of length l. Among each group, in parallel, reads are examined for similar suffixes to identify duplicates.…”
Section: Fig 1 Flow of NGS Data Processing and Analysis (mentioning)
confidence: 99%
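
The prefix-clustering idea described in this statement can be sketched in a few lines of Python. This is only an illustrative, sequential sketch and not ParDRe's actual implementation: the names `dedup_reads` and `hamming_within` are hypothetical, and the sketch assumes the prefix must match exactly while up to `max_mismatches` differences are tolerated in the suffix. Because each group is independent, the per-group loop is the part a parallel tool could distribute across threads or processes.

```python
from collections import defaultdict


def hamming_within(a, b, max_mismatches):
    """Return True if a and b have equal length and differ in at most max_mismatches positions."""
    if len(a) != len(b):
        return False
    mismatches = 0
    for x, y in zip(a, b):
        if x != y:
            mismatches += 1
            if mismatches > max_mismatches:
                return False
    return True


def dedup_reads(reads, prefix_len=18, max_mismatches=0):
    """Keep one representative per group of near-identical reads (hypothetical helper)."""
    # Step 1: cluster reads that share the same prefix of length prefix_len.
    clusters = defaultdict(list)
    for read in reads:
        clusters[read[:prefix_len]].append(read)

    # Step 2: within each cluster, compare suffixes and drop duplicates.
    # Clusters are independent, so this loop could run in parallel.
    kept = []
    for members in clusters.values():
        representatives = []
        for read in members:
            suffix = read[prefix_len:]
            is_duplicate = any(
                hamming_within(suffix, rep[prefix_len:], max_mismatches)
                for rep in representatives
            )
            if not is_duplicate:
                representatives.append(read)
        kept.extend(representatives)
    return kept


reads = ["ACGTAA", "ACGTAA", "ACGTAC", "TTGTAA"]
print(dedup_reads(reads, prefix_len=4))                    # ['ACGTAA', 'ACGTAC', 'TTGTAA']
print(dedup_reads(reads, prefix_len=4, max_mismatches=1))  # ['ACGTAA', 'TTGTAA']
```

The toy reads use a 4-base prefix to keep the example short; the 18 bp prefix and single mismatch quoted in the first citation statement would correspond to `prefix_len=18, max_mismatches=1` under the assumptions of this sketch.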
“…This is because the SA is a sorted array having all possible permutations of left rotated sequence of strings. For example, to find the positions where the substring "GA" occurs in the string "GACGAT", the SA interval is found as [4,5]. The SA values with respect to the interval [4,5] are 3 and 0 which are the positions where the substring "GA" is found in the given string.…”
Section: Fig 2 Alignment of Short Reads Against Reference Genome (mentioning)
confidence: 99%
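
The suffix array example in this statement can be checked with a small Python sketch. It assumes a `$` terminator is appended to the string and sorts before every base, which is a common convention but is not stated in the quoted text; under that assumption the interval for "GA" comes out as [4,5], matching the statement. The function names `suffix_array` and `sa_interval` are hypothetical, and the construction is a naive sort intended for illustration only.

```python
def suffix_array(text):
    """Naive suffix array: start positions of all suffixes in sorted order."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def sa_interval(text, sa, pattern):
    """Return the inclusive (start, end) range of sa whose suffixes begin with pattern, or None."""
    m = len(pattern)

    # Lower bound: first rank whose suffix prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first rank whose suffix prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    end = lo - 1

    return (start, end) if start <= end else None


text = "GACGAT$"  # "$" terminator is an assumption of this sketch
sa = suffix_array(text)
print(sa)                                # [6, 1, 4, 2, 0, 3, 5]
interval = sa_interval(text, sa, "GA")
print(interval)                          # (4, 5)
print(sorted(sa[interval[0]:interval[1] + 1]))  # [0, 3]
```

The two suffix array values in the interval, 0 and 3, are the positions where "GA" occurs in "GACGAT".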
“…Our tool also provides all this functionality (and even more) but in a significantly lower runtime by fully exploiting the parallel processing capabilities of Spark. Although there are a few parallel tools to remove duplicate DNA/RNA sequences (one specific operation that can be used for quality control) on distributed-memory systems [13], [14], up to our knowledge, SeQual is the first publicly available tool intended for this type of parallel systems that provides full functionality (more than 30 operations) instead of only allowing to remove duplicate reads. Furthermore, SeQual includes a graphical user interface intended for simplifying its usage.…”
Section: Introduction (mentioning)
confidence: 99%