2017 IEEE International Conference on Big Data (Big Data), 2017
DOI: 10.1109/bigdata.2017.8258251

CloudEC: A MapReduce-based algorithm for correcting errors in next-generation sequencing big data

Citations: cited by 10 publications (8 citation statements)
References: 37 publications
“…Thanks to this change, CloudRS is able to process the sequences using multiple worker nodes, effectively allowing it to handle larger datasets than ALLPATHS-LG in less time. Finally, CloudEC [7] is another Hadoop-based MSA corrector that was presented as an enhanced version of CloudRS. The major improvement of CloudEC over its counterpart was the introduction of the spread corrector, a new MSA-based algorithm which increases the reliability of the reads at the cost of reducing its performance, as this algorithm is much more computationally intensive than the one provided by CloudRS (i.e., the pinch corrector).…”
Section: Big Data and Parallel Correctors
mentioning
confidence: 99%
“…However, most of the previous solutions usually lack either accuracy in correction, performance when processing large datasets, or the capability to scale out on a computing cluster. Among them, CloudEC [4] has been proved to perform precise corrections together with a scalable approach by relying on Big Data technologies, since its correction algorithms have been designed upon the MapReduce paradigm [5] using its most popular open-source implementation Apache Hadoop [6] (more details about Big Data and MapReduce are provided in Section 2 of Additional file 1). However, the usage of this tool comes at the cost of poor performance in terms of computational time when managing the huge amounts of data usually generated by NGS platforms.…”
mentioning
confidence: 99%
“…In fact, the exploitation of Big Data clusters to accelerate the storage, processing and visualization of large NGS datasets has been recently explored in multiple previous works. For instance, many bioinformatics tools implemented on top of Big Data processing frameworks such as Hadoop [25] and Spark [9] have emerged in recent years, from error correction [26], [27], duplicate read removal [13] and sequence alignment [28]- [31], to variant calling [32], de novo genome assembly [33], [34] and protein structure prediction [35]- [37], among many others. Most of these tools are executed within a bioinformatics pipeline (or scientific workflow engines such as SAASFEE [38] or Pegasus [39]) that usually starts with a quality control of the input FASTA/FASTQ datasets.…”
Section: Related Work
mentioning
confidence: 99%
“…Hence, multiple algorithms have been proposed in the literature to correct these mistakes in the samples and make up higher quality reads. Among them, CloudEC [3] is a Big Data tool built upon the Apache Hadoop framework [4] that is able to perform corrections to genetic datasets by running multiple steps of alignments of the input samples, and replacing the bases with the lowest qualities of all those aligned samples with another representations of higher quality.…”
Section: Introduction
mentioning
confidence: 99%