Abstract: De novo genome assembly is the process of reconstructing an unknown genome from a large collection of short (or long) reads sequenced from it. A single run of a Next-Generation Sequencing (NGS) technology can produce billions of short reads, making genome assembly computationally demanding in both memory and time. One of the major computational steps in modern short-read assemblers is the construction and use of a string data structure called the de Bruijn graph. In fact, a …
“…Count-Min Sketching is also utilized by the genome histosketching method, where k-mer spectra are represented by Count-Min sketches and the frequency estimates are utilized to populate a histosketch [4]. The Count-Min sketch has also been used for de Bruijn graph approximation during de novo genome assembly, reducing the runtime and memory overheads associated with construction of the full graph and the subsequent pruning of low-quality edges [33].…”
Section: Sketching Algorithms and Implementations
Considerable advances in genomics over the past decade have resulted in vast amounts of data being generated and deposited in global archives. The growth of these archives exceeds our ability to process their content, leading to significant analysis bottlenecks. Sketching algorithms produce small, approximate summaries of data and have shown great utility in tackling this flood of genomic data, while using minimal compute resources. This article reviews the current state of the field, focusing on how the algorithms work and how genomicists can utilize them effectively. References to interactive workbooks for explaining concepts and demonstrating workflows are included at https://github.com/will-rowe/genome-sketching.
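To make the Count-Min idea concrete, here is a minimal, self-contained sketch of a Count-Min structure used to estimate k-mer frequencies. This is an illustrative toy, not the implementation from [4] or [33]; the class name, `width`/`depth` parameters, and the SHA-256-based hashing are all assumptions chosen for clarity.

```python
import hashlib

class CountMinSketch:
    """Toy Count-Min sketch: a depth x width table of counters.

    Each item is counted in one cell per row; the estimate is the
    minimum over rows, so it can overestimate but never underestimate.
    """
    def __init__(self, width=1000, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _hashes(self, item):
        # Derive `depth` pairwise-different hash values from seeded SHA-256.
        for seed in range(self.depth):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.width

    def add(self, item):
        for row, col in enumerate(self._hashes(item)):
            self.table[row][col] += 1

    def estimate(self, item):
        return min(self.table[row][col]
                   for row, col in enumerate(self._hashes(item)))

def kmers(seq, k):
    """Yield all overlapping k-mers of a sequence."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))

sketch = CountMinSketch()
for kmer in kmers("ACGTACGTACGT", 4):
    sketch.add(kmer)
# "ACGT" occurs 3 times; the estimate is at least 3 (collisions can inflate it).
print(sketch.estimate("ACGT"))
```

Because the table size is fixed regardless of how many distinct k-mers are inserted, memory stays bounded even for billions of reads, which is the property the assembly and histosketching applications above exploit.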
“…Consequently, we also performed a head-to-head comparison of PaKman running p MPI processes versus the shared-memory tools running p threads on the same machine (Keeran). We compared against two state-of-the-art shared-memory tools, namely IDBA-UD [1] and FastEtch [14]. Fig.…”
Section: A Comparative Evaluation 1) Evaluation On Distributed Memory
De novo genome assembly is a fundamental problem in bioinformatics that aims to reconstruct the DNA sequence of an unknown genome from numerous short DNA fragments (known as reads) obtained from it. With the advent of high-throughput sequencing technologies, billions of reads can be generated in a matter of hours, necessitating efficient parallelization of the assembly process. While multiple parallel solutions have been proposed in the past, conducting assembly at scale remains challenging because of the inherent complexities of data movement and the irregular access footprints of memory and I/O operations. In this paper, we present a novel algorithm, called PaKman, to address the problem of performing large-scale genome assemblies on a distributed-memory parallel computer. Our approach focuses on improving performance through a combination of novel data structures and algorithmic strategies that reduce the communication and I/O footprint during the assembly process. PaKman presents a solution for the two most time-consuming phases in the full genome assembly pipeline, namely, k-mer counting and contig generation. A key aspect of our algorithm is its graph data structure, which comprises fat nodes (or what we call "macro-nodes") that reduce the communication burden during contig generation. We present an extensive performance and qualitative evaluation of our algorithm, including comparisons to other state-of-the-art parallel assemblers. Our results demonstrate the ability to achieve near-linear speedups on up to 8K cores (tested); to outperform state-of-the-art distributed-memory and shared-memory tools while delivering comparable (if not better) quality; and to reduce time to solution significantly. For instance, PaKman is able to generate a high-quality set of assembled contigs for complex genomes such as the human and wheat genomes in a matter of minutes on 8K cores.
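The two phases PaKman targets, k-mer counting and contig generation over a de Bruijn graph, can be illustrated in serial miniature. The sketch below is not PaKman's distributed macro-node algorithm; it is a plain single-process version of the underlying idea, with illustrative function names and a hypothetical `min_count` threshold for pruning low-coverage k-mers.

```python
from collections import Counter, defaultdict

def count_kmers(reads, k):
    """Phase 1 (in miniature): count every k-mer across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def build_debruijn(counts, min_count=1):
    """Phase 2 prelude: build a de Bruijn graph on (k-1)-mers.

    Each surviving k-mer contributes one edge prefix -> suffix;
    k-mers below `min_count` are pruned as likely sequencing errors.
    """
    graph = defaultdict(list)
    for kmer, c in counts.items():
        if c >= min_count:
            graph[kmer[:-1]].append(kmer[1:])
    return graph

reads = ["ACGTAC", "CGTACG", "GTACGT"]
counts = count_kmers(reads, k=4)
graph = build_debruijn(counts)
# Contig generation then walks unambiguous paths through `graph`.
```

In a distributed setting, both the counter and the graph must be partitioned across processes, and it is the resulting communication during path traversal that PaKman's macro-node structure is designed to reduce.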
“…Mold, a fungus, feeds on the organic molecules found in bread and other foods. Three common bread molds are Penicillium, Cladosporium, and black bread mold [21]. A widespread belief is that bread spoilage can be tested by taste, smell, and touch.…”
Bread expiration is a common issue in food logistics. Under various conditions, moldy bread can cause food poisoning in consumers, resulting in nausea, diarrhea, and other medical problems. For this purpose, an intelligent system is required that can detect the current condition of bread, helping both stores and consumers. In this study, we developed a prototype built around an Arduino Nano microcontroller, using MQ-series sensors to detect CO and CO2 in shopping bags of bread in order to collect data. This data is then processed by different machine learning algorithms to detect the current condition of the bread in these stores. The data collected from the sensors was imbalanced, so it was balanced using SMOTE and Tomek links (data balancing techniques). Furthermore, data preprocessing and feature engineering were applied to the IoT-based dataset to improve its quality. We applied linear learning models to predict the current condition of bread. Among these models, Gaussian Naïve Bayes scored the highest accuracy, 81.54%.
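The class-balancing step can be sketched without any external library. The snippet below is a minimal pure-Python version of SMOTE's core idea (synthesizing minority samples by interpolating toward nearest neighbours), not the implementation the study used; the function name, the two-feature CO/CO2 example vectors, and the `k` and `seed` parameters are all illustrative assumptions.

```python
import random

def smote_oversample(minority, n_new, k=2, seed=0):
    """Minimal SMOTE sketch: create `n_new` synthetic minority samples.

    Each synthetic point lies on the segment between a random minority
    sample and one of its k nearest minority-class neighbours.
    """
    rng = random.Random(seed)

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` within the minority class.
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: sq_dist(base, p))[:k]
        nb = rng.choice(neighbours)
        lam = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + lam * (n - b) for b, n in zip(base, nb)])
    return synthetic

# Hypothetical minority-class sensor readings: [CO ppm, CO2 %].
spoiled = [[410.0, 1.2], [415.0, 1.4], [420.0, 1.1], [405.0, 1.3]]
new_points = smote_oversample(spoiled, n_new=4)
```

Because every synthetic point is an interpolation between two real minority samples, the oversampled class stays inside the region the sensors actually observed, which is why SMOTE is preferred over naive duplication before training classifiers such as Gaussian Naïve Bayes.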