Lingqi Zhang scite author profile

PipeMEM: A Framework to Speed Up BWA-MEM in Spark with Low Overhead

(1) Background: DNA sequence alignment is an essential step in genome analysis. BWA-MEM has been a prevalent single-node tool for genome alignment because of its high speed and accuracy. Genome data is being generated at an exponential rate, and handling such large volumes requires a multi-node solution, which remains a challenge. Spark is a ubiquitous big data platform that has been used to help genome alignment meet this challenge. Nonetheless, existing works that use Spark to optimize BWA-MEM suffer from high overhead. (2) Methods: In this paper, we present PipeMEM, a framework that accelerates BWA-MEM with lower overhead by using the pipe operation in Spark. We additionally propose a pipeline structure and in-memory computation to accelerate PipeMEM further. (3) Results: Our experiments showed that, on paired-end alignment tasks, our framework had low overhead. In a multi-node environment, our framework was, on average, 2.27× faster than BWASpark (an alignment tool in the Genome Analysis Toolkit (GATK)) and 2.33× faster than SparkBWA. (4) Conclusions: PipeMEM can accelerate BWA-MEM in the Spark environment with high performance and low overhead.

Scaling Distributed Deep Learning Workloads beyond the Memory Capacity with KARMA

Wahib

Zhang

Nguyen

et al. 2020

Asymptotic Method for Analysis of Nonlinear Systems With Two Parameters

Xu

Zhang

1986

Acta Mathematica Scientia

Scaling Distributed Deep Learning Workloads beyond the Memory Capacity with KARMA

Wahib

Zhang

Nguyen

et al. 2020

Preprint

The dedicated memory of hardware accelerators can be insufficient to store all weights and/or intermediate states of large deep learning models. Although model parallelism is a viable way to reduce memory pressure, it requires significant modification of the source code and careful consideration of the algorithms. An alternative solution is to use out-of-core methods instead of, or in addition to, data parallelism. We propose a performance model based on the concurrency analysis of out-of-core training behavior, and derive a strategy that combines layer swapping and redundant recomputing. We achieve an average 1.52x speedup across six different models over state-of-the-art out-of-core methods. We also introduce the first method to solve the challenging problem of out-of-core multi-node training, by carefully pipelining gradient exchanges and performing the parameter updates on the host. Our data-parallel out-of-core solution can outperform complex hybrid model parallelism in training large models, e.g. Megatron-LM and Turing-NLG.

Lingqi Zhang

High accuracy digital image correlation powered by GPU-based parallel computing

Qualitative Behavior and Nonoscillation of Stommel's Thermohaline Box Model

The mythical thermohaline oscillator?

Matrix Engines for High Performance Computing: A Paragon of Performance or Grasping at Straws?

PipeMEM: A Framework to Speed Up BWA-MEM in Spark with Low Overhead

Scaling Distributed Deep Learning Workloads beyond the Memory Capacity with KARMA

Asymptotic Method for Analysis of Nonlinear Systems With Two Parameters

Scaling Distributed Deep Learning Workloads beyond the Memory Capacity with KARMA
