Murali Emani scite author profile

Integrating Algorithmic Parameters into Benchmarking and Design Space Exploration in 3D Scene Understanding

System designers typically use well-studied benchmarks to evaluate and improve new architectures and compilers. We design tomorrow's systems based on yesterday's applications. In this paper we investigate an emerging application, 3D scene understanding, which is likely to be significant in the mobile space in the near future. Until now, this application could only run in real time on desktop GPUs. In this work, we examine how it can be mapped to power-constrained embedded systems. Key to our approach is the idea of incremental co-design exploration, where optimization choices that concern the domain layer are incrementally explored together with low-level compiler and architecture choices. The goal of this exploration is to reduce execution time while minimizing power and meeting our quality-of-result objective. As the design space is too large to evaluate exhaustively, we use active learning based on a random forest predictor to find good designs. We show that our approach can, for the first time, achieve dense 3D mapping and tracking in the real-time range within a 1 W power budget on a popular embedded device. This is a 4.8x execution time improvement and a 2.8x power reduction compared to the state of the art.
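
The abstract names active learning driven by a random forest predictor as the way to search a design space too large to measure exhaustively. The following is a minimal, single-objective sketch in plain Python of that general technique, not the paper's implementation: the three-knob design space, the toy `measure` function (standing in for real timing runs), and the lower-confidence-bound rule for choosing the next design are all assumptions. The paper's exploration jointly considers execution time, power, and result quality, which this sketch does not model.

```python
import random
import statistics
from itertools import product

# Hypothetical design space: three integer knobs standing in for
# algorithmic, compiler and hardware parameters.
SPACE = list(product(range(1, 9), range(1, 9), range(1, 5)))


def measure(design):
    """Hypothetical stand-in for running and timing one design (lower is better)."""
    a, b, c = design
    return (a - 3) ** 2 + 0.5 * (b - 6) ** 2 + 2.0 * abs(c - 2)


def _sse(rows):
    """Sum of squared errors of the targets around their mean."""
    ys = [y for _, y in rows]
    mean = statistics.fmean(ys)
    return sum((y - mean) ** 2 for y in ys)


def build_tree(rows, rng, depth=0, max_depth=6, min_leaf=2):
    """Grow a regression tree; each split considers a random subset of features."""
    ys = [y for _, y in rows]
    if depth >= max_depth or len(rows) <= min_leaf or len(set(ys)) == 1:
        return statistics.fmean(ys)  # leaf: mean target value
    n_feat = len(rows[0][0])
    best = None
    for f in rng.sample(range(n_feat), k=n_feat // 2 + 1):
        for t in sorted({x[f] for x, _ in rows})[:-1]:
            left = [r for r in rows if r[0][f] <= t]
            right = [r for r in rows if r[0][f] > t]
            err = _sse(left) + _sse(right)
            if best is None or err < best[0]:
                best = (err, f, t, left, right)
    if best is None:  # the sampled features are constant in this node
        return statistics.fmean(ys)
    _, f, t, left, right = best
    return (f, t,
            build_tree(left, rng, depth + 1, max_depth, min_leaf),
            build_tree(right, rng, depth + 1, max_depth, min_leaf))


def predict_tree(node, x):
    while isinstance(node, tuple):  # internal node: (feature, threshold, left, right)
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node


def fit_forest(data, rng, n_trees=20):
    """Random forest: each tree is trained on a bootstrap resample of the data."""
    return [build_tree([rng.choice(data) for _ in data], rng) for _ in range(n_trees)]


def predict_forest(forest, x):
    """Mean prediction and spread across trees (used as an uncertainty proxy)."""
    preds = [predict_tree(tree, x) for tree in forest]
    return statistics.fmean(preds), statistics.pstdev(preds)


def lcb(forest, x):
    """Lower confidence bound: favours low predicted time or high tree disagreement."""
    mean, spread = predict_forest(forest, x)
    return mean - spread


def explore(budget=30, n_init=8, seed=0):
    """Active learning: repeatedly measure the design the forest finds most promising."""
    rng = random.Random(seed)
    evaluated = {d: measure(d) for d in rng.sample(SPACE, n_init)}
    while len(evaluated) < budget:
        forest = fit_forest(list(evaluated.items()), rng)
        candidates = [d for d in SPACE if d not in evaluated]
        nxt = min(candidates, key=lambda d: lcb(forest, d))
        evaluated[nxt] = measure(nxt)
    best = min(evaluated, key=evaluated.get)
    return best, evaluated[best]
```

Calling `explore()` measures 30 of the 256 candidate designs and returns the best one it found together with its measured cost; raising `budget` trades more measurements for a better chance of reaching the optimum. Subtracting the tree spread from the mean prediction is one common way to balance exploiting promising designs against exploring uncertain ones.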

Smart, adaptive mapping of parallelism in the presence of external workload

Emani, Wang, O’Boyle (2013)

GenSLMs: Genome-scale language models reveal SARS-CoV-2 evolutionary dynamics

Zvyagin, Brace, Hippe, et al. (2022). Preprint

Our work seeks to transform how new and emergent variants of pandemic-causing viruses, especially SARS-CoV-2, are identified and classified. By adapting large language models (LLMs) for genomic data, we build genome-scale language models (GenSLMs) that can learn the evolutionary landscape of SARS-CoV-2 genomes. By pre-training on over 110 million prokaryotic gene sequences and then fine-tuning a SARS-CoV-2-specific model on 1.5 million genomes, we show that GenSLMs can accurately and rapidly identify variants of concern. To our knowledge, GenSLMs represent one of the first whole-genome-scale foundation models that can generalize to other prediction tasks. We demonstrate the scaling of GenSLMs on both GPU-based supercomputers and AI-hardware accelerators, achieving over 1.54 zettaflops in training runs. We present initial scientific insights gleaned from examining GenSLMs in tracking the evolutionary dynamics of SARS-CoV-2, noting that their full potential on large biological data is yet to be realized.

EReinit: Scalable and efficient fault‐tolerance for bulk‐synchronous MPI applications

Chakraborty, Laguna, Emani, et al. (2018). Concurrency and Computation

Scientists from many different fields have been developing bulk-synchronous MPI applications to simulate and study a wide variety of scientific phenomena. Since failure rates are expected to increase in larger-scale future HPC systems, providing efficient fault-tolerance mechanisms for this class of applications is paramount. The global-restart model has been proposed to decrease the time of failure recovery in bulk-synchronous applications by allowing a fast reinitialization of MPI. However, current implementations of this model have several drawbacks: they lack efficiency, their scalability has not been shown, and they require the use of the MPI profiling interface, which precludes the use of tools. In this paper, we present EReinit, an implementation of the global-restart model that addresses these problems. Our key idea and optimization is the co-design of basic fault-tolerance mechanisms, such as failure detection, notification, and recovery, between MPI and the resource manager, in contrast to current approaches in which these mechanisms are implemented in MPI only. We demonstrate EReinit in three HPC programs and show that it is up to four times more efficient than existing solutions at 4,096 processes.
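
To make the global-restart model concrete, here is a small single-process Python simulation of its recovery control flow: when any rank fails, every rank rolls back to the last globally consistent checkpoint and the loop resumes, instead of the job being relaunched. This is a hedged illustration only. No MPI is used, the ranks are just list entries, and `RankFailure`, `run_global_restart`, and its parameters are hypothetical names; the co-design of failure detection and notification with the resource manager, which is EReinit's actual contribution, is not modeled.

```python
class RankFailure(Exception):
    """Simulated failure of one MPI process during an iteration."""


def run_global_restart(n_ranks=4, n_iters=10, ckpt_every=3, fail_at=()):
    """Simulate a bulk-synchronous loop under a global-restart recovery model.

    When any rank fails, all ranks discard their in-flight state and resume
    from the last globally consistent checkpoint; the job is not relaunched.
    Returns the final per-rank state and the number of global restarts.
    """
    pending_failures = set(fail_at)   # iterations at which a failure is injected
    state = [0] * n_ranks             # per-rank application state
    checkpoint = (0, list(state))     # (iteration, saved state)
    restarts = 0
    it = 0
    while it < n_iters:
        try:
            if it in pending_failures:
                pending_failures.discard(it)  # inject each failure only once
                raise RankFailure(it)
            # Compute phase: each rank advances its local state by one step.
            state = [s + rank + 1 for rank, s in enumerate(state)]
            # (In a real code, a barrier or halo exchange would end the superstep here.)
            it += 1
            if it % ckpt_every == 0:
                checkpoint = (it, list(state))
        except RankFailure:
            # Global restart: every rank reinitializes and rolls back together.
            restarts += 1
            it, saved = checkpoint
            state = list(saved)
    return state, restarts
```

For example, `run_global_restart(fail_at=[7])` returns `([10, 20, 30, 40], 1)`: the final state matches the failure-free run, and only the work since the checkpoint at iteration 6 is redone.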

Bootstrapping Parameter Space Exploration for Fast Tuning

Thiagarajan, Jain, Giménez, et al. (2018)

Accelerating Scientific Applications With SambaNova Reconfigurable Dataflow Architecture

et al. (2021)

Neural Architecture Search for Transformers: A Survey

et al. (2022)

Transformer-based deep neural network architectures have gained tremendous interest due to their effectiveness in various applications across Natural Language Processing (NLP) and Computer Vision (CV) domains. These models are the de facto choice in several language tasks, such as sentiment analysis and text summarization, replacing the Long Short-Term Memory (LSTM) model. Vision Transformers (ViTs) have shown better model performance than traditional Convolutional Neural Networks (CNNs) in vision applications while requiring significantly fewer parameters and less training time. Designing a neural architecture for a given task and dataset is extremely challenging, as it requires expertise in several interdisciplinary areas such as signal processing, image processing, optimization, and allied fields. Neural Architecture Search (NAS) is a promising technique to automate the architectural design process of a neural network in a data-driven way using Machine Learning (ML) methods. The search method explores several architectures without requiring significant human effort, and the searched models outperform manually built networks. In this paper, we review NAS techniques targeting the Transformer model and its family of architectures, such as Bidirectional Encoder Representations from Transformers (BERT) and Vision Transformers. We provide an in-depth literature review of approximately 50 state-of-the-art NAS methods and explore future directions in this fast-evolving class of problems.

Celebrating diversity: a mixture of experts approach for runtime mapping in dynamic environments

Emani, O’Boyle (2015)

Matching program parallelism to platform parallelism using thread selection is difficult when the environment and available resources change dynamically. Existing compiler or runtime approaches are typically based on a one-size-fits-all policy. There is little ability to either evaluate or adapt the policy when encountering new external workloads or hardware resources. This paper focuses on selecting the best number of threads for a parallel application in dynamic environments. It develops a new scheme based on a mixture-of-experts approach. It learns online which of several existing policies, or experts, is best suited for a particular environment without having to try out each policy. It does this by using a novel environment predictor as a proxy for the quality of an expert thread selection policy. Additional expert policies can easily be added and are selected only when appropriate. We evaluate our scheme in environments with varying external workloads and hardware resources. We then consider the case when workloads use affinity scheduling or are themselves adaptive, and show that our approach, in all cases, outperforms existing schemes and, surprisingly, improves workload performance. On average, we improve performance by 1.66x over the OpenMP default, 1.34x over an online scheme, 1.25x over an offline policy, and 1.2x over a state-of-the-art analytic model. Determining the right number and type of experts is an open problem, and our initial analysis shows that adding more experts improves accuracy and performance.
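
The selection mechanism described above, using each expert's environment predictor as a proxy for the quality of its thread policy, can be sketched in a few lines of Python. This is a hypothetical illustration, not the paper's implementation: the two hand-written experts, the use of "external load in busy cores" as the environment, and the exponentially smoothed error are assumptions, whereas the paper's experts use learned predictors. The point it shows is that only the cheap predictors are scored each step, so no thread policy has to be tried out to judge its expert.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Expert:
    name: str
    predict_env: Callable[[float], float]   # forecasts the next environment state
    select_threads: Callable[[float], int]  # thread-count policy for an environment
    error: float = 0.0                      # smoothed environment-prediction error


def make_experts(cores: int = 8) -> List[Expert]:
    """Two hypothetical experts; the environment is the external load in busy cores."""
    return [
        # Assumes the external load persists; leaves the busy cores to it.
        Expert("persist", lambda load: load,
               lambda load: max(1, cores - round(load))),
        # Assumes the machine is otherwise idle; always uses every core.
        Expert("idle", lambda load: 0.0, lambda load: cores),
    ]


def run(experts: List[Expert], env_trace: List[float],
        decay: float = 0.5) -> List[Tuple[str, int]]:
    """At each step, trust the expert whose environment predictor has been most accurate."""
    choices = []
    for prev, cur in zip(env_trace, env_trace[1:]):
        for e in experts:
            # Score each expert by how well it forecast the environment just observed.
            e.error = decay * e.error + (1 - decay) * abs(e.predict_env(prev) - cur)
        best = min(experts, key=lambda e: e.error)
        choices.append((best.name, best.select_threads(cur)))
    return choices
```

For a steady external load, `run(make_experts(), [0, 0, 6, 6, 6])` returns `[("persist", 8), ("persist", 2), ("persist", 2), ("persist", 2)]`; for an oscillating load such as `[0, 6, 0, 6, 0]`, the "idle" expert's forecasts become more accurate and it takes over after the first step. Adding another expert only means appending one more entry to the list, mirroring the abstract's point that experts are selected only when appropriate.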

Murali Emani

Integrating Algorithmic Parameters into Benchmarking and Design Space Exploration in 3D Scene Understanding

Smart, adaptive mapping of parallelism in the presence of external workload

GenSLMs: Genome-scale language models reveal SARS-CoV-2 evolutionary dynamics

EReinit: Scalable and efficient fault‐tolerance for bulk‐synchronous MPI applications

Bootstrapping Parameter Space Exploration for Fast Tuning

Accelerating Scientific Applications With SambaNova Reconfigurable Dataflow Architecture

Neural Architecture Search for Transformers: A Survey

Celebrating diversity: a mixture of experts approach for runtime mapping in dynamic environments
