Nastaran Hajinazar scite author profile

Processing-using-DRAM has been proposed for a limited set of basic operations (i.e., logic operations, addition). However, in order to enable full adoption of processing-using-DRAM, it is necessary to provide support for more complex operations. In this paper, we propose SIMDRAM, a flexible general-purpose processing-using-DRAM framework that (1) enables the efficient implementation of complex operations, and (2) provides a flexible mechanism to support the implementation of arbitrary user-defined operations. The SIMDRAM framework comprises three key steps. The first step builds an efficient MAJ/NOT representation of a given desired operation. The second step allocates DRAM rows that are reserved for computation to the operation's input and output operands, and generates the required sequence of DRAM commands to perform the MAJ/NOT implementation of the desired operation in DRAM. The third step uses the SIMDRAM control unit located inside the memory controller to manage the computation of the operation from start to end, by executing the DRAM commands generated in the second step of the framework. We design the hardware and ISA support for the SIMDRAM framework to (1) address key system integration challenges, and (2) allow programmers to employ new SIMDRAM operations without hardware changes. We evaluate SIMDRAM for reliability, area overhead, throughput, and energy efficiency using a wide range of operations and seven real-world applications to demonstrate SIMDRAM's generality. Our evaluations using a single DRAM bank show that (1) over 16 operations, SIMDRAM provides 2.0× the throughput and 2.6× the energy efficiency of Ambit, a state-of-the-art processing-using-DRAM mechanism; (2) over seven real-world applications, SIMDRAM provides 2.5× the performance of Ambit. Using 16 DRAM banks, SIMDRAM provides (1) 88× and 5.8× the throughput, and 257× and 31× the energy efficiency, of a CPU and a high-end GPU, respectively, over 16 operations; (2) 21× and 2.1× the performance of the CPU and GPU, over seven real-world applications. SIMDRAM incurs an area overhead of only 0.2% in a high-end CPU.
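To make the idea of a MAJ/NOT representation concrete, the following is a minimal Python sketch of how AND, OR, and a full adder can be built from only 3-input majority and NOT, and then applied bit-serially across many lanes at once. It is an illustration, not the paper's implementation: the function names, the use of Python integers to stand in for DRAM rows, and the transposed ("one bit position per row") data layout are all assumptions made for this example.

```python
# Illustrative sketch only: Python ints stand in for DRAM rows, and each bit
# of an int is one independent SIMD lane. All names here are hypothetical.

def maj(a: int, b: int, c: int) -> int:
    """Bitwise 3-input majority, evaluated independently in every lane."""
    return (a & b) | (a & c) | (b & c)

def not_(a: int, mask: int) -> int:
    """Bitwise NOT restricted to the lanes selected by mask."""
    return ~a & mask

def and_(a: int, b: int) -> int:
    """AND is majority with the third input tied to all zeros."""
    return maj(a, b, 0)

def or_(a: int, b: int, mask: int) -> int:
    """OR is majority with the third input tied to all ones."""
    return maj(a, b, mask)

def full_adder(a: int, b: int, cin: int, mask: int):
    """One-bit full adder expressed with MAJ and NOT only."""
    cout = maj(a, b, cin)
    s = maj(not_(cout, mask), cin, maj(a, b, not_(cin, mask)))
    return s, cout

def to_rows(values, bits):
    """Transpose values so that row i holds bit i of every value (lane j = value j)."""
    return [sum(((v >> i) & 1) << j for j, v in enumerate(values)) for i in range(bits)]

def from_rows(rows, lanes):
    """Inverse of to_rows: rebuild one integer per lane."""
    return [sum(((row >> j) & 1) << i for i, row in enumerate(rows)) for j in range(lanes)]

def add_bit_serial(rows_a, rows_b, lanes):
    """Add all lanes in parallel, one bit position (row) at a time; the final carry is dropped."""
    mask = (1 << lanes) - 1
    carry, out = 0, []
    for a, b in zip(rows_a, rows_b):
        s, carry = full_adder(a, b, carry, mask)
        out.append(s)
    return out
```

For example, `from_rows(add_bit_serial(to_rows([3, 5, 250, 17], 8), to_rows([4, 9, 10, 200], 8), 4), 4)` yields `[7, 14, 4, 217]`, i.e., four 8-bit additions (with wraparound) computed using nothing but majority and NOT.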


Demystifying Complex Workload-DRAM Interactions

Ghose

Hajinazar

et al. 2019


A Case for Richer Cross-Layer Abstractions: Bridging the Semantic Gap with Expressive Memory

Vijaykumar

Jain

Majumdar

et al. 2018


The Virtual Block Interface: A Flexible Alternative to the Conventional Virtual Memory Framework

Hajinazar

Patel

et al. 2020


Casper: Accelerating Stencil Computation using Near-cache Processing

Denzler

Bera

Hajinazar

et al. 2021

Preprint


Stencil computation is one of the most used kernels in a wide variety of scientific applications, ranging from large-scale weather prediction to solving partial differential equations. Stencil computations are characterized by three unique properties: (1) low arithmetic intensity, (2) limited temporal data reuse, and (3) a regular and predictable data access pattern. As a result, stencil computations are typically bandwidth-bound workloads, which only experience limited benefits from the deep cache hierarchy of modern CPUs. In this work, we propose Casper, a near-cache accelerator consisting of specialized stencil compute units connected to the last-level cache (LLC) of a traditional CPU. Casper is based on two key ideas: (1) avoiding the cost of moving rarely reused data through the cache hierarchy, and (2) exploiting the regularity of the data accesses and the inherent parallelism of the stencil computation to increase the overall performance. With minimal changes in LLC address decoding logic and data placement, Casper performs stencil computations at the peak bandwidth of the LLC. We show that, by tightly coupling lightweight stencil compute units to the LLC, Casper improves the performance of stencil kernels by 1.65× on average, while reducing energy consumption by 35% compared to a commercial high-performance multi-core processor. Moreover, Casper provides a 37× improvement in performance-per-area compared to a state-of-the-art GPU.
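The properties listed in the abstract can be seen in even the smallest stencil. The sketch below is an assumed example, not a kernel from the paper: a 3-point 1D stencil with hypothetical coefficients, plus a back-of-the-envelope arithmetic-intensity estimate under the assumption that each grid point is read once and written once as an 8-byte double.

```python
# Illustrative sketch only: a 3-point 1D stencil and a rough arithmetic
# intensity estimate. Coefficients and byte counts are assumptions.

def stencil_1d(u, coeffs=(0.25, 0.5, 0.25)):
    """Apply one sweep of a 3-point stencil; boundary points are copied unchanged."""
    left, center, right = coeffs
    out = list(u)
    for i in range(1, len(u) - 1):
        # Each output point reads only its immediate neighbours: a regular,
        # predictable access pattern with little temporal reuse.
        out[i] = left * u[i - 1] + center * u[i] + right * u[i + 1]
    return out

def arithmetic_intensity(flops_per_point, bytes_per_point):
    """Floating-point operations performed per byte moved to or from memory."""
    return flops_per_point / bytes_per_point

# 3 multiplies + 2 adds per point; one 8-byte read and one 8-byte write per
# point when the grid streams through memory (assumed, ignoring cache effects).
INTENSITY = arithmetic_intensity(flops_per_point=5, bytes_per_point=16)
```

Under these assumptions the kernel performs about 0.31 floating-point operations per byte, which is the kind of ratio that makes a workload bandwidth-bound rather than compute-bound.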


AirLift: A Fast and Comprehensive Technique for Remapping Alignments between Reference Genomes

Kim

Fırtına

Banu

et al. 2019

Preprint


As genome sequencing tools and techniques improve, researchers are able to incrementally assemble more accurate reference genomes. A more accurate reference genome enables increased accuracy in read mappings, which provides more accurate variant information and thus health data on the donor. Therefore, read data sets from sequenced samples should ideally be mapped to the latest available reference genome. Unfortunately, the increasingly large amounts of available genomic data make it prohibitively expensive to fully map each read data set to its respective reference genome every time the reference is updated. Several tools that attempt to reduce the cost of updating a read data set from one reference to another (i.e., remapping) have been published. These tools identify regions of similarity across the two references and update the mapping locations of a read based on the locations of similar regions in the new reference genome. The main drawback of existing approaches is that if a read maps to a region in the old reference without similar regions in the new reference, it cannot be remapped. We find that, as a result of this drawback, a significant portion of annotations are lost when using state-of-the-art remapping tools. To address this major limitation in existing tools, we propose AirLift, a fast and comprehensive technique for moving alignments from one genome to another. AirLift can reduce 1) the number of reads that need to be mapped from the entire read set by up to 99.9% and 2) the overall execution time to remap the reads between the two most recent reference versions by 6.94×, 44.0×, and 16.4× for large (human), medium (C. elegans), and small (yeast) references, respectively.


AirLift: A Fast and Comprehensive Technique for Remapping Alignments between Reference Genomes

Kim

Fırtına

Banu

et al. 2021

Preprint


As genome sequencing tools and techniques improve, researchers are able to incrementally assemble more accurate reference genomes, which enable more sensitive read mapping and downstream analysis such as variant calling. A more sensitive downstream analysis is critical for a better understanding of the genome donor (e.g., health characteristics). Therefore, read sets from sequenced samples should ideally be mapped to the latest available reference genome that represents the most relevant population. Unfortunately, the increasingly large amount of available genomic data makes it prohibitively expensive to fully re-map each read set to its respective reference genome every time the reference is updated. There are several tools that attempt to accelerate the process of updating a read data set from one reference to another (i.e., remapping) by 1) identifying regions that appear similar between two references and 2) updating the mapping location of reads that map to any of the identified regions in the old reference to the corresponding similar region in the new reference. The main drawback of existing approaches is that if a read maps to a region in the old reference that does not appear with a reasonable degree of similarity in the new reference, the read cannot be remapped. We find that, as a result of this drawback, a significant portion of annotations (i.e., coding regions in a genome) are lost when using state-of-the-art remapping tools. To address this major limitation in existing tools, we propose AirLift, a fast and comprehensive technique for remapping alignments from one genome to another. Compared to the state-of-the-art method for remapping reads (i.e., full mapping), AirLift reduces 1) the number of reads (out of the entire read set) that need to be fully mapped to the new reference by up to 99.99% and 2) the overall execution time to remap read sets between two reference genome versions by 6.7×, 6.6×, and 2.8× for large (human), medium (C. elegans), and small (yeast) reference genomes, respectively. We validate our remapping results with GATK and find that AirLift provides similar accuracy in identifying ground truth SNP and INDEL variants as the baseline of fully mapping a read set.
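The two-step approach of existing remapping tools, and the drawback the abstract points out, can be sketched as follows. This is a simplified illustration under stated assumptions, not AirLift or any specific tool: the `Block` record, the `remap` and `split_reads` functions, and the idea that similar regions are gap-free, non-overlapping, and sorted by old start position are all hypothetical simplifications.

```python
# Illustrative sketch only: coordinate remapping through a table of similar
# regions. All names and the data model are hypothetical.
from bisect import bisect_right
from typing import List, NamedTuple, Optional, Tuple

class Block(NamedTuple):
    old_start: int  # start of the similar region in the old reference
    old_end: int    # end of the region in the old reference (exclusive)
    new_start: int  # start of the matching region in the new reference

def remap(pos: int, read_len: int, blocks: List[Block]) -> Optional[int]:
    """Return the read's position in the new reference, or None if the read
    does not lie entirely inside one similar region."""
    starts = [b.old_start for b in blocks]
    i = bisect_right(starts, pos) - 1  # last block starting at or before pos
    if i < 0:
        return None
    block = blocks[i]
    if pos + read_len > block.old_end:
        return None  # no similar region covers the whole read
    return block.new_start + (pos - block.old_start)

def split_reads(reads: List[Tuple[int, int]], blocks: List[Block]):
    """Partition (pos, length) reads into remapped positions and reads that
    would still need a full mapping against the new reference."""
    remapped, needs_full_mapping = [], []
    for pos, length in reads:
        new_pos = remap(pos, length, blocks)
        if new_pos is None:
            needs_full_mapping.append((pos, length))
        else:
            remapped.append(new_pos)
    return remapped, needs_full_mapping
```

Reads that fall into `needs_full_mapping` are exactly the ones a purely region-based tool cannot carry over, which is the limitation the abstract describes.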


CoNDA

Boroumand

Zheng

Mutlu

et al. 2019

