2016
DOI: 10.1109/mm.2016.8
Near-DRAM Acceleration with Single-ISA Heterogeneous Processing in Standard Memory Modules

Cited by 17 publications
(6 citation statements)
References 12 publications
“…Their approach focuses on an efficient instruction-offloading technique, and their framework introduces vault-level parallelism to improve computation throughput. In Reference [2], the authors utilize a number of lightweight cores in conjunction with commodity two-dimensional (2D) DRAMs to explore general-purpose NDP designs. They implement an NDP execution framework that utilizes the same ISA as the host processor and is thus fully compatible with existing commercial processors.…”
Section: Related Work
confidence: 99%
“…We extend the RISC-V ISA to include the functionality necessary to support the NDP paradigm. A processor ISA extension for NDP is also considered in Reference [2], where the authors argue that such an approach provides compatibility with existing processing platforms. To this end, we implement jump-and-link-PIM (JalPim), an instruction that behaves like the original jump-and-link (Jal) instruction and thus triggers a function call.…”
Section: Host System Architecture
confidence: 99%
“…In other related work [17], the authors incorporate heterogeneous reconfigurable logic arrays, which behave like CGRAs, in order to improve throughput and reduce the power consumption of target applications. CGRA capabilities are also explored in [18], along with different TSV interconnection networks, in order to find the CGRA-TSV combination that yields the highest speedup. A common target application of CGRAs and NDP is the training and inference of deep neural networks, as previous works in [19] and [20] demonstrate.…”
Section: Related Work
confidence: 99%
“…In contrast, our work does not require any profiling operation prior to code execution, because the CGRA is designed for loop acceleration and can therefore support any issued loop without additional effort. Further, the authors in [8], [16], [17], and [18] utilize CGRAs in conjunction with the NDP paradigm, but their focus shifts to different aspects of the NDP execution paradigm. Under this premise, previous works lack the application-mapping approach or the loop-acceleration focus we employ, as they do not utilize the CGRA network to execute instructions in an iterative way, i.e.…”
Section: Related Work
confidence: 99%