Plasticine: A Reconfigurable Accelerator for Parallel Patterns

Prabhakar, Raghu; Zhang, Yaqi; Koeplinger, David; Feldman, Matt; Zhao, Tian; Hadjis, Stefan; Pedram, Ardavan; Kozyrakis, Christos; Olukotun, Kunle

doi:10.1109/mm.2018.032271058

Cited by 66 publications

(106 citation statements)

References 12 publications

Supporting

Mentioning

104

Contrasting

Unclassified

“…Specialized hardware can accelerate polystore systems by leveraging multicore CPUs, graphics processing unit (GPUs [28]), field programmable gate arrays (FPGAs [29]), application-specific integrated chips (ASICs [30]), or coarse grain reconfigurable arrays (CGRAs [31]) such as Plasticine [32]. Multicore CPUs consume more power per task with limited parallelism and inefficient data movement compared to well-matched applications running on accelerators.…”

Section: B Hardware Accelerators

mentioning

confidence: 99%

“…Each accelerator is programmed using its hardware-specific low-level language (e.g., Verilog), which requires a developer to have deep understanding of the underlying hardware. Application and hardware domain-specific languages (DSLs), such as Spatial [38], Relay [39], and Delite [40], ease application development for specific accelerators by abstracting low-level abstractions into high-level primitives (e.g., parallel patterns [32]) that a developer is familiar with. For example, Halide [41] and Tensorflow [42] are application-specific DSLs for image processing and deep neural network processing respectively.…”

Section: B Hardware Accelerators

mentioning

confidence: 99%

Polystore++: Accelerated Polystore System for Heterogeneous Workloads

Singhal, Zhang, Nardi, et al. 2019

2019 IEEE 39th International Conference on Distributed Computing Systems (ICDCS)

Self Cite

Modern real-time business analytics consist of heterogeneous workloads (e.g., database queries, graph processing, and machine learning). These analytic applications need programming environments that can capture all aspects of the constituent workloads (including data models they work on and movement of data across processing engines). Polystore systems suit such applications; however, these systems currently execute on CPUs and the slowdown of Moore's Law means they cannot meet the performance and efficiency requirements of modern workloads. We envision Polystore++, an architecture to accelerate existing polystore systems using hardware accelerators (e.g., FPGAs, CGRAs, and GPUs). Polystore++ systems can achieve high performance at low power by identifying and offloading components of a polystore system that are amenable to acceleration using specialized hardware. Building a Polystore++ system is challenging and introduces new research problems motivated by the use of hardware accelerators (e.g., optimizing and mapping query plans across heterogeneous computing units and exploiting hardware pipelining and parallelism to improve performance). In this paper, we discuss these challenges in detail and list possible approaches to address these problems.

“…Designs range from architectures with simple general-purpose cores with (software) configurable interconnect, such as MIT's RAW [40], to more proper CGRAs that tightly integrate a general-purpose core with an array of functional units (typically identical arithmetic-logic units), such as GARP [8], Piperench [22], ADRES [27], Tartan [31], and DySER [23]. The Plasticine [39] spatially reconfigurable design combines pattern compute units (PCUs), hierarchically composed of a reconfigurable pipeline with multiple stages of SIMD functional units, and pattern memory units (PMUs), simplifying mapping of inner loops and feedback edges to the hardware and enabling execution of applications expressed as parallel patterns. Ongoing efforts in Path Forward projects and the DARPA Electronic Resurgence Initiative (ERI) demonstrate the intense interest around CGRAs.…”

Section: Related Work

mentioning

confidence: 99%

Software defined architectures for data analytics

Castellana, Minutoli, Tumeo, et al. 2019

Proceedings of the 24th Asia and South Pacific Design Automation Conference

Data analytics applications are increasingly complex workflows composed of phases with very different program behaviors (e.g., graph algorithms and machine learning, algorithms operating on sparse and dense data structures, etc.). To reach the levels of efficiency required to process these workflows in real time, upcoming architectures will need to leverage even more workload specialization. If, at one end, we may find even more heterogeneous processors composed of a myriad of specialized processing elements, at the other end we may see novel reconfigurable architectures, composed of sets of functional units and memories interconnected with (re)configurable on-chip networks, able to adapt dynamically to the workload characteristics. Field Programmable Gate Arrays are increasingly used for accelerating various workloads and, in particular, inferencing in machine learning, providing higher efficiency than other solutions. However, their fine-grained nature still leads to issues for the design software and still makes dynamic reconfiguration impractical. Future, more coarse-grained architectures could offer the features to execute diverse workloads at high efficiency while providing better reconfiguration mechanisms for dynamic adaptability. Nevertheless, we argue that the challenges for reconfigurable computing remain in the software. In this position paper, we describe a possible toolchain for reconfigurable architectures targeted at data analytics.

“…One solution is to design a reconfigurable architecture at word level, the Coarse‐Grained Reconfigurable Architectures (CGRAs). However, there is no commercial CGRA available in the market, and the CGRAs also suffer from long compilation times …”

Section: Related Work

mentioning

confidence: 99%

“…However, there is no commercial CGRA available in the market, and the CGRAs also suffer from long compilation times [19]. One way is to focus on application and domain-specific accelerators: Neural network [2,20], Bayesian learning [21], bioinformatics [22], stencil computing [16,23], energy-efficient accelerators for graph analytics algorithms [24], and irregular applications mapping [25]. Another way is to focus on Domain-Specific Language (DSL) which aims representing parallelism in stream-based applications, like SPar based on C++.…”

mentioning

confidence: 99%

ADD: Accelerator Design and Deploy ‐ A tool for FPGA high‐performance dataflow computing

Penha, Silva, et al. 2018

Concurrency and Computation

Summary: Dataflow‐based FPGA accelerators have become a promising alternative to deliver energy‐efficient high‐performance computing. However, FPGA programming is still a challenge. This paper presents Accelerator Design and Deploy (ADD), a high‐level framework to specify, to simulate, and to implement dataflow accelerators for streaming applications. The framework includes an open dataflow operator library, and templates are provided to easily design new operators. The framework also provides a high‐level and an accurate simulation at circuit level with short execution times. Moreover, ADD provides software and hardware APIs to simplify the integration process, extending the benefits of portability from low‐cost FPGA boards to high performance datacenter FPGA platforms. Our framework supports coupling with high‐level programming languages, and it has been validated on two FPGA platforms: the Intel high‐performance CPU‐FPGA heterogeneous computing platform and an educational FPGA kit. We show that our simple approach presents competitive performance, both in time and energy, when compared to multi‐core and GPU accelerators.
