2012
DOI: 10.1145/2133382.2133388
A Massively Parallel, Energy Efficient Programmable Accelerator for Learning and Classification

Abstract: Applications that use learning and classification algorithms operate on large amounts of unstructured data, and have stringent performance constraints. For such applications, the performance of general-purpose processors scales poorly with data size because of their limited support for fine-grained parallelism and absence of software-managed caches. The large intermediate data in these applications also limits achievable performance on many-core processors such as GPUs. To accelerate such learning applications…
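As a rough illustration of what the abstract means by a software-managed cache (this is not code from the paper; the scratchpad size and function name below are hypothetical), a kernel can stage tiles of a large input in a small on-chip buffer under explicit program control instead of relying on a hardware cache:

#include <stddef.h>
#include <string.h>

#define SCRATCH_WORDS 1024   /* hypothetical on-chip scratchpad capacity */

/* Tiled dot-product accumulation: stream the large input through a small,
 * software-managed scratchpad.  Names and sizes are illustrative only. */
float tiled_dot(const float *x, const float *w, size_t n)
{
    static float scratch[SCRATCH_WORDS];   /* stands in for on-chip memory */
    float acc = 0.0f;

    for (size_t base = 0; base < n; base += SCRATCH_WORDS) {
        size_t tile = (n - base < SCRATCH_WORDS) ? (n - base) : SCRATCH_WORDS;

        /* Explicitly stage one tile of x on chip (the "software-managed cache"). */
        memcpy(scratch, x + base, tile * sizeof(float));

        /* Consume the staged tile; w is streamed directly. */
        for (size_t i = 0; i < tile; i++)
            acc += scratch[i] * w[base + i];
    }
    return acc;
}

On an accelerator, the memcpy would correspond to an explicit transfer into local memory scheduled by the program rather than by a cache hierarchy.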

Cited by 40 publications (13 citation statements)
References 21 publications
“…Moreover, the neurons they implement are inspired by biology, i.e., spiking neurons; they do not implement the CNNs and DNNs which are the focus of our architecture. Majumdar et al. [37] investigate a parallel architecture for various machine-learning algorithms, including but not limited to neural networks; unlike our architecture, they have an off-chip banked memory, and they introduce memory banks close to the PEs (similar to those found in GPUs) for caching purposes. Finally, beyond neural networks and machine-learning tasks, other large-scale custom architectures have been proposed, such as the recently proposed Anton 2 [60], for molecular dynamics simulation.…”
Section: Related Work
confidence: 99%
“…The prevalence and compute-intensive nature of RM applications has led to efforts to optimize them using parallel software on multi-core and many-core processors [1,2], specialized hardware accelerators [3,4,14] and custom circuits [5]. StoRM is an accelerator for RM applications, but utilizes an entirely different approach (SC), which leads to significant benefits compared to previous efforts.…”
Section: Related Work
confidence: 99%
“…As a result, realizing efficient implementations of RM workloads is a problem that has attracted great interest, with solutions proposed ranging from optimized software on multi-core and many-core processors [1,2] to specialized hardware accelerators [3,4] and custom mixed-signal circuits [5].…”
Section: Introduction
confidence: 99%
“…al. [11] described the many-core MAPLE architecture, which was designed to accelerate a number of learning and classification problems, including SVMs. Vector processing elements in a two-dimensional grid were used to perform linear algebra.…”
Section: Introduction
confidence: 99%
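The statement above summarizes MAPLE as a two-dimensional grid of vector processing elements performing linear algebra. As a minimal sketch of that general idea (grid dimensions, function names, and the assumption that the matrix size divides evenly are all illustrative, not taken from the paper), a matrix-vector product can be partitioned so that each PE owns a block of output rows and a slice of input columns:

#include <stdlib.h>

#define GRID_ROWS 4   /* hypothetical PE grid height */
#define GRID_COLS 4   /* hypothetical PE grid width  */

/* One PE (r, c): partial sums for its block of output rows, using only its
 * slice of input columns.  This mirrors the general idea of mapping linear
 * algebra onto a 2-D PE array, not the exact MAPLE dataflow. */
static void pe_partial(const float *A, const float *x, float *partial,
                       int n, int row0, int rows, int col0, int cols)
{
    for (int i = 0; i < rows; i++) {
        float acc = 0.0f;
        for (int j = 0; j < cols; j++)
            acc += A[(row0 + i) * n + (col0 + j)] * x[col0 + j];
        partial[i] = acc;
    }
}

void grid_matvec(const float *A, const float *x, float *y, int n)
{
    int rows_per_pe = n / GRID_ROWS;   /* assumes n divides evenly */
    int cols_per_pe = n / GRID_COLS;
    float *partial = malloc(rows_per_pe * sizeof(float));

    for (int i = 0; i < n; i++)
        y[i] = 0.0f;

    /* Sequentially emulate the PE grid; on real hardware the (r, c) loops
     * run concurrently across the array. */
    for (int r = 0; r < GRID_ROWS; r++) {
        for (int c = 0; c < GRID_COLS; c++) {
            pe_partial(A, x, partial, n,
                       r * rows_per_pe, rows_per_pe,
                       c * cols_per_pe, cols_per_pe);
            for (int i = 0; i < rows_per_pe; i++)
                y[r * rows_per_pe + i] += partial[i];  /* row-wise reduction */
        }
    }
    free(partial);
}

On real hardware the per-PE work would execute in parallel and the row-wise reduction would map to on-chip accumulation; here both are emulated sequentially for clarity.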
“…While both Ly [14] and Majumdar [11] targeted maximum performance in batch learning tasks, ours is designed for single-FPGA, floating-point embedded applications in which minimising latency and compactness are the key design goals. Similar architectures have been applied to the acceleration of linear algebra problems, utilising both spatial parallelism and pipelining to achieve high performance.…”
Section: Introduction
confidence: 99%