2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA) 2020
DOI: 10.1109/isca45697.2020.00023
Think Fast: A Tensor Streaming Processor (TSP) for Accelerating Deep Learning Workloads

Cited by 51 publications (38 citation statements)
References 39 publications
“…Although plenty of other notable architectures exist (see Table II), a pattern begins to emerge, as most specialized processors rely on a series of sub-processing elements which each contribute to increasing throughput of a larger processor [81], [82]. Whilst there are plenty of ways to achieve MAC parallelism, one of the most renowned techniques is the systolic array, and is utilized by Groq [85] and Google, amongst numerous other chip developers. This is not a new concept: systolic architectures were first proposed back in the late 1970s [86], [87], and have become widely popularized since powering the hardware DeepMind used for the AlphaGo system to defeat Lee Sedol, the world champion of the board game Go in October 2015.…”
Section: ) Edge-ai Dnn Accelerators Suitable For Biomedical Applicationsmentioning
confidence: 99%
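The excerpt names the systolic array as a leading technique for MAC parallelism. The following is a minimal, illustrative simulation of an output-stationary systolic array in plain Python; the function name, the skewing scheme, and the grid layout are generic textbook choices, not details of the TSP or Groq's design.

```python
# Hypothetical sketch: simulate an n x m grid of PEs computing A @ B.
# A's rows stream in from the left and B's columns from the top, each
# skewed by one cycle per row/column; every PE multiplies the pair of
# values passing through it and accumulates the product locally.

def systolic_matmul(A, B):
    """Multiply A (n x k) by B (k x m) on an n x m grid of PEs."""
    n, k = len(A), len(A[0])
    m = len(B[0])
    acc = [[0] * m for _ in range(n)]      # one accumulator per PE
    # Cycles needed for the last skewed operand to reach PE (n-1, m-1).
    for t in range(k + n + m - 2):
        for i in range(n):
            for j in range(m):
                # Because of the input skew, PE (i, j) sees operand
                # pair index s = t - i - j at cycle t.
                s = t - i - j
                if 0 <= s < k:
                    acc[i][j] += A[i][s] * B[s][j]
    return acc

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(systolic_matmul(A, B))   # [[19, 22], [43, 50]]
```

The skew is what makes the design "systolic": operands pulse through the grid one hop per cycle, so every PE performs one MAC per cycle once the pipeline fills, with no global fan-out of data.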
“…On the other hand, for some applications, 16-bit floating-point MAC units are necessary [51] to reduce significant development cost. State-of-the-art CNN accelerators [51], [53], [56] have both 8-bit MAC units for efficient execution of CNNs and 16-bit floating-point MAC units for accurate execution.…”
Section: A Quantizationmentioning
confidence: 99%
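The excerpt describes accelerators pairing 8-bit integer MACs with floating-point ones. A hedged sketch of the trade-off, with illustrative scale factors and a generic symmetric quantization scheme (none of these specifics come from the cited accelerators):

```python
# Hypothetical sketch: an 8-bit integer dot product with a wide exact
# accumulator, compared against the full-precision float result.

def quantize(xs, scale):
    """Map floats to signed 8-bit integers: q = clip(round(x / scale))."""
    return [max(-128, min(127, round(x / scale))) for x in xs]

def int8_dot(xq, wq, sx, sw):
    """8-bit MAC loop: products accumulate exactly in a wide integer
    accumulator; a single float multiply rescales the result."""
    acc = 0
    for a, b in zip(xq, wq):
        acc += a * b            # each product fits easily in 32 bits
    return acc * sx * sw

x = [0.5, -1.25, 2.0]
w = [1.0, 0.75, -0.5]
sx = sw = 2.0 / 127             # assume inputs lie in [-2, 2]
approx = int8_dot(quantize(x, sx), quantize(w, sw), sx, sw)
exact = sum(a * b for a, b in zip(x, w))
print(exact, approx)            # approx is close to, not equal to, exact
```

The integer path is cheap in silicon but introduces quantization error at the inputs, which is why the excerpt's accelerators keep 16-bit floating-point units alongside for layers where that error is unacceptable.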
“…In this section, we review recent trends in CNN accelerator research which are not covered in the previous sections. In the previous sections, we mentioned the trends of CNN accelerators, for example, large on-chip memories (144 MB [51], 220 MB [56]), 8-bit fixed-point MAC units and 16-bit floating-point MAC units for CNN accelerators, popularity of streaming architectures [41], [52], [53].…”
Section: Trends In Recent Cnn Acceleratormentioning
confidence: 99%
“…To keep pace with the rapid advancement of DNN models, the computing throughput of spatial accelerators scales up to tens or hundreds of TOPS [1,3,14]. And the number of PEs in a spatial accelerator also increases rapidly at the same time.…”
Section: Spatial Dnn Acceleratorsmentioning
confidence: 99%
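The excerpt ties throughput scaling to PE count. A back-of-the-envelope check of how PE count maps to peak TOPS, using the common convention that one MAC counts as two operations; the PE counts and 1 GHz clock below are illustrative assumptions, not figures from any cited accelerator.

```python
# Hypothetical sketch: peak throughput of a spatial accelerator as a
# function of PE count, assuming every PE retires one MAC per cycle.

def peak_tops(num_pes, clock_ghz, ops_per_pe_per_cycle=2):
    """Peak throughput in TOPS; a MAC counts as 2 ops (mul + add)."""
    return num_pes * ops_per_pe_per_cycle * clock_ghz * 1e9 / 1e12

for pes in (4_096, 65_536, 262_144):
    print(f"{pes:>7} PEs @ 1 GHz -> {peak_tops(pes, 1.0):.1f} TOPS")
```

Under these assumptions, tens of thousands of PEs at a ~1 GHz clock already land in the hundreds-of-TOPS range the excerpt mentions, which is why PE count grows in step with headline throughput.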