2022 IEEE International Symposium on High-Performance Computer Architecture (HPCA)
DOI: 10.1109/hpca53966.2022.00032
Near-Stream Computing: General and Transparent Near-Cache Acceleration

Cited by 8 publications (7 citation statements)
References 74 publications
“…The second use case that was used to further corroborate the functionality of the proposed NDPmulator framework was the NDAcc presented by Wang et al [47], [48]. Their architecture consists of Processing Element (PE) arrays installed close to the cache to perform arithmetic and logic vector operations, as depicted in Fig.…”
Section: B. Near-Stream Computing
confidence: 98%
“…Section III explains the simulation flow of NDPmulator for both SE and FS modes, providing examples of its operation. Section IV briefly describes the NDAccs proposed by Das and Kapoor [46], Wang et al [47], [48], and Genc et al [49], whose architectures were used to validate NDPmulator, and presents and discusses the obtained experimental results. Section V summarizes relevant related work.…”
Section: All in All, This Paper Presents the Following Contributions
confidence: 99%
“…A dataflow fabric can easily run callbacks in parallel by assigning each a unique tag. Alternatively, täkō could execute callbacks on reserved SMT threads [141,151], but this would either sequentialize callbacks or require multiple, heavy-weight thread contexts. Moreover, constantly re-fetching and decoding the same instructions would be wasteful.…”
Section: Engine Microarchitecture
confidence: 99%
“…TaskStream [15] extends the ISA for task parallelism, enabling dynamic reordering of tasks to exploit opportunities for multicasting data shared between tasks. Prior work also adds stream abstractions to CPU ISAs [57][58][59]; the "stream confluence" optimization [59] enables recognizing simultaneous reuse across multiple cores and combines streams dynamically to reduce requests to shared cache and reduce traffic by multicasting. Overall, the realization of Mozart lends credence to the practicality of adopting these ideas in industry.…”
Section: Related Work
confidence: 99%