2022 IEEE International Symposium on High-Performance Computer Architecture (HPCA) 2022
DOI: 10.1109/hpca53966.2022.00082

TransPIM: A Memory-based Acceleration via Software-Hardware Co-Design for Transformer

citations
Cited by 28 publications
(12 citation statements)
references
References 34 publications
“…We align the memory specifications of TransPIM, such as HBM timing parameters and capacity, with those used for NeuPIMs and the NPU+PIM baseline. Figure 15 reports the speedup of NeuPIMs over TransPIM [89]. NeuPIMs shows an average 228× higher throughput than TransPIM. The significant performance gap is attributed to the effectiveness of GEMM computation executed on the NPU in the case of NeuPIMs, as opposed to PIM in TransPIM.…”
Section: Results (mentioning)
confidence: 99%
“…PIM for language model support. TransPIM [89] is a PIM solution that accelerates the end-to-end transformer inference using PIM. The work proposes a data loading overhead reduction technique by customizing its dataflow for transformer models.…”
Section: Discussion (mentioning)
confidence: 99%
“…While Rowclone [40] proposes bulk data copy of a row data across different banks, significant data movement induced in data analytics are flexible that Rowclone cannot be effectively utilized. TransPIM [51] and GearBox [31] propose specific network-on-chip (NoC) for efficient DRAM internal data movement for target applications. However, their NoCs consume a large area overhead considering the DRAM area constraint.…”
Section: Challenge Of Data Analytics a Internal Data Movement Overhea... (mentioning)
confidence: 99%
“…PIM and NMP Newton, HBM-PIM, TransPIM, McDRAM, Ambit, and SIMDRAM [15], [16], [30], [41], [43], [51] support the regular or non-condition-oriented workloads to avoid data dependent dataflow by accelerating memory-bound vector operations exploiting internal parallelism of DRAM. [19], [25], [38] accelerates a recommendation system where the gather-and-scatter operations are the main target.…”
Section: Related Work (mentioning)
confidence: 99%
“…When the size of intermediate data surpasses the allocated memory block size, frameworks need to require more memory. To mitigate the problem, memory pool techniques, optimized by profiling inference process or DL model, can be employed for efficient memory management [90,98]. For example, the memory access pattern can be saved during model conversion and the inference framework can allocate all memory directly during the setup stage.…”
Section: Implications and Suggestions (mentioning)
confidence: 99%