2018 IEEE 20th International Conference on High Performance Computing and Communications; IEEE 16th International Conference On 2018
DOI: 10.1109/hpcc/smartcity/dss.2018.00109

Multi-role SpTRSV on Sunway Many-Core Architecture

citations
Cited by 12 publications
(10 citation statements)
references
References 25 publications
“…For each CDP, it enumerates the sample-NMO velocity pairs (line 2), and then finds the intersection of the traveltime curve and traces. At each intersection, it first obtains the halfpoint of the current trace (line 9-11), then accesses the data with size of w (line 12-13), and finally retrieves the data computed in a window of width w (line 14-19). Each trace has its own corresponding halfpoints, therefore the accesses to halfpoints are continuous when walking through the traces sequentially.…”
Section: Improving Parallelism Within a CG (mentioning)
confidence: 99%
“…After initialization, the traces in a CDP are processed in sequence (line 9) and the data halfpoints is prefetched before a new trace is processed (line 10-12). For the current trace, the memory addresses of the data accesses are calculated for each sample-NMO velocity pair and kept in the k1 array (line 13-15). Then, the maximum and minimum memory address in k1 array is identified (line 16-18) and used to determine the memory range (length) of data accesses (line 19).…”
Section: 32 (mentioning)
confidence: 99%
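
The address-range step described in the statement above can be illustrated with a minimal Python sketch. It is not the citing paper's code: the names `plan_bulk_fetch`, `gather`, and `offset_of`, the `window` parameter, and the choice to add `window` to the range are all assumptions made for illustration. Only the k1 array and the min/max-to-length idea come from the quoted text.

```python
from typing import Callable, List, Sequence, Tuple


def plan_bulk_fetch(
    pairs: Sequence[Tuple[int, int]],
    offset_of: Callable[[int, int], int],  # hypothetical address function
    window: int = 1,                       # assumed width of each access
) -> Tuple[List[int], int, int]:
    """Compute per-pair addresses (k1), the lowest address, and the span length."""
    # One address per sample-NMO velocity pair, kept in k1 as in the quote.
    k1 = [offset_of(sample, velocity) for sample, velocity in pairs]
    lo, hi = min(k1), max(k1)
    # Assumption: the range must also cover the window read at the highest address.
    length = hi - lo + window
    return k1, lo, length


def gather(
    data: Sequence[float],
    pairs: Sequence[Tuple[int, int]],
    offset_of: Callable[[int, int], int],
    window: int = 1,
) -> List[List[float]]:
    """Fetch the whole range once, then serve every access from the local copy."""
    k1, lo, length = plan_bulk_fetch(pairs, offset_of, window)
    # A single contiguous slice stands in for one bulk transfer into fast local memory.
    local = list(data[lo:lo + length])
    return [local[addr - lo: addr - lo + window] for addr in k1]
```

Under these assumptions, the many scattered reads for one trace collapse into one contiguous fetch, after which each access is served from the local buffer at an offset relative to the minimum address.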
“…The second challenge is to optimize the generated code regarding the unique architecture features of Sunway. Observed by existing research works [15][16][17], the key to achieve high performance on Sunway is to 1) fully utilize the computing resources of CPEs for massive parallelism, and 2) leverage the LDM of each CPE to alleviate the bottleneck of memory access. Therefore, when the neural network compiler optimizes the generated code, the following three rules need to be complied: 1) use the DMA as much as possible when accessing main memory.…”
Section: Challenges For DL Compilation On Sunway (mentioning)
confidence: 99%