Proceedings of the 2016 International Conference on Supercomputing (ICS 2016)
DOI: 10.1145/2925426.2926291
Parallel Transposition of Sparse Data Structures

Cited by 37 publications (29 citation statements) · References 42 publications
“…To benchmark atomic operations, we use two kernels that involve atomic operations: an atomic-based SpTRANS method described by Wang et al. (2016) and a synchronization-free SpTRSV algorithm proposed by Liu et al. (2017). The SpTRANS method first uses atomic-add operations to sum the number of nonzeros in each column (assuming both the input and output matrices are in row-major order) and then scatters nonzeros from rows into columns through an atomic-based counter.…”
Section: Sparse Kernels
Mentioning confidence: 99%
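The two-pass atomic scheme this statement describes can be sketched as follows. This is a minimal C++/OpenMP illustration of the idea, not the authors' implementation; the function name sptrans_atomic and the CSR-in/CSC-out layout are assumptions made for the sketch.

    #include <numeric>
    #include <vector>

    // Sketch: transpose an m x n CSR matrix into CSC with the atomic-based
    // scheme described above. Pass 1 counts nonzeros per column via atomic
    // adds, a prefix sum turns the counts into column pointers, and pass 2
    // scatters each nonzero into its column through an atomic write cursor.
    void sptrans_atomic(int m, int n,
                        const std::vector<int>& row_ptr,
                        const std::vector<int>& col_idx,
                        const std::vector<double>& val,
                        std::vector<int>& cptr,      // size n + 1
                        std::vector<int>& ridx,      // size nnz
                        std::vector<double>& cval) { // size nnz
      const int nnz = row_ptr[m];
      cptr.assign(n + 1, 0);
      ridx.resize(nnz);
      cval.resize(nnz);

      // Pass 1: atomic-add the per-column nonzero counts.
      #pragma omp parallel for
      for (int i = 0; i < m; ++i)
        for (int j = row_ptr[i]; j < row_ptr[i + 1]; ++j) {
          #pragma omp atomic
          ++cptr[col_idx[j] + 1];
        }

      // Prefix sum over the shifted counts yields the column pointers.
      std::partial_sum(cptr.begin(), cptr.end(), cptr.begin());

      // Pass 2: scatter rows into columns; each nonzero claims its slot
      // through an atomically captured per-column counter.
      std::vector<int> cur(cptr.begin(), cptr.end() - 1);
      #pragma omp parallel for
      for (int i = 0; i < m; ++i)
        for (int j = row_ptr[i]; j < row_ptr[i + 1]; ++j) {
          const int c = col_idx[j];
          int dst;
          #pragma omp atomic capture
          dst = cur[c]++;
          ridx[dst] = i;
          cval[dst] = val[j];
        }
    }

The atomic counter in pass 2 is what makes the method benchmark-worthy for atomics: contention on cur[] grows with the number of nonzeros sharing a column.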
“…Many studies investigate data-level parallelism on x86-based systems [21,23,36,42]. Correspondingly, several studies have illustrated the benefits of using registers to improve performance on GPUs.…”
Section: Suffix Array Construction
Mentioning confidence: 99%
“…both for traditional HPC applications and for big data processing. In these cases, a large number of independent arrays often need to be sorted as a whole, either because of algorithm characteristics (e.g., suffix array construction in prefix doubling algorithms from bioinformatics [15,44]), dataset properties (e.g., sparse matrices in linear algebra [4, 28–31, 42]), or real-time requests from web users (e.g., queries in data warehouses [45,49,51]). The second trend is that, with the rapidly increasing computational power of new processors, sorting a single array at a time usually cannot fully utilize the devices; thus grouping multiple independent arrays and sorting them simultaneously is crucial for high utilization.…”
Mentioning confidence: 99%
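The batching idea in this statement, sorting many small independent arrays at once rather than one at a time, can be sketched minimally in the same C++/OpenMP style. The CSR-like segment-pointer layout and the name sort_segments are assumptions for illustration, not the cited papers' interface.

    #include <algorithm>
    #include <vector>

    // Sketch: many short independent arrays stored back-to-back in `keys`,
    // delimited by `seg_ptr` (offsets, size num_segments + 1). Sorting them
    // as one batch keeps all cores busy even though each individual segment
    // is far too small to saturate the machine on its own.
    void sort_segments(std::vector<int>& keys, const std::vector<int>& seg_ptr) {
      const int num_segs = static_cast<int>(seg_ptr.size()) - 1;
      #pragma omp parallel for schedule(dynamic)
      for (int s = 0; s < num_segs; ++s)
        std::sort(keys.begin() + seg_ptr[s], keys.begin() + seg_ptr[s + 1]);
    }

Dynamic scheduling is used because segment lengths are typically skewed, the same workload-imbalance concern the surrounding statements raise for sparse data.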
“…Compared to stochastic gradient descent (SGD) [8,9], the ALS algorithm is not only inherently parallel but can also incorporate implicit ratings [1]. Nevertheless, the ALS algorithm involves parallel sparse matrix manipulation [10], for which achieving high performance is challenging due to imbalanced workloads [11,12,13], random memory accesses [14,15], unpredictable amounts of computation [16], and task dependencies [17,18,19]. This particularly holds when parallelizing and optimizing ALS on modern multi-cores and many-cores [20].…”
Section: Introduction
Mentioning confidence: 99%