2019
DOI: 10.1016/j.jpdc.2018.07.018

Optimizing sparse tensor times matrix on GPUs

Cited by 31 publications (14 citation statements)
References 25 publications

“…Methods for Tucker decomposition include higher-order SVD (HOSVD) [32], truncated HOSVD [32], Alternating Least Squares (ALS) based methods [66], the popular higher-order orthogonal iteration (HOOI) [33], and Newton-Grassmann optimization [36]. Sparse Tucker arises in two ways: from sparse tensors in applications [83,89,90,127] and from constrained sparse factors. The computational tensor kernel of Tucker decomposition is the Tensor-Times-Matrix operation (TTM), described in Section 4.4.…”
Section: Tensor Methods
confidence: 99%
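
The citation above names the Tensor-Times-Matrix (TTM) operation as the computational kernel of Tucker decomposition. For reference, here is a minimal dense sketch of mode-n TTM via unfolding; the function name `ttm` and the use of NumPy are illustrative assumptions, not the cited paper's GPU implementation.

```python
import numpy as np

def ttm(X, U, mode):
    """Multiply tensor X by matrix U along `mode`: Y = X x_mode U."""
    # Mode-n unfolding: bring `mode` to the front and flatten the remaining modes.
    Xn = np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)
    Yn = U @ Xn  # (J x I_n) times (I_n x product of the other dimensions)
    new_shape = (U.shape[0],) + tuple(np.delete(X.shape, mode))
    # Fold back and restore the original mode ordering.
    return np.moveaxis(Yn.reshape(new_shape), 0, mode)

X = np.random.rand(4, 5, 6)     # a small dense 3-way tensor
U = np.random.rand(3, 5)        # factor matrix for mode 1
print(ttm(X, U, mode=1).shape)  # (4, 3, 6)
```
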
“…This TTM algorithm directly operates on the input sparse tensor by avoiding tensor transformation. The explanation of Algorithm 5 can be found in [85,90].…”
Section: TTM
confidence: 99%
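
The statement refers to a TTM algorithm that works directly on the sparse input tensor without format transformation (the Algorithm 5 it cites is in [85,90] and is not reproduced here). The following is a hedged sketch of that idea on a COO-format tensor; the function signature and loop structure are assumptions for illustration only.

```python
import numpy as np

def sparse_ttm_coo(indices, values, shape, U, mode):
    """Y = X x_mode U for a COO sparse tensor X, without converting formats.

    indices: (nnz, ndim) integer coordinates of the nonzeros
    values:  (nnz,) nonzero values
    U:       (R, shape[mode]) dense factor matrix
    """
    out_shape = list(shape)
    out_shape[mode] = U.shape[0]
    Y = np.zeros(out_shape)            # output is dense along `mode` (semi-sparse)
    for coord, val in zip(indices, values):
        i_n = coord[mode]
        out_coord = list(coord)
        for r in range(U.shape[0]):    # each nonzero contributes a length-R fiber
            out_coord[mode] = r
            Y[tuple(out_coord)] += val * U[r, i_n]
    return Y

# Example: a 3x3x3 tensor with two nonzeros, multiplied along mode 0.
idx = np.array([[0, 1, 2], [2, 0, 1]])
val = np.array([1.0, 2.0])
U = np.random.rand(4, 3)
print(sparse_ttm_coo(idx, val, (3, 3, 3), U, mode=0).shape)  # (4, 3, 3)
```
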
“…Shaden et al. [41] used the Compressed Sparse Fiber (CSF) structure, which optimizes access efficiency for HOHDST. The Tensor-Times-Matrix chain (TTMc) [42] is a key part of Tucker Decomposition (TD), and TTMc is a data-intensive task. Ma et al. [42] optimized the TTMc operation on GPUs to exploit the GPU's intensive and partitioned computational resources: the threads of a warp (32) are automatically synchronized, and this mechanism is well suited to block-by-block matrix multiplication.…”
Section: Related Studies
confidence: 99%
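
To make the TTMc term concrete, below is a minimal NumPy sketch of a Tensor-Times-Matrix chain (multiplying by the factor matrices in every mode except one, as done inside HOOI). The warp-level GPU mapping described in the citation is not reproduced; this only shows the semantics, and the function name `ttmc` is an assumption.

```python
import numpy as np

def ttmc(X, factors, skip):
    """Multiply X by each factor matrix factors[n] along mode n, skipping mode `skip`."""
    Y = X
    for mode, U in enumerate(factors):
        if mode == skip:
            continue
        # One mode-n TTM step: unfold, multiply, fold back.
        Yn = np.moveaxis(Y, mode, 0).reshape(Y.shape[mode], -1)
        new_shape = (U.shape[0],) + tuple(np.delete(Y.shape, mode))
        Y = np.moveaxis((U @ Yn).reshape(new_shape), 0, mode)
    return Y

X = np.random.rand(4, 5, 6)
factors = [np.random.rand(2, 4), np.random.rand(3, 5), np.random.rand(2, 6)]
print(ttmc(X, factors, skip=1).shape)  # (2, 5, 2)
```
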
“…On the Intel many-core processor KNL (Knights Landing), the computation of CP decomposition is balanced among the processing units, which leads to a 1.8× performance speedup. Li et al. (2016) and Ma et al. (2019) propose an optimized design of sparse tensor-times-dense matrix multiply on GPUs that exploits fine thread granularity, coalesced memory access, rank blocking, and fast shared memory. F-COO (Liu et al. 2017) proposes a unified tensor format along with GPU-specific optimizations that leverage the computation patterns shared across tensor operations.…”
Section: Related Work
confidence: 99%
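
The optimizations listed in this citation (fine thread granularity, coalesced memory access, rank blocking, fast shared memory) are GPU-specific; the sketch below only illustrates the rank-blocking idea on the CPU, processing the output rank R in slabs so each slab of the factor matrix could reside in fast memory. The block size and data layout are assumptions, not the cited implementations.

```python
import numpy as np

def spttm_rank_blocked(indices, values, shape, U, mode, rank_block=16):
    """Sparse-tensor-times-dense-matrix with the R output columns processed in blocks."""
    R = U.shape[0]
    out_shape = list(shape)
    out_shape[mode] = R
    Y = np.zeros(out_shape)
    for r0 in range(0, R, rank_block):   # rank blocking: one slab of U at a time
        r1 = min(r0 + rank_block, R)
        U_blk = U[r0:r1]                 # the slab that would sit in shared memory on a GPU
        for coord, val in zip(indices, values):
            out_coord = list(coord)
            i_n = coord[mode]
            for r in range(r0, r1):
                out_coord[mode] = r
                Y[tuple(out_coord)] += val * U_blk[r - r0, i_n]
    return Y
```
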