“…Because of the serial nature, algorithm optimization for parallel SpTRSV has been mainly developed on top of the level-set methods by Anderson and Saad [2] and Saltz [45] and the color-set methods by Schreiber and Tang [46] for various parallel architectures [25,29,39,40,51]. Despite their effectiveness, barrier synchronization (which is itself a well-known bottleneck for parallel programs [4,14,20,21,38,65]) often limits the performance of parallel SpTRSV. To address this problem, Park et al. [40] sparsified synchronization by pruning unneeded dependencies, and Liu et al. replaced synchronization with atomic operations [29] and developed an implementation that further parallelizes multiple right-hand sides [30].…”
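
To make the level-set idea in the passage concrete, the following is a minimal sequential sketch in Python. It is not the implementation from any of the cited works. It assumes a lower-triangular matrix stored in CSR form with a nonzero diagonal, and the function names (`build_level_sets`, `sptrsv_level_set`) and the small test matrix are hypothetical, chosen only for illustration. Rows in the same level do not depend on one another, so a parallel version could process them concurrently and would need a barrier between consecutive levels, which is the synchronization cost the passage refers to.

```python
def build_level_sets(n, row_ptr, col_idx):
    """Group rows of a lower-triangular CSR matrix into dependency levels.

    Row i depends on row j whenever L[i][j] != 0 with j < i, so its level is
    one more than the deepest level among the rows it depends on.
    """
    level = [0] * n
    for i in range(n):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j = col_idx[k]
            if j < i:
                level[i] = max(level[i], level[j] + 1)
    num_levels = max(level) + 1 if n else 0
    sets = [[] for _ in range(num_levels)]
    for i, lv in enumerate(level):
        sets[lv].append(i)
    return sets


def sptrsv_level_set(n, row_ptr, col_idx, vals, b):
    """Solve L x = b level by level (illustrative, sequential)."""
    x = [0.0] * n
    for rows in build_level_sets(n, row_ptr, col_idx):
        # Rows within one level are independent of each other. A parallel
        # implementation would distribute this inner loop across threads
        # and place a barrier before moving on to the next level.
        for i in rows:
            s = b[i]
            diag = None
            for k in range(row_ptr[i], row_ptr[i + 1]):
                j = col_idx[k]
                if j == i:
                    diag = vals[k]
                else:
                    s -= vals[k] * x[j]
            x[i] = s / diag
    return x


# Hypothetical 4x4 example:
#   [2 0 0 0]
#   [0 4 0 0]
#   [1 0 2 0]
#   [0 1 1 1]
row_ptr = [0, 1, 2, 4, 7]
col_idx = [0, 1, 0, 2, 1, 2, 3]
vals = [2, 4, 1, 2, 1, 1, 1]
b = [2, 8, 7, 9]

levels = build_level_sets(4, row_ptr, col_idx)
x = sptrsv_level_set(4, row_ptr, col_idx, vals, b)
```

In this example, rows 0 and 1 form the first level, row 2 the second, and row 3 the third, so three levels (and two barriers in a parallel version) are needed for four rows. The synchronization-free approach mentioned in the passage instead lets each row wait only on the specific entries it depends on, for example through atomically updated counters.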