Deep neural networks have been demonstrated to be useful in a variety of intelligent tasks, and various specialized NN accelerators have been proposed recently to improve hardware efficiency. These accelerators are typically equipped with software-managed scratchpad memory (SPM) for high performance and energy efficiency. However, traditional SPM management techniques cause memory fragmentation on NN accelerators and thus lead to low utilization of the precious SPM. The main reason is that traditional techniques were originally designed for managing fixed-length registers rather than variable-length memory blocks. In this article, we propose a novel SPM management approach for NN accelerators. The basic intuition is that NN computation and memory behaviors are predictable and relatively regular compared with those of traditional applications, so most information can be determined at compile time. In addition, by exploiting the variable-length feature of SPM, we propose to divide the allocation process into two passes, a space assignment pass and an address assignment pass, which are performed simultaneously (and implicitly) in traditional one-pass allocation techniques. Experimental results on the memory requests of a representative NN accelerator demonstrate that the proposed approach reduces memory consumption by up to 30% compared with state-of-the-art SPM management techniques, and the resulting memory usage is only 2% larger than that of the theoretical optimal allocation.

KEYWORDS
deep neural network, memory management, scratchpad memory

1 INTRODUCTION

Deep neural networks (DNNs) have been widely used in various applications, such as computer vision,1 speech recognition,2 machine translation,3 and robotics,4 due to their improved accuracy over traditional machine learning approaches. However, the performance benefits of DNNs come at the cost of extremely high computation and memory complexity, which poses great challenges to the underlying hardware architecture. To improve the efficiency of DNN processing, various specialized accelerators have been proposed that deliver orders of magnitude better performance and energy efficiency than general-purpose architectures such as CPUs and GPUs.5,6

Specialized neural network accelerators typically require various novel architectural components, including control logic (e.g., DianNao5 employs a control processor with dedicated control instructions, and Eyeriss6 employs a two-level control hierarchy), computational units (e.g., DianNao employs 16-bit fixed-point functional units to leverage the error-tolerance features of intelligent applications), and memory hierarchies.
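To make the two-pass idea from the abstract concrete, the following is a minimal sketch of how such a split might look, assuming each variable-length SPM block's size and lifetime are known at compile time from the NN's static schedule. The first pass records which blocks are live simultaneously (space assignment); the second pass greedily places each block at the lowest conflict-free address (address assignment). All names (Block, space_assignment, address_assignment) and the greedy first-fit placement policy are illustrative assumptions, not the paper's actual algorithm.

```python
# Hypothetical illustration of two-pass SPM allocation; sizes and
# lifetimes are assumed to be known at compile time.
from dataclasses import dataclass

@dataclass
class Block:
    name: str
    size: int        # bytes requested in SPM
    first_use: int   # first operation index touching the block
    last_use: int    # last operation index touching the block
    offset: int = -1 # SPM base address, filled in by pass 2

def overlap(a: Block, b: Block) -> bool:
    """Two blocks conflict iff their lifetimes intersect."""
    return not (a.last_use < b.first_use or b.last_use < a.first_use)

def space_assignment(blocks):
    """Pass 1: from the compile-time schedule, record which pairs of
    variable-length blocks are live at the same time and therefore
    must occupy disjoint SPM ranges."""
    return {b.name: {o.name for o in blocks if o is not b and overlap(b, o)}
            for b in blocks}

def address_assignment(blocks, conflicts):
    """Pass 2: greedy first-fit; place each block at the lowest
    address not occupied by an already-placed conflicting block."""
    placed = {}
    for blk in sorted(blocks, key=lambda b: b.first_use):
        busy = sorted((p.offset, p.offset + p.size)
                      for n, p in placed.items() if n in conflicts[blk.name])
        addr = 0
        for lo, hi in busy:
            if addr + blk.size <= lo:
                break            # blk fits in the gap before this range
            addr = max(addr, hi) # otherwise skip past the busy range
        blk.offset = addr
        placed[blk.name] = blk
    return max(b.offset + b.size for b in placed.values())  # SPM footprint

blocks = [Block("input", 256, 0, 1), Block("weights", 512, 0, 3),
          Block("output", 256, 2, 5)]
print(address_assignment(blocks, space_assignment(blocks)))
# -> 768: "output" reuses the bytes of the already-dead "input" block.
```

In this toy example the total footprint is 768 bytes rather than the 1024 bytes a naive per-block reservation would need, illustrating how compile-time lifetime information lets variable-length blocks share SPM space.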