2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)
DOI: 10.1109/ipdpsw.2016.104

Refactoring Conventional Task Schedulers to Exploit Asymmetric ARM big.LITTLE Architectures in Dense Linear Algebra

Abstract: Dealing with asymmetry in the architecture opens a plethora of questions from the perspective of scheduling task-parallel applications, and there exist early attempts to address this problem via ad-hoc strategies embedded into a runtime framework. In this paper we take a different path, which consists in addressing the complexity of the problem at the library level, via a few asymmetry-aware fundamental kernels, hiding the architecture heterogeneity from the task scheduler. For the specific domain of dense lin…
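To make the library-level approach sketched in the abstract concrete, the following is a minimal sketch in C with OpenMP (not the authors' code): the split of work between big and LITTLE cores is decided inside the kernel itself, so a conventional task scheduler on top of it never has to reason about the asymmetry. The function names (gemm_panel_asym, gemm_block), the 4+4 core configuration, the pinning of threads 0..3 to big cores (e.g. via OMP_PLACES/OMP_PROC_BIND), and the 0.35 relative throughput of a LITTLE core are all assumptions made for illustration.

/* Asymmetry-aware kernel sketch: work is partitioned inside the kernel,
 * so the scheduler above it sees an ordinary, symmetric task. */
#include <omp.h>
#include <stddef.h>

#define NBIG 4            /* assumed number of big cores                  */
#define NLITTLE 4         /* assumed number of LITTLE cores               */
#define LITTLE_SPEED 0.35 /* assumed relative throughput of a LITTLE core */

/* C(m x n) += A(m x k) * B(k x n), row-major, contiguous storage. */
static void gemm_block(const double *A, const double *B, double *C,
                       size_t m, size_t n, size_t k)
{
    for (size_t i = 0; i < m; ++i)
        for (size_t p = 0; p < k; ++p)
            for (size_t j = 0; j < n; ++j)
                C[i * n + j] += A[i * k + p] * B[p * n + j];
}

/* Rows of C are divided so that each core class receives a share
 * proportional to its aggregate throughput. Threads 0..NBIG-1 are
 * assumed to be pinned to the big cluster, the rest to the LITTLE one. */
void gemm_panel_asym(const double *A, const double *B, double *C,
                     size_t m, size_t n, size_t k)
{
    const double big_share = NBIG / (NBIG + NLITTLE * LITTLE_SPEED);
    const size_t m_big = (size_t)(big_share * (double)m);

    #pragma omp parallel num_threads(NBIG + NLITTLE)
    {
        int t = omp_get_thread_num();
        size_t lo, hi;
        if (t < NBIG) {                  /* big cores: rows [0, m_big)    */
            size_t chunk = (m_big + NBIG - 1) / NBIG;
            lo = (size_t)t * chunk;
            hi = (lo + chunk < m_big) ? lo + chunk : m_big;
        } else {                         /* LITTLE cores: rows [m_big, m) */
            size_t rows = m - m_big;
            size_t chunk = (rows + NLITTLE - 1) / NLITTLE;
            lo = m_big + (size_t)(t - NBIG) * chunk;
            hi = (lo + chunk < m) ? lo + chunk : m;
        }
        if (lo < hi)
            gemm_block(A + lo * k, B, C + lo * n, hi - lo, n, k);
    }
}

The interface of gemm_panel_asym is that of an ordinary kernel; only its internal row partitioning knows about the two core classes, which is the sense in which the heterogeneity is hidden from the task scheduler.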


Citations: cited by 7 publications (10 citation statements)
References: 15 publications
“…On the Odroid XU4 board, the improvements obtained are lower than expected, considering that the parallel version uses eight cores instead of four. Nevertheless, previous works have shown that using the LITTLE cores has little impact on performance compared with the big ones, and in some cases even increases the execution time…”
Section: Accelerating the Execution Through Parallelization (citation type: mentioning)
confidence: 98%
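A rough back-of-the-envelope estimate illustrates why the eight-core runs discussed above gain less than the core count alone would suggest (the 0.35 relative throughput assumed for a LITTLE core is an illustrative figure for a compute-bound kernel, not a value taken from the cited works):

    effective cores ≈ 4 big + 4 × 0.35 LITTLE = 5.4

Moving from the 4 big cores to all 8 cores can therefore add at most about 5.4 / 4 ≈ 1.35× speed-up rather than 2×, and scheduling overhead or memory contention can shrink the gain further, which is consistent with the observation that the LITTLE cores sometimes even increase the execution time.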
“…Basically, this programming model is based on including directives (#pragmas), as in other parallel programming models such as OpenMP. These directives are mostly used to annotate certain code blocks to indicate that those blocks are tasks; that is, the basic scheduling units to be executed by the available computational resources…”
Section: Accelerating the Execution Through Parallelization (citation type: mentioning)
confidence: 99%
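As a concrete illustration of the directive-based tasking style described in this statement, the short C/OpenMP sketch below annotates per-block updates as tasks; the blocking scheme, the block_update stand-in kernel and the sizes are illustrative assumptions, and the runtime targeted by the cited work may use its own, slightly different pragmas.

#include <omp.h>

#define NB 8     /* assumed number of blocks in the panel */
#define BS 256   /* assumed block size (elements)         */

/* Stand-in for the real per-block computational kernel. */
static void block_update(double *blk, int n)
{
    for (int i = 0; i < n; ++i)
        blk[i] *= 2.0;
}

void process_panel(double *panel)
{
    #pragma omp parallel
    #pragma omp single                 /* one thread creates the tasks...  */
    for (int b = 0; b < NB; ++b) {
        double *blk = panel + b * BS;
        #pragma omp task firstprivate(blk) depend(inout: blk[0:BS])
        block_update(blk, BS);         /* ...any idle thread executes them */
    }
}

Each annotated block becomes a basic scheduling unit that the runtime hands to whichever computational resource becomes available, which is exactly the behaviour the quoted statement describes.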
“…I). Thus, actions will be the following: … ([3,2], [9,20], …) … ([6,4], [16,40], {x_3, m6_6, m7_9}) | Φ(v_5, {m15_5}) → ([6,4], [16,40], {x_4, …}) … ([7,4], [18,40], {x_7, m11_7}) | Φ(v_3, {m17_3}) → ([7,4], [18,40], {x_8, …}…”
Section: B. Functional Specification of Distributed Systems (citation type: mentioning)
confidence: 99%
“…A_5: Φ(v_6, {m6_6}) → ([5,3], [13,29], {x_9, m13_7, m14_8}) | Φ(v_6, {m16_6}) → ([5,3], [13,29], {x_10, m15_5, m14_8}); A_6: Φ(v_7, {m11_7}) → ([9,5], [23,50], {x_11, m16_6}) | Φ(v_7, {m13_7}) → ([9,5], [23,50], {x_12, …}) … ([5,3], [12,30], {m28_5, …}…”
Section: … (citation type: mentioning)
confidence: 99%