A load balancing inspired optimization framework for exascale multicore systems: A complex networks approach

Xiao, Yao; Xue, Yuankun; Nazarian, Shahin; Bogdan, Paul

doi:10.1109/iccad.2017.8203781

Cited by 37 publications

(35 citation statements)

References 26 publications

“…Mapping side-by-side threads onto the same cores minimizes the overall communication between cores. Another approach [27], which pursues the same goal, minimizes inter-core communication overhead by splitting the threads into clusters such that the amount of communication between clusters is minimized, while the number of clusters does not exceed the number of cores. The authors of [27] state that mapping clusters of threads to cores is more efficient than mapping individual threads to cores, because it is easier to minimize the amount of communication between clusters than between threads.…”

Section: Comparison of the NUMA-BTLP Algorithm and Other Work (mentioning)

confidence: 99%
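The cluster-based mapping described in the excerpt above can be illustrated with a small greedy heuristic. This is a sketch under stated assumptions, not the algorithm of [27] (which models the dynamic execution as a weighted graph): it assumes a symmetric, non-negative thread-to-thread communication matrix `comm` stored row-major as n × n values, and it repeatedly merges the two clusters with the heaviest mutual traffic until no more clusters than cores remain. The function name `cluster_threads` and the `MAX_THREADS` limit are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_THREADS 64  /* hypothetical upper bound for this sketch */

/* Greedily merge n threads into at most max_clusters clusters.
 * comm: symmetric, non-negative n x n matrix (row-major); comm[i*n + j] is
 *       the communication volume between threads i and j.
 * cluster: output; cluster[i] receives the cluster id (0..k-1) of thread i.
 * Returns k, the number of clusters produced. */
size_t cluster_threads(size_t n, const double *comm, size_t max_clusters,
                       size_t cluster[])
{
    assert(n <= MAX_THREADS && max_clusters >= 1);
    double w[MAX_THREADS][MAX_THREADS]; /* traffic between current clusters */
    int alive[MAX_THREADS];             /* 1 if index i still heads a cluster */
    size_t head[MAX_THREADS];           /* cluster head of each thread */

    for (size_t i = 0; i < n; i++) {
        alive[i] = 1;
        head[i] = i;
        for (size_t j = 0; j < n; j++)
            w[i][j] = comm[i * n + j];
    }

    size_t k = n;
    while (k > max_clusters) {
        /* Find the pair of live clusters with the heaviest mutual traffic. */
        size_t best_a = 0, best_b = 0;
        double best = -1.0;
        for (size_t a = 0; a < n; a++) {
            if (!alive[a])
                continue;
            for (size_t b = a + 1; b < n; b++)
                if (alive[b] && w[a][b] > best) {
                    best = w[a][b];
                    best_a = a;
                    best_b = b;
                }
        }
        /* Merge cluster best_b into best_a; their mutual traffic becomes
         * intra-cluster and no longer crosses cores. */
        for (size_t c = 0; c < n; c++) {
            w[best_a][c] += w[best_b][c];
            w[c][best_a] = w[best_a][c];
        }
        alive[best_b] = 0;
        for (size_t i = 0; i < n; i++)
            if (head[i] == best_b)
                head[i] = best_a;
        k--;
    }

    /* Relabel cluster heads as contiguous ids 0..k-1. */
    size_t id_of[MAX_THREADS];
    size_t next_id = 0;
    for (size_t i = 0; i < n; i++)
        id_of[i] = (size_t)-1;
    for (size_t i = 0; i < n; i++) {
        if (id_of[head[i]] == (size_t)-1)
            id_of[head[i]] = next_id++;
        cluster[i] = id_of[head[i]];
    }
    return k;
}
```

For four threads where threads 0 and 1 communicate heavily, as do threads 2 and 3, calling `cluster_threads(4, comm, 2, cluster)` for a two-core target places each heavily communicating pair in its own cluster. Greedy merging is used here only for brevity; it gives no optimality guarantee.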

“…The speedup obtained by mapping clusters of threads instead of individual threads, using the algorithm in [27], ranges from 10.2% to 131.82% compared with the speedup obtained when mapping individual threads. However, when the threads are independent, mapping individual threads, as NUMA-BTLP [5] does, is 10% more efficient in terms of execution time than mapping clusters, as the algorithm in [27] does. Similarly, applying the NUMA-BTLP algorithm [5] to one of the benchmarks tested in this paper, which has only autonomous threads, yields a greater improvement in power consumption than applying it to another benchmark, which has threads of other types.…”

Section: Comparison of the NUMA-BTLP Algorithm and Other Work (mentioning)

confidence: 99%

“…Critical threads are delayed due to race conditions, which degrades power consumption with no performance loss [27]. Postponed threads can be considered autonomous threads relative to other threads at the same level in the thread creation hierarchy.…”

Section: Comparison of the NUMA-BTLP Algorithm and Other Work (mentioning)

confidence: 99%

“…Both the NUMA-BTLP algorithm [5] and the methodology in [27], which models the dynamic execution and partitions the application into clusters, use an intermediate representation for data dependency analysis. The methodology in [27] uses two graphs to obtain the mapping: a weighted dynamic application graph that models the data dependencies, whose nodes represent instructions of the code in intermediate representation, and a second graph that represents the clusters. NUMA-BTLP [5], by contrast, represents both the threads and the data dependencies between them using a single tree, constructed according to rules already given, in which the nodes represent the threads.…”

Section: Comparison of the NUMA-BTLP Algorithm and Other Work (mentioning)

confidence: 99%
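The tree-based representation in the excerpt above can be sketched as a C data structure in which each node is a thread and its children are the threads it creates. The mapping rules in this sketch are assumptions for illustration, not the published NUMA-BTLP/NUMA-BTDM rules: side-by-side threads are placed on their creator's core (following the excerpt's remark that this minimizes inter-core communication), while autonomous and postponed threads are spread round-robin over the cores (postponed threads being treated as autonomous relative to their siblings, as noted in an earlier excerpt). The names `thread_node`, `map_tree`, and `MAX_CHILDREN` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Thread types named in the excerpts; assigned by static analysis. */
enum thread_type { AUTONOMOUS, SIDE_BY_SIDE, POSTPONED };

#define MAX_CHILDREN 8  /* hypothetical fan-out limit for this sketch */

struct thread_node {
    int id;                                     /* thread identifier */
    enum thread_type type;                      /* type from static analysis */
    struct thread_node *children[MAX_CHILDREN]; /* threads created by this one */
    size_t n_children;
    int core;                                   /* output: core chosen below */
};

/* Hypothetical rules: a side-by-side thread shares data with its creator,
 * so it inherits the creator's core; every other thread (autonomous or
 * postponed) takes the next core in round-robin order. */
static void map_subtree(struct thread_node *node, int parent_core,
                        int n_cores, int *next_core)
{
    if (node->type == SIDE_BY_SIDE && parent_core >= 0) {
        node->core = parent_core;
    } else {
        node->core = *next_core;
        *next_core = (*next_core + 1) % n_cores;
    }
    for (size_t i = 0; i < node->n_children; i++)
        map_subtree(node->children[i], node->core, n_cores, next_core);
}

/* Walk the thread-creation tree depth-first and assign a core to every node. */
void map_tree(struct thread_node *root, int n_cores)
{
    assert(root != NULL && n_cores > 0);
    int next_core = 0;
    map_subtree(root, -1, n_cores, &next_core);
}
```

With a root thread that creates one side-by-side, one autonomous, and one postponed child on a four-core machine, the side-by-side child shares the root's core 0 and the other two children receive cores 1 and 2.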

Extending NUMA-BTLP Algorithm with Thread Mapping Based on a Communication Tree

Știrb

2018

Computers

The paper presents a Non-Uniform Memory Access (NUMA)-aware compiler optimization for task-level parallel code. The optimization is based on the Non-Uniform Memory Access—Balanced Task and Loop Parallelism (NUMA-BTLP) algorithm (Ştirb, 2018). The algorithm determines the type of each thread in the source code through static analysis of the code. After assigning a type to each thread, NUMA-BTLP (Ştirb, 2018) calls the NUMA-BTDM mapping algorithm (Ştirb, 2016), which uses the PThreads routine pthread_setaffinity_np to set the CPU affinities of the threads (i.e., thread-to-core associations) based on their type. The algorithms improve thread mapping on NUMA systems by mapping threads that share data onto the same core(s), allowing fast access to data in the L1 cache. The paper proves that PThreads-based task-level parallel code optimized at compile time by NUMA-BTLP (Ştirb, 2018) and NUMA-BTDM (Ştirb, 2016) runs time- and energy-efficiently on NUMA systems. The results show that energy consumption is reduced by up to 5% at the same execution time for one of the tested real benchmarks, and by up to 15% for another benchmark running in an infinite loop. The algorithms can be used in real-time control systems such as client/server applications, which require efficient access to shared resources. Most often, task parallelism is used in the implementation of the server and loop parallelism in the client.
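Setting a thread-to-core association with `pthread_setaffinity_np`, as the abstract describes for NUMA-BTDM, can look like the following minimal sketch. It assumes Linux with glibc: `pthread_setaffinity_np`, `pthread_getaffinity_np`, and the `CPU_*` macros are GNU extensions that require `_GNU_SOURCE`. It is not the NUMA-BTDM implementation; the helper names `pin_thread_to_cpu`, `first_allowed_cpu`, `is_pinned_to`, and `run_pair_on_cpu` are hypothetical, and the sketch simply pins two data-sharing threads to the same CPU.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stddef.h>

/* Pin `thread` to exactly one CPU. Returns 0 on success or an errno value,
 * as reported by pthread_setaffinity_np(). */
int pin_thread_to_cpu(pthread_t thread, int cpu)
{
    assert(cpu >= 0 && cpu < CPU_SETSIZE);
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return pthread_setaffinity_np(thread, sizeof set, &set);
}

/* Lowest CPU the calling thread is currently allowed to run on, or -1. */
int first_allowed_cpu(void)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    if (pthread_getaffinity_np(pthread_self(), sizeof set, &set) != 0)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
        if (CPU_ISSET(cpu, &set))
            return cpu;
    return -1;
}

/* 1 if `thread` may run on `cpu` and on no other CPU, else 0. */
int is_pinned_to(pthread_t thread, int cpu)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    if (pthread_getaffinity_np(thread, sizeof set, &set) != 0)
        return 0;
    return CPU_COUNT(&set) == 1 && CPU_ISSET(cpu, &set);
}

static int shared_counter;  /* data shared by the two side-by-side threads */
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

struct worker_arg {
    int cpu;        /* core chosen by the (hypothetical) mapping step */
    int pinned_ok;  /* set by the worker: 1 if pinning succeeded */
};

static void *side_by_side_worker(void *p)
{
    struct worker_arg *arg = p;
    /* The thread pins itself, so there is no race with it exiting early. */
    arg->pinned_ok = pin_thread_to_cpu(pthread_self(), arg->cpu) == 0 &&
                     is_pinned_to(pthread_self(), arg->cpu);
    pthread_mutex_lock(&counter_lock);
    shared_counter++;
    pthread_mutex_unlock(&counter_lock);
    return NULL;
}

/* Start two data-sharing threads that both pin themselves to `cpu`.
 * Returns 1 if both threads were created and pinned, else 0. */
int run_pair_on_cpu(int cpu)
{
    pthread_t t[2];
    struct worker_arg args[2] = { { cpu, 0 }, { cpu, 0 } };
    int created = 0;
    while (created < 2 &&
           pthread_create(&t[created], NULL, side_by_side_worker,
                          &args[created]) == 0)
        created++;
    for (int i = 0; i < created; i++)
        pthread_join(t[i], NULL);
    return created == 2 && args[0].pinned_ok && args[1].pinned_ok;
}
```

Each worker pins itself through `pthread_self()`, which avoids calling `pthread_setaffinity_np` on a thread that may already have exited. An alternative is to set the affinity before the thread starts with `pthread_attr_setaffinity_np`.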

Retracted: Artificial intelligence point‐to‐point signal communication network optimization based on ubiquitous clouds

Lin

Deng

et al. 2020

Int J Communication

Summary: At present, communication networks are used in every area of social life. Progress in network technology has driven the development of the information technology industry and the advancement of cloud computing. With the application of artificial intelligence and deep learning technology, networks are becoming increasingly intelligent, so artificial intelligence theory is increasingly applied to communication network optimization modeling. In this paper, we propose the ubiquitous clouds framework and apply an artificial intelligence optimization scheme to point‐to‐point (P2P) signal transmission network optimization. The proposed method is analyzed and evaluated experimentally for different application scenarios. The experimental results show that the proposed method makes full use of communication network resources, which improves the efficiency of communication network optimization and also reduces the optimization cost to a certain extent compared with state‐of‐the‐art approaches.

Network-Based Method for Dynamic Burden-Sharing in the Internet of Things (IoT)

Mahmood

2022

Communications in Computer and Information Science
