Proceedings of the 37th Annual International Symposium on Computer Architecture 2010
DOI: 10.1145/1815961.1816020
Data marshaling for multi-core architectures

Abstract: Previous research has shown that Staged Execution (SE), i.e., dividing a program into segments and executing each segment at the core that has the data and/or functionality to best run that segment, can improve performance and save power. However, SE's benefit is limited because most segments access inter-segment data, i.e., data generated by the previous segment. When consecutive segments run on different cores, accesses to inter-segment data incur cache misses, thereby reducing performance. This paper propos…
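The mechanism the abstract describes can be illustrated with a toy model: track the cache lines one segment generates (the inter-segment data), and push them to the core that will run the next segment before it starts, so its accesses hit locally instead of missing. This is a minimal sketch of the idea, not the paper's actual hardware design; `Core`, `run_segment`, and `marshal` are illustrative names, not from the paper.

```python
class Core:
    def __init__(self, name):
        self.name = name
        self.cache = set()   # cache lines currently resident
        self.misses = 0

    def access(self, line):
        # A read misses if the line is not resident, then fills the cache.
        if line not in self.cache:
            self.misses += 1
            self.cache.add(line)

def run_segment(core, reads, writes):
    # Execute one segment: consume its inputs, then produce its outputs.
    for line in reads:
        core.access(line)
    core.cache.update(writes)   # produced lines land in this core's cache
    return set(writes)          # the segment's inter-segment data

def marshal(lines, dst_core):
    # Data Marshaling: proactively transfer the generator's cache lines to
    # the core that will run the next segment, before that segment starts.
    dst_core.cache.update(lines)

# Segment A runs on core0 and generates lines {1, 2}; segment B consumes them.
core0 = Core("core0")
inter_segment = run_segment(core0, reads=set(), writes={1, 2})

# Without marshaling, segment B misses on every inter-segment line.
no_dm = Core("core1")
run_segment(no_dm, reads=inter_segment, writes=set())

# With marshaling, the lines arrive before segment B begins executing.
with_dm = Core("core1")
marshal(inter_segment, with_dm)
run_segment(with_dm, reads=inter_segment, writes=set())

print("misses without DM:", no_dm.misses)  # 2
print("misses with DM:", with_dm.misses)   # 0
```

The real mechanism operates in hardware (the compiler marks generator instructions, and the hardware marshals their lines on a segment handoff), but the before/after miss counts capture the benefit the abstract claims.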

Cited by 33 publications (11 citation statements) · References 40 publications (28 reference statements)
“…Since those bottlenecks are only 52 instructions long on average, the benefit of accelerating them does not overcome the cache miss penalty, which causes BIS' performance to be lower than that of ACS. BIS is able to improve performance of tsp over ACS with the Data Marshaling mechanism [35] that reduces cache misses incurred by the bottlenecks, as we show in Section 5.4.…”
Section: Optimal Number Of Threads
confidence: 79%
“…A bottleneck executing remotely on the large core may require data that resides in the small core, thereby producing cache misses that reduce the benefit of acceleration. Data Marshaling (DM) [35] has been proposed to reduce these cache misses by identifying and marshaling the cache lines required by the remote core. It is easy to integrate DM with our proposal, and we evaluate our proposal with and without DM in Section 5.4.…”
Section: Transfer Of Cache State To The Large Core
confidence: 99%
“…The hardware technique we use is similar to those solutions, but it is adapted for transactional memory. Optimization techniques used in those solutions, such as "Data Marshaling" proposed in [54], which predictively transfers data from one core to another during migration to reduce cache misses, can complement our solution or any other cache optimization [55].…”
Section: A Single-ISA Asymmetric Multicore
confidence: 99%