Proceedings of the 37th Annual International Symposium on Computer Architecture 2010
DOI: 10.1145/1815961.1816020
Data marshaling for multi-core architectures

Abstract: Previous research has shown that Staged Execution (SE), i.e., dividing a program into segments and executing each segment at the core that has the data and/or functionality to best run that segment, can improve performance and save power. However, SE's benefit is limited because most segments access inter-segment data, i.e., data generated by the previous segment. When consecutive segments run on different cores, accesses to inter-segment data incur cache misses, thereby reducing performance. This paper propos…
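The mechanism the abstract describes can be illustrated with a toy model: track the cache lines one segment generates (the inter-segment data), and push them to the core that will run the next segment before it starts, so its accesses hit locally instead of missing. This is a minimal sketch of the idea, not the paper's actual hardware design; `Core`, `run_segment`, and `marshal` are illustrative names, not from the paper.

```python
class Core:
    def __init__(self, name):
        self.name = name
        self.cache = set()   # cache lines currently resident
        self.misses = 0

    def access(self, line):
        # A read misses if the line is not resident, then fills the cache.
        if line not in self.cache:
            self.misses += 1
            self.cache.add(line)

def run_segment(core, reads, writes):
    # Execute one segment: consume its inputs, then produce its outputs.
    for line in reads:
        core.access(line)
    core.cache.update(writes)   # produced lines land in this core's cache
    return set(writes)          # the segment's inter-segment data

def marshal(lines, dst_core):
    # Data Marshaling: proactively transfer the generator's cache lines to
    # the core that will run the next segment, before that segment starts.
    dst_core.cache.update(lines)

# Segment A runs on core0 and generates lines {1, 2}; segment B consumes them.
core0 = Core("core0")
inter_segment = run_segment(core0, reads=set(), writes={1, 2})

# Without marshaling, segment B misses on every inter-segment line.
no_dm = Core("core1")
run_segment(no_dm, reads=inter_segment, writes=set())

# With marshaling, the lines arrive before segment B begins executing.
with_dm = Core("core1")
marshal(inter_segment, with_dm)
run_segment(with_dm, reads=inter_segment, writes=set())

print("misses without DM:", no_dm.misses)  # 2
print("misses with DM:", with_dm.misses)   # 0
```

The real mechanism operates in hardware (the compiler marks generator instructions, and the hardware marshals their lines on a segment handoff), but the before/after miss counts capture the benefit the abstract claims.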

Cited by 33 publications (11 citation statements) · References 40 publications (28 reference statements)
“…Since those bottlenecks are only 52 instructions long on average, the benefit of accelerating them does not overcome the cache miss penalty, which causes BIS' performance to be lower than that of ACS. BIS is able to improve performance of tsp over ACS with the Data Marshaling mechanism [35] that reduces cache misses incurred by the bottlenecks, as we show in Section 5.4.…”
Section: Optimal Number Of Threads
confidence: 79%
“…A bottleneck executing remotely on the large core may require data that resides in the small core, thereby producing cache misses that reduce the benefit of acceleration. Data Marshaling (DM) [35] has been proposed to reduce these cache misses by identifying and marshaling the cache lines required by the remote core. It is easy to integrate DM with our proposal, and we evaluate our proposal with and without DM in Section 5.4.…”
Section: Transfer Of Cache State To The Large Core
confidence: 99%
“…The hardware technique we use is similar to those solutions, but it is adapted for transactional memory. Optimization techniques used in those solutions, such as "Data Marshaling" proposed in [54], which predictively transfers data from one core to another during migration to reduce cache misses, can complement our solution or any other cache optimization [55].…”
Section: A Single-ISA Asymmetric Multicore
confidence: 99%