2012
DOI: 10.1145/2355585.2355592

Disjoint out-of-order execution processor

Abstract: High-performance superscalar architectures that exploit instruction-level parallelism in single-thread applications have become too complex and power-hungry for the multicore processor era. We propose a new architecture that uses multiple small latency-tolerant out-of-order cores to improve single-thread performance. Improving single-thread performance with multiple small out-of-order cores allows designers to place more of these cores on the same die. Consequently, emerging highly parallel applications ca…

Cited by 13 publications (6 citation statements)
References 69 publications
“…The effectiveness of thread partitioning is determined by these five parameters, and [LLoTG, ULoTG, DDC, LLoSD, ULoSD] represents the partition scheme. For example, a partition scheme could be [10,50,18,20,30]. These values indicate that thread granularity ranges from 10 to 50, the data dependence count is no more than 18, and the spawning distance is set from 20 to 30 during thread partitioning; $H_1$, $H_2$, $H_3$, $H_4$, $H_5$ can be expressed as follows:…”
Section: Partitioning Scheme (mentioning; confidence: 99%)
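The five-parameter scheme in the statement above can be sketched as a small record type. This is a minimal sketch under stated assumptions: the expansions of the abbreviations (LLoTG/ULoTG as lower/upper limits of thread granularity, DDC as the maximum data dependence count, LLoSD/ULoSD as lower/upper limits of spawning distance) are inferred from the quoted example, the bounds are assumed inclusive, and the `PartitionScheme` type and its `admits` check are hypothetical, not the citing paper's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartitionScheme:
    """Hypothetical record for [LLoTG, ULoTG, DDC, LLoSD, ULoSD]."""
    llotg: int  # assumed: lower limit of thread granularity
    ulotg: int  # assumed: upper limit of thread granularity
    ddc: int    # assumed: maximum data dependence count
    llosd: int  # assumed: lower limit of spawning distance
    ulosd: int  # assumed: upper limit of spawning distance

    @classmethod
    def from_list(cls, values):
        # Build a scheme from a 5-element vector such as [10, 50, 18, 20, 30].
        if len(values) != 5:
            raise ValueError("a partition scheme has exactly five parameters")
        return cls(*values)

    def admits(self, granularity, dependence_count, spawning_distance):
        """True if a candidate thread meets every constraint (inclusive bounds)."""
        return (self.llotg <= granularity <= self.ulotg
                and dependence_count <= self.ddc
                and self.llosd <= spawning_distance <= self.ulosd)


scheme = PartitionScheme.from_list([10, 50, 18, 20, 30])
print(scheme.admits(granularity=32, dependence_count=5, spawning_distance=25))  # True
print(scheme.admits(granularity=60, dependence_count=5, spawning_distance=25))  # False
```

Keeping the scheme immutable makes it convenient to compare or hash candidate schemes when searching over them.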
“…Within this model, where the closer $x_q$ is to $x_i$, the bigger the weight, and the weights sum to 1, we can obtain the assignment weights in formula (10).…”
Section: Generation of Partition Scheme (mentioning; confidence: 99%)
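Formula (10) itself is not reproduced in the statement above. A minimal sketch, assuming normalized inverse-distance weighting with Euclidean distance, shows one standard way to obtain weights that grow as $x_q$ approaches $x_i$ and sum to 1. The `assignment_weights` helper, the `power` exponent, and the exact-match rule are hypothetical choices, not taken from the source.

```python
import math


def assignment_weights(x_q, samples, power=1.0, eps=1e-12):
    """Normalized inverse-distance weights of query x_q against samples x_i.

    Assumption: Euclidean distance and w_i proportional to 1 / d_i**power.
    Closer samples get larger weights, and the weights sum to 1.
    """
    dists = [math.dist(x_q, x_i) for x_i in samples]
    # An exact match would divide by zero, so exact matches share all the weight.
    exact = [i for i, d in enumerate(dists) if d < eps]
    if exact:
        share = 1.0 / len(exact)
        return [share if i in exact else 0.0 for i in range(len(samples))]
    raw = [1.0 / d ** power for d in dists]
    total = sum(raw)
    return [r / total for r in raw]


weights = assignment_weights((0, 0), [(1, 0), (2, 0), (4, 0)])
print([round(w, 3) for w in weights])  # [0.571, 0.286, 0.143]
```

Raising `power` concentrates the weight on the nearest samples; `power=1.0` keeps the example easy to check by hand (weights 4/7, 2/7, 1/7).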
“…The existing contributions on a hardware approach to automatic parallelization [18][19][20][21] are penalized by the low basic ILP measured in programs [10]. The hardware-based parallelization in Goossens et al. [22] overcomes this limitation in 2 ways: (1) very distant ILP is caught when fetch is parallelized; (2) all stack memory false dependences and stack pointer true dependences are removed.…”
Section: Related Work and Conclusion (mentioning; confidence: 99%)
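Why removing false dependences exposes more ILP can be illustrated with a generic dependence analysis. This is a minimal sketch, not the mechanism of Goossens et al.: the `Instr` model, the stack-slot names, the one-cycle latency, and the unlimited issue width are all hypothetical. It classifies true (RAW) and false (WAR, WAW) dependences, then compares the dependence depth when all of them are enforced against RAW only.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    dest: str    # location written (register or stack slot); hypothetical model
    srcs: tuple  # locations read


def dependences(program):
    """Return (i, j, kind) for each dependence of instruction j on an earlier i."""
    last_write = {}   # location -> index of its most recent writer
    reads_since = {}  # location -> readers since that most recent write
    deps = []
    for j, ins in enumerate(program):
        for src in dict.fromkeys(ins.srcs):  # dedupe sources, keep order
            if src in last_write:
                deps.append((last_write[src], j, "RAW"))  # true dependence
        for r in reads_since.get(ins.dest, []):
            deps.append((r, j, "WAR"))  # false (anti) dependence
        if ins.dest in last_write:
            deps.append((last_write[ins.dest], j, "WAW"))  # false (output) dependence
        for src in dict.fromkeys(ins.srcs):
            reads_since.setdefault(src, []).append(j)
        last_write[ins.dest] = j
        reads_since[ins.dest] = []  # a new write starts a new version
    return deps


def schedule_length(program, kinds):
    """Dependence depth in cycles, assuming one-cycle latency, unlimited
    issue width, and that only dependences of the given kinds are enforced."""
    depth = [1] * len(program)
    # Sorting by consumer index finalizes depth[i] before any edge i -> j is used.
    for i, j, kind in sorted(dependences(program), key=lambda d: d[1]):
        if kind in kinds:
            depth[j] = max(depth[j], depth[i] + 1)
    return max(depth, default=0)


# Hypothetical example: stack slot s0 is reused for two unrelated values.
prog = [
    Instr("s0", ("a",)),   # store a into s0
    Instr("r1", ("s0",)),  # load s0
    Instr("s0", ("b",)),   # store b into s0 (slot reuse)
    Instr("r2", ("s0",)),  # load s0
]
print(schedule_length(prog, {"RAW", "WAR", "WAW"}))  # 4
print(schedule_length(prog, {"RAW"}))                # 2
```

Here the reuse of `s0` creates WAR and WAW dependences that serialize two otherwise independent store/load pairs; enforcing only the true dependences halves the depth from 4 to 2.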
“…They gave solutions to allocate the needed resources later and free them sooner, optimizing their usage and so handling more "on-the-fly" instructions with the same resources. In 2012, Sharafeddine, Jothi and Akkary [12] proposed an architecture that partitions a run into parallel threads, forking the leading thread at calls. In the sum example, this leads to forking on both of the highest-level calls but not on the lower levels, capturing only a small part of the distant ILP.…”
Section: ILP in Programs (mentioning; confidence: 99%)
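The fork-at-call outcome described for the sum example can be modeled with a short sketch. It is hypothetical throughout: the recursive halving sum, the `profile_sum` helper, and the `fork_levels` parameter are assumptions, and `fork_levels=1` merely encodes the quoted outcome (forking only on the two highest-level calls) rather than simulating the architecture of Sharafeddine, Jothi and Akkary.

```python
def profile_sum(values, fork_levels=1):
    """Recursively sum values by halving the range, counting calls.

    Calls issued at recursion depth < fork_levels are counted as forked
    (assumed to start a new parallel thread); deeper calls run in their parent.
    Returns (sum, total_calls, forked_calls).
    """
    stats = {"calls": 0, "forked": 0}

    def rsum(lo, hi, depth):
        if hi - lo == 0:   # empty range
            return 0
        if hi - lo == 1:   # single element
            return values[lo]
        mid = (lo + hi) // 2
        total = 0
        for sub_lo, sub_hi in ((lo, mid), (mid, hi)):
            stats["calls"] += 1
            if depth < fork_levels:
                stats["forked"] += 1  # this call would be forked as a thread
            total += rsum(sub_lo, sub_hi, depth + 1)
        return total

    return rsum(0, len(values), 0), stats["calls"], stats["forked"]


print(profile_sum(list(range(1024))))  # (523776, 2046, 2)
```

For 1024 elements only 2 of the 2046 calls are forked, which illustrates how little of the call tree runs in parallel when only the top-level calls fork.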