Investments in transport infrastructure and regional inequalities in Brazil: an analysis of the impacts of the Growth Acceleration Program (Programa de Aceleração do Crescimento, PAC)

This paper describes the polymorphous TRIPS architecture, which can be configured for different granularities and types of parallelism. TRIPS contains mechanisms that enable the processing cores and the on-chip memory system to be configured and combined in different modes for instruction, data, or thread-level parallelism. To adapt to small and large-grain concurrency, the TRIPS architecture contains four out-of-order, 16-wide-issue Grid Processor cores, which can be partitioned when easily extractable fine-grained parallelism exists. This approach to polymorphism provides better performance across a wide range of application types than an approach in which many small processors are aggregated to run workloads with irregular parallelism. Our results show that high performance can be obtained in each of the three modes (ILP, TLP, and DLP), demonstrating the viability of the polymorphous coarse-grained approach for future microprocessors.

An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches

Kim 2002

Composable Lightweight Processors

Kim, Sethumadhavan, Govindan et al. 2007

Modern chip multiprocessors (CMPs) are designed to exploit both instruction-level parallelism (ILP) within processors and thread-level parallelism (TLP) within and across processors. However, the number of processors and the granularity of each processor are fixed at design time. This paper evaluates a flexible architectural approach, called Composable Lightweight Processors (or CLPs), that allows simple, low-power cores to be aggregated together dynamically, forming larger, more powerful single-threaded processors without changing the application binary. We evaluate one such design with 32 cores called TFlex, which can be configured as 32 dual-issue processors, or as a single 64-wide issue processor, or as any point in between. Use of an Explicit Data Graph Execution (EDGE) ISA enables the system to be fully composable, with no monolithic structures spanning the cores. Simulation results show that CLPs achieve an average performance boost of 42%, an average area-efficiency of 3.4x, and an average power-efficiency of 2x over a fixed architecture on a spectrum of single-threaded applications. Results also show that CLPs outperform a spectrum of fixed CMP architectures on a set of multitasking workloads.

40th IEEE/ACM International Symposium on Microarchitecture
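
The configuration range quoted in the Composable Lightweight Processors abstract (32 dual-issue processors at one end, a single 64-wide processor at the other) can be illustrated with a small arithmetic sketch. It is not taken from the paper: the function name `composition_points` is hypothetical, and the sketch assumes that every composed processor uses the same power-of-two number of cores, which is a simplification of "any point in between".

```python
# Illustrative sketch only: enumerates uniform compositions of a 32-core,
# dual-issue design. Names and the power-of-two restriction are assumptions.
CORES = 32           # total cores, from the abstract
ISSUE_PER_CORE = 2   # each core is dual-issue, from the abstract


def composition_points():
    """Return (cores_per_processor, processors, issue_width) tuples."""
    points = []
    cores_per_processor = 1
    while cores_per_processor <= CORES:
        processors = CORES // cores_per_processor
        issue_width = cores_per_processor * ISSUE_PER_CORE
        points.append((cores_per_processor, processors, issue_width))
        cores_per_processor *= 2  # assumed: only power-of-two groupings
    return points


if __name__ == "__main__":
    for cores, processors, width in composition_points():
        print(f"{processors:2d} processor(s) x {cores:2d} core(s) = {width}-wide issue each")
```

Under these assumptions the first entry corresponds to 32 dual-issue processors and the last to one 64-wide processor, matching the two endpoints named in the abstract.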

A NUCA substrate for flexible CMP cache sharing

et al. 2005

We propose an organization for the on-chip memory system of a chip multiprocessor, in which 16 processors share a 16MB pool of 256 L2 cache banks. The L2 cache is organized as a non-uniform cache architecture (NUCA) array with a switched network embedded in it for high performance. We show that this organization can support the spectrum of degrees of sharing: unshared, in which each processor has a private portion of the cache, thus reducing hit latency; completely shared, in which every processor shares the entire cache, thus minimizing misses; and every point in between. We find the optimal degree of sharing for a number of cache bank mapping policies, and also evaluate a per-application cache partitioning strategy. We conclude that a static NUCA organization with a sharing degree of two or four works best across a suite of commercial and scientific parallel workloads. We also demonstrate that migratory, dynamic NUCA approaches improve performance significantly for a subset of the workloads at the cost of increased power consumption and complexity, especially as per-application cache partitioning strategies are applied.
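
To make the notion of "sharing degree" in the NUCA abstract concrete, the following sketch works out how the 256 banks and 16 MB would divide among processor groups for a given degree. It is an illustration under stated assumptions, not the paper's mechanism: the names `SharingConfig`, `sharing_config`, and `banks_for_processor` are hypothetical, and the contiguous group-to-bank assignment is an assumed placeholder (the abstract says several bank mapping policies were evaluated but does not describe them).

```python
from dataclasses import dataclass

# Figures taken from the abstract.
NUM_PROCESSORS = 16
NUM_BANKS = 256
TOTAL_CACHE_MB = 16


@dataclass(frozen=True)
class SharingConfig:
    degree: int            # processors sharing one cache partition
    groups: int            # number of partitions
    banks_per_group: int   # L2 banks in each partition
    mb_per_group: float    # capacity of each partition in MB


def sharing_config(degree: int) -> SharingConfig:
    """Split the bank pool evenly among groups of `degree` processors."""
    if degree < 1 or NUM_PROCESSORS % degree != 0:
        raise ValueError("degree must evenly divide the processor count")
    groups = NUM_PROCESSORS // degree
    return SharingConfig(degree, groups, NUM_BANKS // groups, TOTAL_CACHE_MB / groups)


def banks_for_processor(proc: int, degree: int) -> range:
    """Banks visible to `proc`, assuming a contiguous (hypothetical) mapping."""
    if not 0 <= proc < NUM_PROCESSORS:
        raise ValueError("processor index out of range")
    cfg = sharing_config(degree)
    start = (proc // degree) * cfg.banks_per_group
    return range(start, start + cfg.banks_per_group)


if __name__ == "__main__":
    for degree in (1, 2, 4, 8, 16):
        print(sharing_config(degree))
```

With these numbers, a sharing degree of 1 gives each processor a private 1 MB slice of 16 banks, a degree of 16 gives all processors the full 16 MB, and a degree of 4 gives four groups of 64 banks (4 MB) each.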

DySER: Unifying Functionality and Parallelism Specialization for Energy-Efficient Computing

et al. 2012

Changkyu Kim

3.5-D Blocking Optimization for Stencil Computations on Modern CPUs and GPUs

Exploiting ILP, TLP, and DLP with the polymorphous TRIPS architecture
