Guoling Han scite author profile

Designing an application-specific embedded system in nanometer technologies has become more difficult than ever due to the rapid increase in design complexity and manufacturing cost. Efficiency and flexibility must be carefully balanced to meet different application requirements. The recently emerged configurable and extensible processor architectures offer a favorable tradeoff between efficiency and flexibility, and a promising way to minimize certain important metrics (e.g., execution time, code size, etc.) of the embedded processors. This paper addresses the problem of generating the application-specific instructions to improve the execution speed for configurable processors. A set of algorithms, including pattern generation, pattern selection, and application mapping, are proposed to efficiently utilize the instruction set extensibility of the target configurable processor. Applications of our approach to several real-life benchmarks on the Altera Nios processor show encouraging performance speedup (2.75X on average and up to 3.73X in some cases).

show abstract

Architecture and Synthesis for On-Chip Multicycle Communication

Cong

Fan

Han

et al. 2004

IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst.

View full text Add to dashboard Cite

Abstract-For multigigahertz designs in nanometer technologies, data transfers on global interconnects take multiple clock cycles. In this paper, we propose a regular distributed register (RDR) microarchitecture, which offers high regularity and direct support of multicycle on-chip communication. The RDR microarchitecture divides the entire chip into an array of islands so that all local computation and communication within an island can be performed in a single clock cycle. Each island contains a cluster of computational elements, local registers, and a local controller. On top of the RDR microarchitecture, novel layout-driven architectural synthesis algorithms have been developed for multicycle communication, including scheduling-driven placement, placement-driven simultaneous scheduling with rebinding, and distributed control generation, etc. The experimentation on a number of real-life examples demonstrates promising results. For data flow intensive examples, we obtain a 44% improvement on average in terms of the clock period and a 37% improvement on average in terms of the final latency, over the traditional flow. For designs with control flow, our approach achieves a 28% clock-period reduction and a 23% latency reduction on average.

show abstract

AutoPilot: A Platform-Based ESL Synthesis System

Zhang¹,

Fan²,

Jiang³

et al.

View full text Add to dashboard Cite

Platform-Based Behavior-Level and System-Level Synthesis

Cong

Fan

Han

et al. 2006

View full text Add to dashboard Cite

Abstract-With the rapid increase of complexity in Systemon-a-Chip (SoC) design, the electronic design automation (EDA) community is moving from RTL (Register Transfer Level) synthesis to behavioral-level and system-level synthesis. The needs of system-level verification and software/hardware codesign also prefer behavior-level executable specifications, such as C or SystemC. In this paper we present the platform-based synthesis system, named xPilot, being developed at UCLA. The first objective of xPilot is to provide novel behavioral synthesis capability for automatically generating efficient RTL code from a C or SystemC description for a given system platform and optimizing the logic, interconnects, performance, and power simultaneously. The second objective of xPilot is to provide a platform-based system-level synthesis capability, including both synthesis for application-specific configurable processors and heterogeneous multi-core systems. Preliminary experiments on FPGAs demonstrate the efficacy of our approach on a wide range of applications and its value in exploring various design tradeoffs. I. MOTIVATIONThe relentless tracking of Moore's curve by the entire semiconductor industry has showcased the exponential scaling of the transistor feature size by a factor of 0.7 reduction every three years. This leads to exponentially increasing transistor counts and results in an explosive growth in functionality and the amount of computing power available on a single chip. Today it is perfectly feasible to design a System-on-a-Chip (SoC) with one billion transistors [7], and it is generally believed that industry will continue to overcome technical hurdles to sustain this trend for another decade. However, the cost of developing these chips and providing production facilities is also growing at a very fast pace. For instance, the total development cost of a single complex, high-density SoC at today's 90-nm technology can easily be in the $20 to $30 million range. The ITRS 2005 edition [7] has also emphasized that the cost of design remains the greatest threat to continuation of the semiconductor roadmap.Unfortunately, the progress of design technologies lags behind that of process manufacturing technologies. The constantly improving CAD tools can help to mitigate the problem by delivering faster simulation, higher capacity formal verification, and better logic synthesis coupled with place-androute. However, these improvements fail to close the design productivity gap, i.e., the number of available transistors grows faster than the ability to meaningfully design them.It is commonly acknowledged that the ultimate solution is to move to the next level of abstraction beyond RTL, and Electronic system-level (ESL) design automation has been widely identified as the next productivity boost for the semiconductor industry. However, despite some recent success in ESL simulation, the transition to ESL design will not be as well accepted as the transition to RTL without robust and efficient behavior-level and system-level syn...

show abstract

Instruction set extension with shadow registers for configurable processors

Cong

Fan

Han

et al. 2005

View full text Add to dashboard Cite

Configurable processors are becoming increasingly popular for modern embedded systems (especially for the field-programmable system-on-a-chip). While steady progress has been made in the tools and methodologies of automatic instruction set extension for configurable processors, the limited data bandwidth available in the core processor (e.g., the number of simultaneous accesses to the register file) becomes a potential performance bottleneck. In this paper we first present a quantitative analysis of the data bandwidth limitation in configurable processors, and then propose a novel low-cost architectural extension and associated compilation techniques to address the problem. The application of our approach results in a promising performance improvement.

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Guoling Han

Application-specific instruction generation for configurable processor architectures

Architecture and Synthesis for On-Chip Multicycle Communication

AutoPilot: A Platform-Based ESL Synthesis System

Platform-Based Behavior-Level and System-Level Synthesis

Instruction set extension with shadow registers for configurable processors

Contact Info

Product

Resources

About