To make software applications simpler to write and easier to maintain, a digital signal-processing software library that performs essential signal- and image-processing functions is an important part of every digital signal processor (DSP) developer's toolset. In general, such a library provides a high-level interface and mechanisms, so developers only need to know how to use the algorithms, not the details of how they work. Complex signal transformations then become function calls, e.g., C-callable functions. Considering the two-dimensional (2-D) convolver function as an example of great significance for DSPs, this paper proposes to replace this software function by an emulation on a field-programmable gate array (FPGA) initially configured by software programming. The exploration of the 2-D convolver's design space thus provides guidelines for the development of a library of DSP-oriented hardware configurations intended to significantly speed up general DSP processors. Based on the specific convolver, and considering operators supported in the library as hardware accelerators, a series of tradeoffs for efficiently exploiting the bandwidth between the general-purpose DSP and the accelerators is proposed. In terms of implementation, this paper explores the performance and architectural tradeoffs involved in the design of an FPGA-based 2-D convolution coprocessor for the TMS320C40 DSP microprocessor available from Texas Instruments Incorporated, Dallas, TX. However, the proposed concept is not limited to a particular processor.

Index Terms: Architectural tradeoffs, custom computing machine, design methodology, design reuse, DSP function library, hardware/software co-design, reconfigurable hardware accelerator.
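As a point of reference for the kind of library routine the abstract has in mind, the following is a minimal software sketch of a direct-form 2-D convolution. It is written in Python for brevity rather than as a C-callable DSP routine, and the name `conv2d_valid` and its "valid region only" behavior are assumptions made for illustration; they are not taken from the paper or from any Texas Instruments library.

```python
def conv2d_valid(image, kernel):
    """Direct-form 2-D convolution over the 'valid' region only.

    image and kernel are lists of equal-length rows of numbers.
    The kernel is flipped in both dimensions, as true convolution requires.
    """
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    if kh > ih or kw > iw:
        raise ValueError("kernel must not be larger than the image")

    oh, ow = ih - kh + 1, iw - kw + 1
    out = [[0] * ow for _ in range(oh)]
    for i in range(oh):
        for j in range(ow):
            acc = 0
            for m in range(kh):
                for n in range(kw):
                    # Flipped kernel index pairs kernel[m][n] with the
                    # image sample mirrored inside the current window.
                    acc += kernel[m][n] * image[i + kh - 1 - m][j + kw - 1 - n]
            out[i][j] = acc
    return out


if __name__ == "__main__":
    image = [[1, 2, 3],
             [4, 5, 6],
             [7, 8, 9]]
    kernel = [[1, 0],
              [0, -1]]
    print(conv2d_valid(image, kernel))
```

The four nested loops perform one multiply-accumulate per kernel tap per output sample, which is the regular, data-parallel workload that makes this function a natural candidate for a hardware accelerator.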
Analog and mixed-signal (AMS) designs are an important part of embedded systems, linking digital designs to the analog world. Because of the challenges associated with their verification, AMS designs consume a considerable portion of the total design cycle time. In contrast to digital designs, the verification of AMS systems is a challenging task that requires considerable expertise and a deep understanding of their behavior. Researchers have recently begun studying the applicability of formal methods to the verification of AMS systems as a way to tackle the limitations of conventional verification methods such as simulation. This paper surveys research activities in the formal verification of AMS designs and compares the different proposed approaches.
A state-of-the-art System-on-Chip (SoC) consists of hundreds of processing elements, while trends in the design of the next generation of SoCs point to the integration of thousands of processing elements, requiring high-performance interconnects for high-throughput communications. Optical on-chip interconnects are currently considered one of the most promising paradigms for the design of such next-generation MultiProcessor Systems-on-Chip (MPSoC). They enable significantly increased bandwidth, increased immunity to electromagnetic noise, decreased latency, and decreased power. Defining new architectures that take advantage of optical interconnects is therefore a key issue for MPSoC designers today. Moreover, new design methodologies that consider the design constraints specific to these architectures are mandatory. In this paper, we present a new contention-free architecture based on an optical network on chip, called Optical Ring Network-on-Chip (ORNoC). We also show that our network scales well with both large 2D and 3D architectures. For efficient design, we propose automatic wavelength/waveguide assignment and demonstrate that the proposed architecture is capable of connecting 1296 nodes with only 102 waveguides and 64 wavelengths per waveguide.
This paper presents a low-power and low-area variant of the recently proposed parallel regeneration technique (PRT), thus providing an improved technique for the regeneration of long integrated interconnects. Taking advantage of the particular design of the regenerator in PRT, we propose a variant (called VPRT), where the regenerators along the interconnection have a variable size. Electrical simulations involving different interconnection lengths and technological processes are carried out to show that the interconnection delay obtained with VPRT is smaller than with PRT. A performance analysis combining area (A), delay (T), and power dissipation (P) shows that VPRT leads to an ATP metric at least 4 times better than with PRT.

I. INTRODUCTION

In general, delay on a long integrated interconnection grows as the square of its length. Therefore, as component sizes are shrunk [1] [2], the switching delay of logic decreases, but the size of the chips tends to grow. Due to the increased complexity of the systems being integrated, interconnection delays grow and become a critical performance bottleneck. Indeed, complex VLSI circuits increasingly rely on long interconnections [3].

In previous work [4] [5] [6], different structures were proposed to solve this problem. In [6], we suggested a regeneration technique which was shown to achieve an AT performance metric lower than conventional methods (RID), based on repetitively inserted drivers [5]. In this paper, we propose a low-power and low-area variant of this technique, called VPRT. We also propose a performance analysis of this regenerating structure under a more general metric: an ATP metric (A for area, T for delay, and P for power).

0-7803-2428-5/95 $4.00 © 1995 IEEE

II. CIRCUIT DESCRIPTION

The regeneration structure (Fig. 1) introduced in [6] uses a regenerator (transistors T, through T,) which is inserted at regular intervals in the interconnection to be regenerated, thus dividing the interconnection into segments.
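The quadratic growth of delay with length, and the reason for dividing a long line into segments, can be illustrated with a generic first-order model. The sketch below assumes the Elmore delay of a uniform distributed RC line, r·c·L²/2, and idealized drivers inserted in series that each add a fixed delay. This corresponds to an idealized view of repetitively inserted drivers, not to the PRT or VPRT circuits, and it is not the analysis used in the paper. The function names and the `t_driver` parameter are hypothetical.

```python
def line_delay(r, c, length):
    """Elmore delay of a uniform distributed RC line.

    r and c are resistance and capacitance per unit length;
    the result grows with the square of the length.
    """
    return 0.5 * r * c * length ** 2


def segmented_delay(r, c, length, segments, t_driver):
    """Delay of the same line cut into equal segments.

    Each segment is followed by an idealized driver that adds a fixed
    delay t_driver (an assumed parameter for illustration).
    """
    if segments < 1:
        raise ValueError("segments must be at least 1")
    per_segment = line_delay(r, c, length / segments) + t_driver
    return segments * per_segment


if __name__ == "__main__":
    r, c, length, t_driver = 1.0, 1.0, 4.0, 0.5  # arbitrary normalized units
    print("unsegmented:", line_delay(r, c, length))
    for k in (1, 2, 4, 8):
        print(k, "segments:", segmented_delay(r, c, length, k, t_driver))
```

In this model the wire term falls as 1/k while the driver term rises as k, so the total delay first decreases and then increases as segments are added. The same tension between delay, area, and power is what an ATP-style metric is meant to capture.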
Let us assume that the line is first precharged to VDD by means of transistors T,, controlled by a globally available and properly deskewed clock signal. Distributing such a signal with low skew and characterizing this skew has been discussed elsewhere [7]. In this case, only the discharge process needs to be analyzed, since the propagation delay of a logical "1" is negligible. The signal to be sent through the interconnection is then presented to the gate of the emitter transistor T,, which initiates the discharge of the line at point B. At different points along the interconnection, a pull-down transistor (T3) accelerates this discharge as soon as a sense gate (transistors T2 and T4) detects this transition. At the end of the interconnection, a detector is used in order to detect the signal as early as possible.

Due to the complexity of the circuit in Fig. 1, which follows from the high degree of coupling between stages, the analysis done in [6] was simplified firstly by assuming that the regenerators are regularly spaced, and secondly by using heuristics, so that the d...
This paper addresses the problem of clocking large high-speed digital systems, as well as deterministic skew modeling, a related problem. A conventional method for clocking a large digital system is to use a set of metallic lines organized as a tree. This method is limited by the bandwidth of the clock network. Another limitation of existing solutions is that available skew models do not directly take into account process variations. In order to provide a reliable skew model, and to avoid the frequency limitation, we propose a novel approach that distributes the clock with an H-tree, whose branches are composed of minimum-sized inverters rather than metal. With such a structure, we obtain the highest clocking rate achievable with a given technology. Indeed, clock rates around 1 GHz are possible with a 1.2-μm CMOS technology. From the skew modeling standpoint, we derive an analytic expression of the skew between two leaves of the H-tree, which we consider to be the difference in root-to-leaf delay pairs. The skew upper bound obtained has an order of complexity which, with respect to the H-tree size D, is the same as the one that may be derived from the Fisher and Kung model for both side-to-side and neighbor-to-neighbor communications, i.e., Ω(D²), whereas the Steiglitz and Kugelmass probabilistic model predicts Θ(D × √(Log D)). In an H-tree implemented with metallic lines, the leaf-to-leaf skew is obviously bounded by the delay between the root and the leaves. However, with the logic-based H-tree proposed in this paper, we arrive at a nonobvious result, which states that the leaf-to-leaf skew grows faster than the root-to-leaf delay in the presence of a uniform transistor time constant gradient.
This paper also proposes generalizations of the skew model to 1) the case of chips in a wafer subject to a smooth but nonuniform gradient and 2) the case of H-tree configurations mixing logic and interconnections.