Design of a Low-Power VLSI Macrocell for Nonlinear Adaptive Video Noise Reduction

Saponara, Sergio; Fanucci, Luca; Terreni, Pierangelo

doi:10.1155/s1110865704403035

Cited by 6 publications

(8 citation statements)

References 21 publications


“…This number varies slightly with image size and aspect ratio. The ASIP performs well comparing the synthesis results with state-of-art implementations of similar non linear video filtering algorithms on DSP [15] or dedicated VLSI cells [16]. DSP-based implementations have been proposed in the literature for the real time elaboration of up to CIF videos but their power cost is in the order of watts, more than one order of magnitude higher than the ASIP power consumption.…”

Section: Synthesis and Performance (mentioning)

confidence: 91%

ASIP Design and Synthesis for Non Linear Filtering in Image Processing

Fanucci, Cassiano, Saponara, et al. 2006

Proceedings of the Design Automation & Test in Europe Conference


This paper presents an Application Specific Instruction Set Processor (ASIP) design for the implementation of a class of nonlinear image processing algorithms, the Retinex-like filters. Starting from high level descriptions, first algorithmic optimization is accomplished. Then a processor architecture and an instruction set are customized with special respect to the algorithmic computations in order to achieve the specified timing at reasonable complexity. Taking advantage of the programmability of processor architectures, the flexibility of the system is increased, involving e.g. dynamic parameter adjustment and color treatment. ASIP implementation results in 0.13 μm CMOS technology are presented.


“…Partial products from N/f multipliers are summed together in the accumulation path. Finally, the results from the accumulation path are carried on to the post-processing unit to perform the summation operation, thus satisfies the computation in (7) [6][7][8].…”

Section: Conventional Booth-algorithm FIR Architectures Using Folding (mentioning)

confidence: 99%

“…$\sum_{l=0}^{(W/2)-1} \sum_{k=0}^{f-1}$ and $\times 2^{2l}$ are sequentially computed in the post-processing unit. According to (8), this integrated folding scheme can design an FIR architecture with a high folding number by increasing the folding number of tap folding. Moreover, unlike the conventional tap folding, its partial-product shifting operation is processed in the post-processing unit to reduce hardware complexity in the accumulation path.…”

Section: Proposed FIR Architecture (mentioning)

confidence: 99%

“…With the regular computation of an architecture, a folding scheme that utilizes the same and small hardware component to repeatedly complete a set of computation is frequently used to reduce the hardware complexity of such architecture [1,2]. Generally, the folding schemes of an FIR architecture can be classified into input-data folding, coefficient folding, and tap folding [3][4][5][6][7][8][9][10][11]. Additionally, while advances in nanoelectronic fabrication have enabled integrated circuits to operate at a high frequency, the throughput-rate demand of an FIR filter does not change significantly.…”

Section: Introduction (mentioning)

confidence: 99%


A Hardware-Efficient Programmable FIR Processor Using Input-Data and Tap Folding

Chen 2007

EURASIP J. Adv. Signal Process.


Advances in nanoelectronic fabrication have enabled integrated circuits to operate at a high frequency. The finite impulse response (FIR) filter needs only to meet real-time demand. Accordingly, increasing the FIR architecture's folding number can compensate the high-frequency operation and reduce the hardware complexity, while continuing to allow applications to operate in real time. In this work, the folding scheme with integrating input-data and tap folding is proposed to develop a hardware-efficient programmable FIR architecture. With the use of the radix-4 Booth algorithm, the 2-bit input subdata approach replaces the conventional 3-bit input subdata approach to reduce the number of latches required to store input subdata in the proposed FIR architecture. Additionally, the tree accumulation approach with simplified carry-in bit processing is developed to minimize the hardware complexity of the accumulation path. With folding in input data and taps, and reduction in hardware complexity of the input subdata latches and accumulation path, the proposed FIR architecture is demonstrated to have a low hardware complexity. By using the TSMC 0.18 µm CMOS technology, the proposed FIR processor with 10-bit input data and filter coefficient enables a 128-tap FIR filter to be performed, which takes an area of 0.45 mm², and yields a throughput rate of 20 M samples per second at 200 MHz. As compared to the conventional FIR processors, the proposed programmable FIR processor not only meets the throughput-rate demand but also has the lowest area occupied per tap.
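As a quick check on the figures quoted in this abstract, the minimal sketch below computes the clock cycles available per output sample and the silicon area per tap. The function names are hypothetical, and reading the cycle budget as the room available for folding is an assumption made for illustration, not something the abstract states.

```python
def cycles_per_sample(clock_hz: float, throughput_sps: float) -> float:
    """Clock cycles available to produce one output sample."""
    return clock_hz / throughput_sps


def area_per_tap(area_mm2: float, taps: int) -> float:
    """Average silicon area attributed to each filter tap, in mm^2."""
    return area_mm2 / taps


if __name__ == "__main__":
    # Figures taken from the abstract: 200 MHz clock, 20 M samples/s,
    # 0.45 mm^2 total area, 128 taps.
    print(cycles_per_sample(200e6, 20e6))
    print(area_per_tap(0.45, 128))
```

With the quoted numbers, the clock runs ten times faster than the sample rate, which is the slack that a folded architecture can trade for reduced hardware.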


“…A GPU has a power consumption ranging from tens to hundreds Watts, depending on the workload [21]. Dedicated integrated circuits (ICs) have been proposed in literature [12, 22–30] whose power consumption is limited to hundreds mW; however, they are dedicated to a specific algorithm, for example, motion estimation for interframe video coding in [28] or dynamic range compression for display of mobile devices in [27] or audio oversampling and noise shaping in [12,30]. Instead, a programmable solution covering multiple tasks is needed.…”

Section: Introduction (mentioning)

confidence: 99%

Homogeneous and Heterogeneous MPSoC Architectures with Network‐On‐Chip Connectivity for Low‐Power and Real‐Time Multimedia Signal Processing

Saponara, Fanucci 2012

VLSI Design

Self Cite


Two multiprocessor system-on-chip (MPSoC) architectures are proposed and compared in the paper with reference to audio and video processing applications. One architecture exploits a homogeneous topology; it consists of 8 identical tiles, each made of a 32-bit RISC core enhanced by a 64-bit DSP coprocessor with local memory. The other MPSoC architecture exploits a heterogeneous-tile topology with on-chip distributed memory resources; the tiles act as application specific processors supporting a different class of algorithms. In both architectures, the multiple tiles are interconnected by a network-on-chip (NoC) infrastructure, through network interfaces and routers, which allows parallel operations of the multiple tiles. The functional performances and the implementation complexity of the NoC-based MPSoC architectures are assessed by synthesis results in submicron CMOS technology. Among the large set of supported algorithms, two case studies are considered: the real-time implementation of an H.264/MPEG AVC video codec and of a low-distortion digital audio amplifier. The heterogeneous architecture ensures a higher power efficiency and a smaller area occupation and is more suited for low-power multimedia processing, such as in mobile devices. The homogeneous scheme allows for a higher flexibility and easier system scalability and is more suited for general-purpose DSP tasks in power-supplied devices.
