Sujit Kumar Patel scite author profile

In this paper, we analyze the contents of the lookup tables (LUTs) of a distributed arithmetic (DA)-based block least mean square (BLMS) adaptive filter (ADF) and, based on that analysis, propose intra-iteration LUT sharing to reduce its hardware resources, energy consumption, and iteration period. The proposed LUT optimization scheme saves 60% of the LUT content for block size 8, and more for larger block sizes, compared with the conventional design approach. We also present the design of a register-based LUT matrix for maximal sharing of LUT contents and fully parallel LUT-update operation. Based on the proposed design approach, we have derived a DA-based architecture for the BLMS ADF that is scalable to larger block sizes as well as higher filter lengths. We find that the hardware complexity of the proposed structure increases less than proportionately with input block size and filter length. It offers a saving of 60% in LUT updates per output and 59% in LUT accesses per output over the recently proposed DA-based BLMS ADF structure for block size 8 and filter length 64. In addition, the proposed structure has a nearly 30% shorter iteration period than that structure for a 16-bit coefficient word length. Application-specific integrated circuit (ASIC) synthesis results show that the proposed structure for block size 8 offers a saving of 48% in area-delay product (ADP) and 53% in energy per sample (EPS) over the existing DA-based BLMS ADF structure, on average over different filter lengths, and offers a 30% higher sampling rate due to its shorter iteration period. Compared with the existing DA-based LMS ADF structure, the proposed structure involves 68% less ADP and 1.6× less EPS.

Index Terms: Adaptive filters (ADFs), block least mean square (BLMS), distributed arithmetic (DA), LMS algorithm.
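For readers unfamiliar with distributed arithmetic, the following is a minimal sketch of a textbook bit-serial DA inner product, in which a LUT of partial weight sums replaces the multipliers. It is not the LUT-sharing scheme proposed in the paper; the function names, integer weights, and two's-complement sample format are assumptions made for illustration.

```python
def build_lut(weights):
    """LUT[a] holds the sum of weights[k] for every k whose bit is set in address a."""
    n = len(weights)
    return [
        sum(w for k, w in enumerate(weights) if (a >> k) & 1)
        for a in range(1 << n)
    ]


def da_inner_product(weights, samples, bits):
    """Inner product of integer weights with `bits`-wide two's-complement samples.

    Hypothetical helper: one LUT read per bit position instead of one
    multiplication per tap.
    """
    lut = build_lut(weights)
    mask = (1 << bits) - 1
    acc = 0
    for b in range(bits):
        # The LUT address is formed from bit b of every sample.
        addr = 0
        for k, x in enumerate(samples):
            addr |= (((x & mask) >> b) & 1) << k
        term = lut[addr] << b
        # In two's complement the sign bit carries weight -2^(bits-1).
        acc += -term if b == bits - 1 else term
    return acc


weights = [3, -2, 5, 1]
samples = [7, -4, 2, -1]
result = da_inner_product(weights, samples, bits=8)
```

The LUT has 2^K entries for K weights, and in an adaptive filter its contents must be rewritten whenever the weights change, which is why the size of the LUT and the cost of updating it are the quantities the abstract reports.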


LoBA: A Leading One Bit Based Imprecise Multiplier for Efficient Image Processing

Garg

Patel

Dutt

2020

J Electron Test


Efficient very large‐scale integration architecture for variable length block least mean square adaptive filter

Mohanty

Patel

2015

IET signal process.


The authors analyse the computational complexity of the block least mean square (BLMS) finite impulse response (FIR) filter and decompose the filter computation into M sub-filters, where M = N/L, N is the filter length and L is the block-size. The proposed decomposition scheme favours time-multiplexing the filtering computation and the weight-increment term computation of the BLMS algorithm. Using the proposed scheme, they have derived an efficient architecture for the BLMS FIR filter. The proposed structure can be reconfigured for different filter lengths with negligible overhead complexity, and it supports a variable convergence factor. Besides, the proposed structure has 100% hardware utilisation efficiency, and its register complexity is independent of block-size. Compared with a recently proposed LMS-based FIR structure, the proposed structure involves L times more multipliers, proportionately fewer adders and the same number of registers, and it offers L times higher throughput. Application specific integrated circuit (ASIC) synthesis results show that the proposed structure for block-size 4 and filter-length 64 involves 21.4% less area-delay product (ADP) and 26.6% less energy per sample (EPS) than the existing structure and offers 3.8 times higher throughput.
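The decomposition into M = N/L sub-filters can be illustrated with a small sketch. The abstract does not say how the taps are partitioned, so the contiguous split below, along with the function names, is an assumption; the sketch only shows that the M partial sums of length L add up to the full N-tap output.

```python
def fir_output(weights, window):
    """Full N-tap FIR output for one input window (plain inner product)."""
    return sum(w * x for w, x in zip(weights, window))


def subfilter_outputs(weights, window, block_size):
    """Split an N-tap inner product into M = N / L partial sums of length L.

    Assumption: taps are partitioned contiguously; the abstract does not
    specify the partitioning used in the paper.
    """
    n = len(weights)
    if n != len(window):
        raise ValueError("weights and window must have the same length")
    if n % block_size != 0:
        raise ValueError("filter length must be a multiple of the block size")
    m = n // block_size
    return [
        fir_output(
            weights[i * block_size:(i + 1) * block_size],
            window[i * block_size:(i + 1) * block_size],
        )
        for i in range(m)
    ]


weights = [1, 2, 3, 4, 5, 6, 7, 8]   # N = 8
window = [2, 0, 1, 3, 1, 1, 0, 2]
partials = subfilter_outputs(weights, window, block_size=4)  # M = 2
```

Because each partial sum uses only L multiplications, the same L multipliers can in principle be reused across the M sub-filters in successive cycles, which is the kind of time-multiplexing the abstract refers to.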


Area–delay and energy efficient multi‐operand binary tree adder

Patel

Singhal

2020

IET Circuits, Devices & Systems


Here, the critical path of the ripple carry adder (RCA)-based binary tree adder (BTA) is analysed to find possibilities for delay minimisation. Based on the findings of the analysis, a new logic formulation and the corresponding RCA design are proposed for the BTA. The comparison result shows that the proposed RCA design offers better efficiency in terms of area, delay and energy than the existing RCA. Using this RCA design, the BTA structure is proposed. The synthesis result reveals that the proposed 32-operand BTA provides a saving of 22.5% in area-delay product and 28.7% in energy-delay product over the recent Wallace tree adder, which is the best among available multi-operand adders. The authors have also applied the proposed BTA in recent multiplier designs to evaluate its performance. The synthesis result shows that the performance of the multiplier designs improved significantly due to the use of the proposed BTA. Therefore, the proposed BTA design can be a better choice for developing area-, delay- and energy-efficient digital systems for signal and image processing applications.
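As a point of reference, a conventional RCA-based binary tree adder can be modelled in a few lines. This is a sketch of the textbook structure only, not the new logic formulation proposed in the paper; the function names and the unsigned operand format are assumptions.

```python
def ripple_carry_add(a, b, width):
    """Add two unsigned `width`-bit integers one full-adder stage at a time."""
    carry = 0
    result = 0
    for i in range(width):
        x = (a >> i) & 1
        y = (b >> i) & 1
        result |= (x ^ y ^ carry) << i
        # Carry-out of a full adder: generate, or propagate the incoming carry.
        carry = (x & y) | (carry & (x ^ y))
    return result | (carry << width)


def binary_tree_add(operands, width):
    """Sum unsigned operands pairwise, level by level, as a binary tree."""
    level = list(operands)
    if not level:
        return 0
    while len(level) > 1:
        nxt = [
            ripple_carry_add(level[i], level[i + 1], width)
            for i in range(0, len(level) - 1, 2)
        ]
        if len(level) % 2:
            nxt.append(level[-1])  # an unpaired operand passes to the next level
        level = nxt
        width += 1                 # each level can grow the sum by one bit
    return level[0]


total = binary_tree_add([3, 5, 7, 9, 11], width=4)
```

In this model the carry ripples through every bit position at each of the roughly log2(n) tree levels, which is the critical path the abstract says the authors analyse.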

