2009 IEEE/ACM/IFIP 7th Workshop on Embedded Systems for Real-Time Multimedia 2009
DOI: 10.1109/estmed.2009.5336814

A high-throughput, area-efficient hardware accelerator for adaptive deblocking filter in H.264/AVC

Abstract: In this paper, we present a high-throughput, area-efficient hardware accelerator for the deblocking filter in the H.264/AVC video compression standard. To achieve this goal, we start with algorithmic optimization and propose a novel decomposition of the filter kernels for the deblocking filter. The proposed decomposition reduces the number of adders by 51% and thereby greatly reduces the area required for its implementation. Subsequently, at architecture level, while using two identical filtering units…

Citations: cited by 8 publications (8 citation statements)
References: 18 publications

“…There exist numerous applications for accelerators in both of the embedded and high performance computing markets. Examples include video processing [24], software-defined radio [5], network traffic management [19], DNA computing [17] and fully programmable hardware acceleration platforms [23]. Efficient sharing of data in a heterogeneous MpSoC which contains different types of integrated computational elements is a challenging task.…”
Section: Introduction
confidence: 99%
“…We proposed a novel decomposition of the filter kernels to remove the arithmetic operations redundancy in our previous work [17]. The proposed optimization of the filter equations reduces the total number of adder instances from 49 to 24 [17]. This more than double reduction of addition operations does not only pay off in terms of less area requirement for its implementation but also helps to reduce the signal activity in the combinatorial logic between different pipeline stages.…”
Section: Our Approach To Reduce the Dynamic Power Consumption
confidence: 96%
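
The statement above refers to removing redundant additions from the filter equations. The authors' actual decomposition is not reproduced on this page, so the following is only a minimal sketch of the general idea, assuming the standard H.264/AVC strong (bS = 4) luma equations for the P side of an edge. The function names and the particular partial sums `s` and `t` are illustrative choices, not the paper's. The count it relates to is the one quoted above: 49 adders reduced to 24, which is (49 - 24) / 49 ≈ 51%, matching the abstract.

```python
def strong_luma_p_side_direct(p3, p2, p1, p0, q0, q1):
    """P-side strong-filter outputs, each written out term by term."""
    p0_new = (p2 + 2 * p1 + 2 * p0 + 2 * q0 + q1 + 4) >> 3
    p1_new = (p2 + p1 + p0 + q0 + 2) >> 2
    p2_new = (2 * p3 + 3 * p2 + p1 + p0 + q0 + 4) >> 3
    return p0_new, p1_new, p2_new


def strong_luma_p_side_shared(p3, p2, p1, p0, q0, q1):
    """Same outputs, reusing partial sums (illustrative sharing only)."""
    s = p1 + p0 + q0                        # common to all three outputs
    t = p2 + s                              # p2 + p1 + p0 + q0
    p1_new = (t + 2) >> 2
    p0_new = (t + s + q1 + 4) >> 3          # p2 + 2*p1 + 2*p0 + 2*q0 + q1
    p2_new = (((p3 + p2) << 1) + t + 4) >> 3  # 2*p3 + 3*p2 + p1 + p0 + q0
    return p0_new, p1_new, p2_new


if __name__ == "__main__":
    pixels = (90, 95, 100, 110, 130, 135)
    print(strong_luma_p_side_direct(*pixels))
    print(strong_luma_p_side_shared(*pixels))
```

Both functions return identical values for any integer inputs, because the shared version only regroups the same sums. In hardware, each reused partial sum corresponds to an adder that is instantiated once instead of several times.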
“…(14) - (16) in Fig 3(c). Similarly, the overlapped data path for conditional filtering in strong and weak filtering modes is implemented in LumaCommonPBlock and LumaCommonQBlock whereas the LumaBs4_PBlock, LumaBs4_QBlock implements the rest of the processing for strong filter mode case [17]. In case of Strong or Weak Filter Modes for chroma component of the MB, one can see from Fig.…”
Section: Deblock Filter Core Module
confidence: 99%
“…A large number of accelerators utilize a single filter core, such as [4][5][6][7][8][9][10][11][12][13][14]. Most single filter based architecture [4][5][6] can operate at 100MHz and take around 200 cycles per macroblock, which cannot satisfy high level requirement.…”
Section: Introduction
confidence: 99%
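
To put the quoted figures (100 MHz, around 200 cycles per macroblock) in context, the sketch below converts them into a deblocking frame rate. It is a back-of-the-envelope calculation under stated assumptions: "around 200" is treated as exactly 200, macroblocks are 16x16 pixels, frame dimensions are rounded up to whole macroblocks, and the two resolutions are example values chosen here, not taken from the citing paper. The function names are hypothetical.

```python
def macroblocks_per_frame(width, height, mb_size=16):
    """Number of 16x16 macroblocks, rounding each dimension up."""
    mbs_wide = -(-width // mb_size)   # ceiling division
    mbs_high = -(-height // mb_size)
    return mbs_wide * mbs_high


def frames_per_second(clock_hz, cycles_per_mb, width, height):
    """Frames per second if the filter is the only bottleneck (assumption)."""
    cycles_per_frame = cycles_per_mb * macroblocks_per_frame(width, height)
    return clock_hz / cycles_per_frame


if __name__ == "__main__":
    for width, height in ((1920, 1080), (3840, 2160)):
        fps = frames_per_second(100e6, 200, width, height)
        print(f"{width}x{height}: {fps:.1f} frames/s")
```

Under these assumptions the result is roughly 61 frames/s at 1920x1080 and roughly 15 frames/s at 3840x2160. The excerpt does not say which target the citing authors mean by "high level requirement".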