Design of a 2D DCT/IDCT application specific VLIW processor supporting scaled and sub-sampled blocks

Krishnan, Rohini; Gangwal, Om Prakash; Eijndhoven, J. van; Kumar, Anshul

doi:10.1109/icvd.2003.1183133

Cited by 9 publications

(5 citation statements)

References 10 publications


“…Therefore, a DCT coprocessor can use the same data path for the implementation of inverse and forward DCT. Moreover, implementing dedicated multipliers with fixed constants provides high throughput at minimal hardware cost [86] [133]. The internal state of this coprocessor is void after processing a DCT block.…”

Section: Discrete Cosine Transform (DCT)

mentioning

confidence: 99%
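The quoted observation that forward and inverse DCT can share one data path can be illustrated with a small sketch. It assumes the orthonormal 8-point DCT-II, for which the inverse is simply the transpose of the forward matrix, so both directions use the same fixed constants. The function names and sample values are illustrative only and are not taken from the cited coprocessor design.

```python
import math

N = 8  # block width assumed for illustration

def dct_matrix(n=N):
    """Orthonormal DCT-II matrix: C[k][i] = s(k) * cos((2i + 1) * k * pi / (2n))."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos((2 * i + 1) * k * math.pi / (2 * n))
                     for i in range(n)])
    return rows

def forward_dct(x, c):
    """Forward transform y = C x."""
    return [sum(c[k][i] * x[i] for i in range(len(x))) for k in range(len(c))]

def inverse_dct(y, c):
    """Inverse transform x = C^T y: the same constants, read with swapped indices."""
    return [sum(c[k][i] * y[k] for k in range(len(y))) for i in range(len(c[0]))]

C = dct_matrix()
samples = [52, 55, 61, 66, 70, 61, 64, 73]  # made-up input row
coeffs = forward_dct(samples, C)
restored = inverse_dct(coeffs, C)
```

Because `inverse_dct` reads the same table `C` that `forward_dct` uses, a hardware analogue needs only one set of constant multipliers, which is the point the citing text makes.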

“…A previous hardware implementation shows a full 8x8 two-dimensional DCT takes 64 cycles, or 32 cycles if the horizontal and vertical DCT are pipelined. Krishnan [86] gives more detailed performance figures for a DCT with embedded block compression. Assuming a best-case latency of 4 cycles to read a DCT row via the shell, reading a full DCT block takes 32 cycles (8 rows * 4 cycles/row).…”

Section: Coprocessor Performance

mentioning

confidence: 99%
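The horizontal and vertical passes mentioned in this statement, and its read-latency arithmetic, can be sketched as follows. This is a plain row-column reference model under the assumption of an orthonormal DCT-II. It does not model the pipelining or the cycle counts of the cited hardware, and the only figure taken from the quote is the best-case latency of 4 cycles per row.

```python
import math

N = 8
ROW_READ_LATENCY = 4  # best-case cycles per row, as stated in the quoted text

def dct_1d(x):
    """Orthonormal 1-D DCT-II of a sequence (direct definition)."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * sum(x[i] * math.cos((2 * i + 1) * k * math.pi / (2 * n))
                               for i in range(n)))
    return out

def dct_2d(block):
    """Row-column 2-D DCT: a horizontal pass over rows, then a vertical pass over columns."""
    rows_done = [dct_1d(row) for row in block]                   # horizontal pass
    cols_done = [dct_1d(list(col)) for col in zip(*rows_done)]   # vertical pass
    return [list(r) for r in zip(*cols_done)]                    # transpose back

block = [[128] * N for _ in range(N)]   # flat test block, made up for illustration
coeffs = dct_2d(block)

# Reading a full block row by row: 8 rows * 4 cycles/row.
block_read_cycles = N * ROW_READ_LATENCY
```

For the flat block, all energy lands in the DC coefficient `coeffs[0][0]`, and `block_read_cycles` reproduces the 32 cycles computed in the quoted statement.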

“…Consequently, the coprocessor can switch tasks on a block-level granularity without requiring additional hardware for state save/restore. The DCT area estimation of 0.9 mm² in CMOS18 is based on an implementation using A|RT designer [86], with added area for the connection to the coprocessor shell.…”

Section: Discrete Cosine Transform (DCT)

mentioning

confidence: 99%


Eclipse : Flexible Media Processing in a Heterogeneous Multiprocessor Template

Rutten

2007


All rights reserved. Without limiting the rights under copyright reserved above, no part of this book may be reproduced, stored in or introduced into a retrieval system, or transmitted, in any form or by any means (electronic, mechanical, photocopying, recording or otherwise) without the written permission of both the copyright owner and the author of the book.

Acknowledgment: With this thesis, I stand on the shoulders of two remarkable architects, Evert-Jan Pol and Jos van Eijndhoven. Every so often, I would study a certain aspect of the architecture. Typically, when I proudly presented my solution after careful consideration of all options, Evert-Jan would politely point out to me that I had overlooked a detail that rendered my solution utterly useless. However, my ideas triggered him to change one detail and invent the final solution, leaving me wondering why I couldn't have come up with that. Next, Jos van Eijndhoven would pass by as an interested onlooker and within minutes sketch a highly innovative implementation on the white board. Jos and Evert-Jan: a very special thanks for all the support, coaching, and friendship. Evert-Jan, I am extremely grateful for all the architecting experience you taught me over the many years we worked together, and particularly for helping me to reverse engineer the design decisions in the EDDI use cases after a few years.

Clearly, this thesis builds on the work of many other researchers: Egbert Jaspers and Erik van der Tol, showing me what MPEG-2 really means; Pieter van der Wolf, who was one of the first to stress that Eclipse had a value far beyond a TriMedia accelerator. Pieter, thanks for teaching me how to properly structure a document and for all the last-minute reviewing of Eclipse papers, typically a few hours before the submission deadline. Thanks to John Moors, struggling to decrypt my coprocessor shell design document during RTL implementation. Karel Walters, your enthusiasm and skill in developing the Eclipse control software was inspiring! I will never forget the enthusiasm of the Semiconductors and Research teams in India. Ferry, thanks for all the times you rescued me from spending all weekend typing on my thesis in Bangalore! My gratitude goes out to Bob Hertzberger for supporting this thesis since 2002. Jacqueline, thank you for your help and support during these years! I am deeply indebted to my parents for virtually everything, with a special thanks to my father for reviewing and kayaking with me whenever it's too cold to stay home.

Last but not least, a huge thanks to Katrien, Open Office guru Andrei, Jaap, Maca, Nanni, Enith, Giuseppe, Loes, Wouter, Joep, Joost, Csaba, Thibaut, Alex, Vedran, Andreja, Mathias, Clara, and Jaques for always asking when I would finally finish my thesis and at the same time inviting me to do other fun things… I will never have enough pages to thank you, Derya, for being so sweet despite all the time I spent behind my laptop and left you alone… my sweet wife, thank you so much for your love and support!


“…It was selected due the minimum required number of additions and multiplications (11 Mul and 29 add). This algorithm is obtained by a slight modification of the original Loeffler algorithm [9], which provides one of the most computationally efficient 1-D DCT/IDCT calculations [20]. The modified Loeffler algorithm for calculating 8-point 1-D DCT is illustrated in Figure 3.…”

Section: Loeffler Algorithm for the 1D-DCT

mentioning

confidence: 99%
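To give a feel for why fast factorizations such as the one cited here need far fewer multiplications than the 64 of a direct 8-point matrix product, the sketch below checks the even/odd split that fast 8-point DCT flowgraphs typically begin with. It is not the modified Loeffler flowgraph from the citing paper. It only verifies, under the assumption of orthonormal scaling, that the even-indexed outputs equal a 4-point DCT of the butterfly sums `x[i] + x[7 - i]`, divided by the square root of 2. The function names and input values are illustrative.

```python
import math

def dct_direct(x):
    """Orthonormal DCT-II by the direct definition (n * n multiplications)."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * sum(x[i] * math.cos((2 * i + 1) * k * math.pi / (2 * n))
                               for i in range(n)))
    return out

def even_coeffs_via_butterfly(x):
    """Even-indexed DCT outputs from a half-size DCT of the sums x[i] + x[n-1-i]."""
    n = len(x)
    sums = [x[i] + x[n - 1 - i] for i in range(n // 2)]   # first butterfly stage
    return [v / math.sqrt(2.0) for v in dct_direct(sums)]

x = [12, -7, 30, 4, -15, 22, 9, -3]   # made-up input samples
full = dct_direct(x)
even = even_coeffs_via_butterfly(x)
```

Here `even[k]` matches `full[2 * k]` for k = 0..3, so half of the outputs come from a 4-point problem after only four additions. Repeating this kind of reduction, and sharing rotations on the odd half, is what brings the count down toward the 11 multiplications and 29 additions quoted above.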

Optimization and Implementation on Fpga of the DCT/IDCT Algorithm

Atitallah, Kadionik, Ghozzi, et al.

2006 IEEE International Conference on Acoustics, Speech and Signal Processing Proceedings


In this paper, we present a comparison between two methods, the modified Loeffler algorithm (11 MUL and 29 ADD) and Distributed Arithmetic, for implementing the DCT/IDCT algorithm for MPEG or H.26x video compression in VHDL. The implementation targets an Altera Stratix EP1S10 FPGA, which provides dedicated DSP blocks for common signal processing functions. A new solution based on these DSP blocks is used to implement the multipliers of the modified Loeffler algorithm in order to optimize speed and area.
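As background for the second method named in this abstract, the following is a minimal sketch of the general Distributed Arithmetic idea: an inner product with fixed coefficients is computed bit-plane by bit-plane from a lookup table of coefficient subset sums, with no multiplier. The integer coefficients, the 8-bit two's complement input width, and all names are assumptions chosen for illustration, and nothing here reflects the structure of the paper's VHDL design.

```python
def build_lut(coeffs):
    """Table of subset sums: entry `addr` adds coeffs[j] for every set bit j of addr."""
    k = len(coeffs)
    return [sum(c for j, c in enumerate(coeffs) if (addr >> j) & 1)
            for addr in range(1 << k)]

def da_inner_product(inputs, lut, bits=8):
    """Bit-serial inner product of two's complement inputs with the LUT's coefficients."""
    mask = (1 << bits) - 1
    acc = 0
    for b in range(bits):
        # Gather bit b of every input into one LUT address.
        addr = 0
        for j, x in enumerate(inputs):
            addr |= (((x & mask) >> b) & 1) << j
        term = lut[addr] << b
        # The sign bit plane carries negative weight in two's complement.
        acc += -term if b == bits - 1 else term
    return acc

coeffs = [3, -5, 7, 2]        # hypothetical fixed integer coefficients
inputs = [10, -3, 4, -128]    # hypothetical 8-bit signed samples
lut = build_lut(coeffs)
da_result = da_inner_product(inputs, lut)
direct_result = sum(c * x for c, x in zip(coeffs, inputs))
```

The loop uses only table lookups, shifts, and additions, which is why Distributed Arithmetic is attractive when multipliers are scarce, whereas the abstract's alternative maps the multiplications of the modified Loeffler algorithm onto dedicated DSP blocks.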


“…) per second. The implementation is based on Loeffler's algorithm [53], as shown in Fig. 3.43, while for the inverse transform the rows that contain only zeros are omitted. O. Cadenas, M. Brandt, G. Megson and N. Goswami [12] present a structure for implementing the 8x8 2-D DCT / 2-D IDCT on an FPGA that targets low power consumption.…”

unclassified

Design of Adaptive and Dynamically Reconfigurable Architectures for the 8X8 2-D Inverse Cosine Transform (IDCT), for Low Power Consumption

Tziortzios


The subject of this dissertation is the study and development of architectures for the Inverse Discrete Cosine Transform (8×8 2-D IDCT). The main aim of the research is the study and development of architectures for low power consumption. In total, 11 architectures for computing the IDCT and one architecture for computing the forward transform (DCT) are presented. Eight of the architectures are based on one or more systolic arrays of processors. Of these, two of the IDCT architectures and the one forward DCT architecture use asynchronous fundamental processing elements. The remaining 5 solutions are based on synchronous fundamental processing elements. In every case, the symmetry inherent in the transform kernel is exploited in order to reduce the required circuit area and the required arithmetic operations, and to increase the computation speed. Exploiting this symmetry is found to reduce the energy required to process a given volume of data. Three of the 8×8 2-D IDCT architectures are based on the Arai-Agui-Nakajima algorithm. In one of them, throughput is increased through pipelining. Power consumption is reduced by progressively disabling parts of the circuit, based on the number of zero values in the input signal. The other two architectures use Algebraic Integer Encoding in order to avoid multiplications in the transform kernel. The "eleventh" 8×8 2-D IDCT architecture relies on the high probability of zero-valued DCT coefficients and exploits the symmetry inherent in the basis matrices of the transform. The reconstruction time is variable and depends on the number of nonzero coefficients.
This architecture has the smallest number of multiplications per nonzero coefficient reported in the literature. Regarding power consumption, an algorithm is presented for counting the energy-consuming state transitions at the nodes of CMOS circuits. Based on this algorithm, the distribution of dynamic power consumption is studied for two different IDCT architectures (Lee and Chen) and for two number representation systems (two's complement and sign-magnitude). The results are presented at several levels of observation, starting from the level of the overall system and going down to the RTL (Register Transfer Level), which in this dissertation corresponds to the full adder. Finally, with the aim of saving power, an architecture is proposed in which the operands of the multipliers are represented in sign-magnitude form, while the operands of the adders and subtractors are in two's complement form.
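The comparison of number representations described in this abstract can be illustrated with a toy bit-toggle count. The sketch below is not the dissertation's transition-counting algorithm for CMOS nodes. It only counts how many bits flip between consecutive words on a bus, assuming 8-bit words and a made-up sample sequence that oscillates around zero, which is where the two encodings differ most.

```python
BITS = 8  # word width assumed for illustration

def twos_complement(v, bits=BITS):
    """Encode a signed integer as a two's complement bit pattern."""
    return v & ((1 << bits) - 1)

def sign_magnitude(v, bits=BITS):
    """Encode a signed integer as a sign bit followed by its magnitude."""
    sign = (1 << (bits - 1)) if v < 0 else 0
    return sign | abs(v)

def toggles(words):
    """Total number of bit flips between consecutive words."""
    return sum(bin(a ^ b).count("1") for a, b in zip(words, words[1:]))

samples = [1, -1, 2, -2, 1, -1]   # small values that keep changing sign
tc_toggles = toggles([twos_complement(v) for v in samples])
sm_toggles = toggles([sign_magnitude(v) for v in samples])
```

For this sequence the two's complement encoding flips 35 bits in total and the sign-magnitude encoding flips 9, because a sign change in two's complement inverts most of the upper bits. This is consistent with the abstract's motivation for feeding multipliers with sign-magnitude operands, although real switching power also depends on the circuit structure that this count ignores.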


Design of a 2D DCT/IDCT application specific VLIW processor supporting scaled and sub-sampled blocks

Abstract: We present an innovative design of an accurate 2D DCT/IDCT processor, which handles scaled and sub-sampled input
