A scalable massively parallel processor for real-time image processing

Kurafuji; Haraguchi; Nakajima; Gyoten; Nishijima; Yamasaki; Imai; Ishizaki; Kumaki; Okuno, Hiroshi G.; Koide; Mattausch; Arimoto

doi:10.1109/isscc.2010.5433910

Cited by 22 publications

(11 citation statements)

References 3 publications

“…A total 31-IP multi-core processor consumes […] Table II lists the comparison of four vision processors which have similar vision applications with this work. As compared with four architectures, namely, a CMOS-sensor-integrated camera chip [28], a massively parallel image processor [29] and our previous arts [30], [8], this work reduces at least 51.5%, 14.8%, 54.6% and 49.3% power efficiency (GOPS/W) respectively. Thanks to the 5-stage fine-grain pipeline and SMT-enabled multi-core architecture, this chip obtains 1.5 times higher GOPS, which is 342 GOPS, even with 18% reduced gate counts compared to our latest work [8].…”

Section: A Chip Summary (mentioning; confidence: 97%)
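The efficiency comparison above is quoted in GOPS/W. As a rough, hedged check, the sketch below divides the 342 GOPS peak figure by the 320 mW average power reported in the citing paper's abstract, derives the per-pixel energy for 30 frame/s HD 720p (assumed here to be 1280 × 720 pixels), and back-computes the GOPS implied for [8] from the quoted "1.5 times higher". Mixing peak GOPS with average power is a simplification; the quoted paper may compute its figures differently.

```python
# Figures from the text: 342 GOPS (8-bit, peak), 320 mW average power,
# 30 frame/s on HD 720p. The 1280 x 720 frame size is an assumption.
PEAK_GOPS = 342.0
AVG_POWER_W = 0.320
FPS = 30
WIDTH, HEIGHT = 1280, 720


def gops_per_watt(gops: float, watts: float) -> float:
    """Power efficiency as a plain ratio of operations rate to power."""
    return gops / watts


def energy_per_pixel_nj(watts: float, fps: int, width: int, height: int) -> float:
    """Average energy spent per processed pixel, in nanojoules."""
    pixels_per_second = fps * width * height
    return watts / pixels_per_second * 1e9


efficiency = gops_per_watt(PEAK_GOPS, AVG_POWER_W)
epp = energy_per_pixel_nj(AVG_POWER_W, FPS, WIDTH, HEIGHT)
# "1.5 times higher GOPS ... compared to our latest work [8]" implies roughly:
prior_gops = PEAK_GOPS / 1.5

print(f"{efficiency:.1f} GOPS/W")
print(f"{epp:.2f} nJ/pixel")
print(f"[8] ~ {prior_gops:.0f} GOPS")
```

The per-pixel figure is the kind of quantity behind the abstract's "per-pixel energy reduction" claim, though the exact normalization used by the authors is not stated here.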

A 320 mW 342 GOPS Real-Time Dynamic Object Recognition Processor for HD 720p Video Streams

Kim

Park

et al. 2013

IEEE J. Solid-State Circuits

Abstract: A heterogeneous multi-core processor is proposed to achieve real-time dynamic object recognition on HD 720p video streams. A context-aware visual attention model is proposed to reduce the computing power required for HD object recognition, based on enhanced attention accuracy. To realize real-time execution of the proposed algorithm, the processor adopts a 5-stage task-level pipeline that maximizes the utilization of its 31 heterogeneous cores, comprising four simultaneous multithreading (SMT) feature-extraction clusters, a cache-based feature-matching processor and a machine-learning engine. Dynamic resource management adaptively tunes thread allocation and power management during execution, based on the detected amount of tasks and on hardware utilization, to increase energy efficiency. As a result, the 32 mm² chip, fabricated in 0.13 µm CMOS technology, achieves 30 frame/s with 342 GOPS (8-bit) peak performance and 320 mW average power dissipation, which is a 2.72 times performance improvement and a 2.54 times per-pixel energy reduction compared with the previous state of the art.

Index Terms: Multi-core processor, object recognition, scale invariant feature transform, heterogeneous, low power processor, dynamic resource management, dynamic voltage and frequency scaling.
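A task-level pipeline lets each stage work on a different frame, so steady-state throughput is set by the slowest stage while the latency of one frame is the sum of all stages. The sketch below illustrates that general rule against the 30 frame/s target from the abstract. The five stage times are purely hypothetical placeholders, not measurements from the paper, and `pipeline_stats` is an illustrative helper name.

```python
from typing import Sequence

FRAME_BUDGET_MS = 1000.0 / 30  # 30 frame/s target from the abstract


def pipeline_stats(stage_ms: Sequence[float]) -> tuple[float, float]:
    """Return (steady-state frames/s, single-frame latency in ms) for a linear
    task-level pipeline in which every stage works on a different frame."""
    bottleneck = max(stage_ms)        # slowest stage sets the issue rate
    throughput_fps = 1000.0 / bottleneck
    latency_ms = sum(stage_ms)        # one frame still traverses every stage
    return throughput_fps, latency_ms


# Hypothetical per-stage times in ms for a 5-stage pipeline (not from the paper).
stages = [20.0, 30.0, 25.0, 15.0, 10.0]
fps, latency = pipeline_stats(stages)
sequential_fps = 1000.0 / sum(stages)  # same work executed without pipelining

print(f"pipelined: {fps:.1f} fps, latency {latency:.0f} ms")
print(f"sequential: {sequential_fps:.1f} fps")
print("meets 30 fps:", max(stages) <= FRAME_BUDGET_MS)
```

With these placeholder numbers, running the stages back to back would give only 10 frame/s, while pipelining reaches about 33 frame/s; balancing stage times (for example by reallocating threads, as the dynamic resource management does adaptively) is what keeps the bottleneck under the frame budget.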

“…5, the plain text of the block-cipher algorithm is normally represented in a two-dimensional array format [24], because conventional processors are limited to a single-access data width, such as 16 or 32 bits. On the other hand, the proposed SIMD matrix processing module has a flexible data-width architecture of up to 256 or 512 bits [11], [16]. The massive-parallel memory-embedded SIMD matrix can therefore adopt a one-dimensional line format to take advantage of its high parallelism.…”

Section: Efficient AES Processing with SIMD Matrix Processor (mentioning; confidence: 99%)
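To make the contrast between the two-dimensional array format and a one-dimensional line format concrete, the sketch below uses the standard FIPS-197 state mapping, s[r][c] = in[r + 4c], for a 128-bit AES block, and then concatenates whole blocks into a single wide line. How the SIMD matrix actually lays out its data is not given in the quote; `to_line` and `line_offset` are hypothetical helpers that assume blocks are simply packed end to end, which gives 2 blocks per 256-bit line and 4 per 512-bit line.

```python
BLOCK_BYTES = 16  # one AES block = 128 bits


def to_state(block: bytes) -> list[list[int]]:
    """FIPS-197 mapping into the 4x4 state: s[r][c] = in[r + 4*c]."""
    assert len(block) == BLOCK_BYTES
    return [[block[r + 4 * c] for c in range(4)] for r in range(4)]


def to_line(blocks: list[bytes], line_bits: int) -> bytes:
    """Hypothetical 1-D line format: pack as many whole blocks as fit into a
    line of `line_bits` bits, each block kept in its natural byte order."""
    per_line = line_bits // (8 * BLOCK_BYTES)
    return b"".join(blocks[:per_line])


def line_offset(block_idx: int, r: int, c: int) -> int:
    """Byte position in the line of state element s[r][c] of a given block."""
    return BLOCK_BYTES * block_idx + r + 4 * c


blocks = [bytes(range(i * 16, i * 16 + 16)) for i in range(4)]
state = to_state(blocks[0])
line256 = to_line(blocks, 256)
line512 = to_line(blocks, 512)

print(state[1])                             # row 1 of block 0: bytes 1, 5, 9, 13
print(len(line256) * 8, len(line512) * 8)   # line widths in bits
print(line512[line_offset(2, 3, 1)])        # s[3][1] of block 2
```

The point of the line format is that one wide operation then touches the same state element of several independent blocks at once, instead of walking a 4 × 4 array 16 or 32 bits at a time.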

“…7 shows the example of transforming the 8-bit data {FF}₁₆ and {FE}₁₆ into {16}₁₆ and {BB}₁₆, respectively. The transformation procedure starts with addition operations of 1 to all 8-bit data parts, e.g.…”

Section: SubBytes and InvSubBytes Transformations (mentioning; confidence: 99%)
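The quoted example values agree with the standard AES S-box. As a reference point (not the SIMD matrix's own method, which the quote describes only partially), the sketch below builds SubBytes from its FIPS-197 definition, the multiplicative inverse in GF(2⁸) modulo x⁸ + x⁴ + x³ + x + 1 followed by the affine transform with constant {63}₁₆, and derives InvSubBytes by inverting the table.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B          # reduce by the AES polynomial
        b >>= 1
    return result


def gf_inv(a: int) -> int:
    """Multiplicative inverse as a^254 (since a^255 = 1); 0 maps to 0."""
    if a == 0:
        return 0
    result, base, exp = 1, a, 254
    while exp:                  # square-and-multiply
        if exp & 1:
            result = gf_mul(result, base)
        base = gf_mul(base, base)
        exp >>= 1
    return result


def rotl8(x: int, n: int) -> int:
    """Rotate an 8-bit value left by n bits."""
    return ((x << n) | (x >> (8 - n))) & 0xFF


def sub_byte(a: int) -> int:
    """AES SubBytes for one byte: GF(2^8) inverse, then the affine transform."""
    b = gf_inv(a)
    return b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3) ^ rotl8(b, 4) ^ 0x63


SBOX = [sub_byte(x) for x in range(256)]
INV_SBOX = [0] * 256
for x, y in enumerate(SBOX):
    INV_SBOX[y] = x             # InvSubBytes is the inverse permutation

print(hex(SBOX[0xFF]), hex(SBOX[0xFE]))   # 0x16 0xbb, matching the quote
```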

“…The transformation procedure starts with addition operations of 1 to all 8-bit data parts, e.g. {FF}₁₆ becomes {00}₁₆ due to an overflow, and uses the ALU, the valid flag and the carry flag in each processing element (PE).…”

Section: SubBytes and InvSubBytes Transformations (mentioning; confidence: 99%)
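The first step quoted above, adding 1 to every 8-bit value with the overflow left in each PE's carry flag, can be modeled in software. The sketch below assumes 2-bit digit-serial arithmetic, matching the 2-bit processing elements of MTX described in the Kumaki et al. abstract, and applies each digit step to all PEs together to mimic bit-serial, word-parallel execution. The `PE` class and `increment_all` function are hypothetical; the quote does not specify the real instruction sequence.

```python
from dataclasses import dataclass


@dataclass
class PE:
    """Toy processing element (hypothetical): one data word plus flags."""
    data: int
    carry: int = 0
    valid: int = 1


def increment_all(pes: list[PE], digit_bits: int = 2, word_bits: int = 8) -> None:
    """Add 1 to every valid PE's word, `digit_bits` bits per step.

    Each step handles one digit position and is applied to all PEs, mimicking
    bit-serial, word-parallel execution. The carry out of the top digit stays
    in the PE's carry flag and signals overflow (e.g. 0xFF -> 0x00)."""
    mask = (1 << digit_bits) - 1
    for pe in pes:
        pe.carry = 1 if pe.valid else 0          # the "+1" enters as carry-in
    for shift in range(0, word_bits, digit_bits):
        for pe in pes:                            # same step on every PE
            if not pe.valid:
                continue                          # masked-off PEs keep their data
            digit = (pe.data >> shift) & mask
            total = digit + pe.carry
            pe.data = (pe.data & ~(mask << shift)) | ((total & mask) << shift)
            pe.carry = total >> digit_bits


pes = [PE(0xFF), PE(0xFE), PE(0x12), PE(0x40, valid=0)]
increment_all(pes)
for pe in pes:
    print(f"{pe.data:02X} carry={pe.carry}")
```

An 8-bit increment therefore takes four digit steps regardless of how many PEs participate, which is where the word-parallel speedup comes from.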

“…The massive-parallel memory-embedded SIMD matrix has been proposed as a novel SIMD multimedia processor, which provides a better way of processing several types of multimedia applications [11]-[16]. It achieves highly parallel processing with low power consumption, and can thus target mobile product applications.…”

Section: Introduction (mentioning; confidence: 99%)

Software-Based Parallel Cryptographic Solution with Massive-Parallel Memory-Embedded SIMD Matrix Architecture for Data-Storage Systems

Kumaki

Koide

Mattausch

et al. 2011

IEICE Trans. Inf. & Syst.

Self Cite

Summary: This paper presents a software-based parallel cryptographic solution with a massive-parallel memory-embedded SIMD matrix (MTX) for data-storage systems. MTX can have up to 2,048 2-bit processing elements, which are connected by a flexible switching network, and supports 2-bit 2,048-way bit-serial and word-parallel operations with a single command. Furthermore, a next-generation SIMD matrix called MX-2 has been developed by expanding the processing-element capability of MTX from 2-bit to 4-bit processing. These SIMD matrix architectures are verified to be a better alternative for processing repeated arithmetic and logical operations in multimedia applications with low power consumption. Moreover, we have proposed combining Content Addressable Memory (CAM) technology with the massive-parallel memory-embedded SIMD matrix architecture to enable fast pipelined table-lookup coding. Since both arithmetic-logical operations and table-lookup coding execute extremely fast on these architectures, encryption and decryption algorithms can be executed efficiently. Evaluation results of the CAM-less and CAM-enhanced massive-parallel SIMD matrix processor for the example of the Advanced Encryption Standard (AES), a widely used cryptographic algorithm, show that a throughput of up to 2.19 Gbps becomes possible. This means that several standard data-storage transfer specifications, such as SD, CF (CompactFlash), USB (Universal Serial Bus) and SATA (Serial Advanced Technology Attachment), can be covered. Consequently, the massive-parallel SIMD matrix architecture is very suitable for private information protection in several data-storage media. A further advantage of the software-based solution is that the implemented cryptographic algorithm can be flexibly updated to a safer future algorithm.
The massive-parallel memory-embedded SIMD matrix architecture (MTX and MX-2) is therefore a promising solution for the integrated realization of real-time cryptographic algorithms with low power dissipation and small Si-area consumption.

Key words: matrix-processing architecture, SIMD, bit-serial and word-parallel, CAM, table-lookup coding, cryptographic algorithm, AES
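The abstract credits CAM-based table-lookup coding for part of the speedup. In a CAM, the search key is compared with every stored entry at once, and the matching row selects a word in an associated RAM. The sketch below is a plain software model of that idea; the `ToyCAM` class is hypothetical, the parallel comparison is written as a loop, and the four entries are taken from the standard AES S-box (FIPS-197) rather than from the paper's implementation.

```python
class ToyCAM:
    """Software model of a CAM plus associated RAM (hypothetical class).

    In hardware every stored key is compared with the search key in the same
    cycle; here the comparison is a loop. A match line selects the RAM word."""

    def __init__(self, entries: dict[int, int]) -> None:
        self.keys = list(entries.keys())      # CAM array (stored search keys)
        self.ram = list(entries.values())     # associated data, same row order

    def search(self, key: int) -> list[int]:
        """Return the indices of all rows whose stored key equals `key`."""
        return [row for row, stored in enumerate(self.keys) if stored == key]

    def lookup(self, key: int) -> int:
        rows = self.search(key)
        if not rows:
            raise KeyError(hex(key))
        return self.ram[rows[0]]              # first match wins (priority encoder)

    def lookup_batch(self, keys: list[int]) -> list[int]:
        """Stream a sequence of keys through the table, one search per key."""
        return [self.lookup(k) for k in keys]


# Four entries of the AES S-box; a full SubBytes table would hold 256 rows.
sbox_part = ToyCAM({0x00: 0x63, 0x01: 0x7C, 0xFE: 0xBB, 0xFF: 0x16})
print([hex(v) for v in sbox_part.lookup_batch([0xFF, 0x00, 0xFE])])
# ['0x16', '0x63', '0xbb']
```

In hardware, successive searches can be overlapped with reading the associated RAM, which is one plausible reading of "pipelined table-lookup coding"; the model above does not attempt to capture timing.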

Low Power Multicore Processors for Embedded Systems

Arakawa

2012

Embedded Systems
