The coarse-grained reconfigurable image stream processor (CRISP) architecture is introduced to meet the image-processing demands of high-definition (HD) cameras and camcorders. Drawing on several concepts from reconfigurable computing, CRISP satisfies both the performance and the flexibility requirements of HD cameras. A multi-frame processing system built around CRISP achieves real-time HD video recording and 11M-pixel image processing. Compared with a high-dynamic-range image-fusion algorithm running on a general-purpose processor, the proposed processor achieves a 106× speed-up while maintaining high image quality of 42.5dB in PSNR.
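The 42.5dB figure above is a peak signal-to-noise ratio. As a point of reference for how such a quality number is obtained, a minimal sketch of the standard PSNR computation between a reference image and a processed image (this is the generic metric definition, not the paper's implementation):

```python
import numpy as np

def psnr(reference, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    mse = np.mean((reference.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)
```

Higher values indicate that the fused output is closer to the reference; 42.5dB corresponds to a very small mean squared error relative to the 8-bit peak value.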
Advances in semiconductors and developments in machine learning [1] have led to versatile multimedia applications with semantic-processing abilities. Real-time applications, such as face detection, facial-expression recognition, scene analysis [2], and object recognition [3], have become indispensable in consumer-electronics (CE) products. To handle the complicated video-processing algorithms used for multimedia content analysis, many powerful processors have been reported [2][3][4][5]. Although these processors can speed up video-processing tasks with massively parallel processing elements, they focus only on feature extraction, and there is no specialized hardware to support the different kinds of advanced machine-learning algorithms, which require extensive computation. In this paper, a Semantic Analysis SoC (SASoC) that accelerates video processing and machine learning simultaneously is developed to meet the demands of the near future.

The SASoC is characterized as follows. (1) The VPU has a 3-level hierarchical architecture that processes 256-dimensional vectors in parallel, and operations such as vector inner product, vector distance, and exponential computation execute in one cycle. Each level of the VPU has a Local Vector Memory (LVM) for rapid data access and for supporting different operations and degrees of parallelism. The LVM of the low-level VPU and the Input Vector Memory (IVM) provide 76.8GB/s of bandwidth to the vector ALUs, and input vectors can be sent to different levels of the VPU according to application requirements. Connected to the high-level VPU, the K-NN Processor is designed to compute rankings of vector distances, and its 128 PEs can sort and store the distances in the same clock cycle.

Example applications based on the SASoC are illustrated in Figure 18.7.4.
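The K-NN Processor's job, as described above, is to rank vector distances and keep the nearest entries. A minimal software model of that step (the hardware performs the comparisons across 128 PEs in one cycle; this sketch only mirrors the functional behavior, and the function name is our own):

```python
import numpy as np

def knn_rank(query, references, k):
    """Rank stored reference vectors by squared Euclidean distance to a
    query vector and return the indices and distances of the k nearest.

    Models the distance-ranking function of the K-NN Processor in
    software; the hardware sorts and stores distances in parallel."""
    d = np.sum((references - query) ** 2, axis=1)  # vector-distance operation
    order = np.argsort(d)[:k]                      # ranking of distances
    return order, d[order]
```

In the SoC, the distance computations themselves would come from the VPU's single-cycle vector-distance operation, with the K-NN Processor maintaining the sorted list.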
The first application is concept-based image retrieval, which uses concept categories to perform semantic analysis of images; the real-time retrieval results can be used for scene recognition and photo classification in CE products. Color and texture features are extracted by the OPU and LPU, respectively, and GMM-based classification is accomplished using the 3 levels of the VPU. Finally, the K-NN Processor computes the nearest neighbor of the captured image and returns retrieval results at a frame rate of 156fps at 160×120 resolution. The second application is face detection, which is widely applied in DSCs and camcorders. After noise reduction by the OPU, Haar-like features are extracted by the LPU and sent to the FSPS for classification. Two levels of the VPU execute the AdaBoost algorithm, and the face-detection results are stored in the Output Vector Memory (OVM) at a frame rate of 294fps at 160×120 resolution.

The performance analysis of different single-test operations of the ISPS and FSPS is shown in Figure 18.7.5. In the ISPS, the maximum input data rate is 76.8Gpixel/s when the OPU and LPU work in pipeline, and the frame rate is 17,500× higher than that of a state-of-the-art PC when the frequency of...
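The face-detection path above classifies Haar-like feature vectors with AdaBoost. A hedged sketch of the strong-classifier evaluation the VPU levels would execute, assuming the usual Viola-Jones-style formulation of thresholded weak classifiers combined by a weighted vote (feature indices, thresholds, and weights here are illustrative, not taken from the paper):

```python
def adaboost_classify(features, weak_classifiers, threshold=0.0):
    """Evaluate an AdaBoost strong classifier over Haar-like feature values.

    Each weak classifier is (feature_index, theta, polarity, alpha):
    it votes +1 when polarity * features[idx] < polarity * theta,
    else -1, and the votes are combined with weights alpha."""
    score = 0.0
    for idx, theta, polarity, alpha in weak_classifiers:
        h = 1 if polarity * features[idx] < polarity * theta else -1
        score += alpha * h
    return 1 if score >= threshold else -1  # +1 = face, -1 = non-face
```

Each weak classifier touches only one feature dimension, which is why the per-frame work maps well onto wide vector hardware.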
Low-power wireless video sensor nodes play important roles in machine-to-machine (M2M) network applications. This paper addresses several design issues in optimizing the power consumption of a video sensor node. For the selection of the video coding engine, a comparison between a conventional video coding system and a distributed video coding (DVC) system shows that, although the rate-distortion performance of existing DVC codecs still has room for improvement, DVC can provide lower power consumption over a noisy transmission channel. Furthermore, it is demonstrated that a video analysis unit can filter out video content without events of interest to reduce transmission power. Finally, several future research directions are outlined: the trade-off among the video analysis unit, video coding unit, and data transmission should be studied further to design wireless video sensors with optimized power consumption.
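The trade-off described above, where on-node analysis spends power to avoid transmitting uninteresting frames, can be made concrete with a simple expected-power model. This is an illustrative first-order model with hypothetical numbers, not figures from the paper:

```python
def node_power(p_analysis_mw, p_tx_mw, event_fraction):
    """First-order average power of a filtering video sensor node (mW).

    The analysis unit runs on every frame; transmission power is paid
    only for the fraction of frames that contain an event of interest."""
    return p_analysis_mw + event_fraction * p_tx_mw
```

Under this model, adding an analysis unit pays off whenever its cost is smaller than the transmission power saved on event-free frames, which is exactly the trade-off the paper proposes to study.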