This paper proposes a new ASIFT hardware architecture that processes Video Graphics Array (VGA)-sized (640 × 480) video in real time. The previous ASIFT accelerator suffers from low utilization because affine-transformed images are computed repeatedly. To improve hardware utilization, the proposed architecture adopts two schemes that increase the utilization of a bottleneck hardware module: a prior anti-aliasing scheme and a prior down-scaling scheme. In the proposed method, 1 × 1 and 0.5 × 1 blurred images are generated and reused for creating the various affine-transformed images. Thanks to these schemes, the utilization drop caused by waiting for the affine transform is significantly reduced, and consequently the operation speed increases substantially. Experimental results show that the proposed ASIFT hardware accelerator processes VGA-sized video at 28 frames/s, which is 1.36 times faster than the previous work.

The first scheme, prior anti-aliasing, generates the 1 × 1 blurred image once and stores it in an external memory. By reusing the stored image for generating the various simulated images, redundant data fetching for generating the 1 × 1 blurred image is removed. The second scheme, prior down-scaling, generates a 0.5 × 1 blurred image and reuses it for generating the simulated images whose width is scaled by less than 0.5. A word of the 0.5 × 1 blurred image includes more valid pixels than a word of the 1 × 1 blurred image, so the stall cycles spent waiting for valid data are decreased. As a result, the proposed ASIFT hardware implementation processes VGA-sized video at 28 fps.
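The benefit of the prior down-scaling scheme can be illustrated with a small back-of-the-envelope model. The sketch below is hypothetical (the word width and the helper function are illustrative assumptions, not taken from the paper's hardware design): with a fixed number of pixels per external-memory word, sampling a simulated image whose width scale is below 0.5 from the 0.5 × 1 pre-blurred image roughly doubles the fraction of each fetched word that is actually used.

```python
# Hypothetical illustration of the prior down-scaling scheme.
# WORD_PIXELS is an assumed memory-word width, not a figure from the paper.
WORD_PIXELS = 8  # assumed pixels per external-memory word

def valid_pixels_per_word(target_scale, source_scale=1.0):
    """Approximate useful pixels per fetched word when generating a
    simulated image whose width is scaled by `target_scale`, reading
    from a source image already pre-scaled by `source_scale`."""
    # Each output pixel consumes about source_scale/target_scale source
    # pixels, so roughly target_scale/source_scale of every word is used
    # (capped at 1.0 when the source is narrower than the target needs).
    return WORD_PIXELS * min(1.0, target_scale / source_scale)
```

For a simulated image scaled to 0.25 of the original width, `valid_pixels_per_word(0.25)` gives 2.0 valid pixels per word from the 1 × 1 image, while `valid_pixels_per_word(0.25, 0.5)` gives 4.0 from the 0.5 × 1 image, which is why the stall cycles waiting for valid data drop.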
Previous Work
ASIFT Algorithm

The ASIFT algorithm was proposed to achieve full affine invariance, so that it can find correspondences between two images of the same scene even when they are captured from arbitrary viewpoints [2]. In ASIFT, simulated images for various camera viewpoints are generated by transforming a source image with affine transform matrices, and SIFT features are then computed in the simulated images. Because these SIFT features account for viewpoint change, correspondences can be found between two images whose camera viewpoints differ.

The images captured by a camera at various positions can be interpreted through an affine decomposition. The camera position is represented in hemispherical coordinates as shown in Figure 1. The center (o) of the hemisphere is located at the center of a source image u. The latitude and longitude of the camera position are denoted by θ and ϕ, respectively. The affine distortion caused by a change of camera position is interpreted as a rotation and a scaling of the image. The affine transform is given by Equation (1), where image rotation and scaling are represented by a rotation matrix ($R_\phi$) and a scaling matrix ($T_{1,1/t}$), respectively:

$$A = T_{1,1/t}\,R_\phi = \begin{bmatrix} 1 & 0 \\ 0 & 1/t \end{bmatrix} \begin{bmatrix} \cos\phi & -\sin\phi \\ \sin\phi & \cos\phi \end{bmatrix} \tag{1}$$
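Equation (1) can be sketched directly in code. The following is a minimal illustration, not part of the paper's hardware design; the function name is hypothetical, and the relation t = 1/cos θ between the tilt t and the latitude θ follows the standard ASIFT formulation.

```python
import math

def affine_matrix(t, phi):
    """Simulated-viewpoint affine matrix A = T_{1,1/t} * R_phi (Equation (1)).

    t   : tilt parameter (in ASIFT, t = 1/cos(theta) for latitude theta)
    phi : camera longitude, i.e. the in-plane image rotation angle (radians)
    """
    c, s = math.cos(phi), math.sin(phi)
    # R_phi: standard 2x2 rotation matrix
    r = [[c, -s],
         [s,  c]]
    # T_{1,1/t} scales the second axis by 1/t and leaves the first unchanged,
    # so applying it to R_phi divides the bottom row by t.
    return [[r[0][0],     r[0][1]],
            [r[1][0] / t, r[1][1] / t]]
```

For example, `affine_matrix(1.0, 0.0)` is the identity (frontal view, no rotation), while increasing t compresses the image along one axis, modeling an oblique viewpoint.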