Object detection is one of the most important problems in computer vision. Since AlexNet was proposed, methods based on Convolutional Neural Networks (CNNs) have become mainstream in the field, and much research on neural networks and variations of their architectures has appeared. To achieve detection that is both fast and accurate, it is necessary to step outside the existing CNN framework, which poses great challenges. The Transformer's relatively mature theoretical foundation and technical development in Natural Language Processing have brought it to researchers' attention, and it has been shown that Transformer-based methods can be applied to computer vision tasks and outperform existing CNN methods on some of them. To help more researchers understand the development of object detection methods, the existing methods, the different frameworks, the open challenges, and the development trends, this paper introduces the classic CNN-based object detection methods and discusses their highlights, advantages, and disadvantages. Drawing on a large body of literature, the paper compares different CNN detection methods and Transformer detection methods. For a vertical comparison under fair conditions, 13 detection methods that have had a broad impact on the field and are among the most mainstream and promising are selected. The comparative data give us confidence in the development of the Transformer and in the convergence between different methods. The paper also presents recent innovative approaches to using the Transformer in computer vision tasks. Finally, the challenges, opportunities, and future prospects of the field are summarized.
Abstract-Estimating body size from related datasets is a challenging task in the modeling and simulation area. It has a wide range of applications, including body modeling and clothing design. Body size measurement requires extracting feature data. In this paper, in addition to ordinary data such as body height, weight, and chest size, the body surface area and volume are combined linearly with these data to build a B-spline surface shape model. A Genetic Algorithm (GA) is used to select the best points in these datasets as control points, and a mathematical model for estimating human body size is created. Experimental results indicate that the model is efficient and has a low error rate in estimating human body size.
Keywords-genetic algorithm (GA); surface area; control points of B-spline surface
Lipreading refers to recognizing a speaker's speech content from the image sequence of lip movements, without the speech signal. Currently, most models use a spatiotemporal (3D) convolutional layer combined with a 2D CNN to extract spatial and temporal features from image sequences. However, whereas 2D convolutional layers can extract fine-grained features from the spatial domain, the single 3D convolutional layer used in these models cannot extract temporal information well. This paper addresses that weakness. First, the Time Shift Module (TSM) is applied to two different front-ends (one fully 2D CNN based, the other a mixture of 2D and 3D convolution) to strengthen temporal information extraction. Second, the influence of different TSM shift proportions and different input sampling intervals on temporal information extraction is examined. Third, the influence of different time shifts on spatiotemporal feature extraction is compared. The proposed method is evaluated on two challenging word-level lipreading datasets, LRW and LRW-1000, and achieves new state-of-the-art performance.
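The shift operation behind TSM can be sketched as follows. This is a minimal NumPy illustration that assumes the commonly used temporal shift formulation: one fraction of the channels takes its values from the next frame, an equal fraction from the previous frame, and the ends are zero-padded. The function name `temporal_shift`, the `shift_div` parameter, and the (T, C, H, W) layout are assumptions made for illustration, not details taken from the paper.

```python
import numpy as np


def temporal_shift(x, shift_div=8):
    """Shift part of the channels along the time axis (hypothetical helper).

    x: array of shape (T, C, H, W), one feature map per frame.
    shift_div: 1/shift_div of the channels is shifted in each direction.
    """
    t, c, h, w = x.shape
    fold = c // shift_div  # number of channels shifted in each direction
    out = np.zeros_like(x)

    # First group: each frame receives features from the NEXT frame;
    # the last frame is left as zeros (zero padding).
    out[:-1, :fold] = x[1:, :fold]

    # Second group: each frame receives features from the PREVIOUS frame;
    # the first frame is left as zeros.
    out[1:, fold:2 * fold] = x[:-1, fold:2 * fold]

    # Remaining channels are copied unchanged.
    out[:, 2 * fold:] = x[:, 2 * fold:]
    return out


# Example: 5 frames, 16 channels, 4x4 feature maps.
features = np.random.rand(5, 16, 4, 4)
shifted = temporal_shift(features, shift_div=8)
```

In this sketch `shift_div` plays the role of the shift proportion: with `shift_div=8`, one eighth of the channels moves in each direction, so one quarter of the channels is shifted in total. The operation adds no parameters, so a subsequent 2D convolution sees information from neighboring frames at no extra learning cost.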