Face recognition with still face images has been widely studied, while research on video-based face recognition remains relatively inadequate, especially in terms of benchmark datasets and comparisons. Real-world video-based face recognition applications require techniques for three distinct scenarios: 1) Video-to-Still (V2S); 2) Still-to-Video (S2V); and 3) Video-to-Video (V2V), which take a video or a still image as the query or the target, respectively. To the best of our knowledge, few datasets and evaluation protocols have been established to benchmark all three scenarios. To facilitate the study of this topic, this paper contributes a benchmarking and comparative study based on a newly collected still/video face database, named COX Face DB. Specifically, we make three contributions. First, we collect and release a large-scale still/video face database that simulates video surveillance under the three video-based face recognition scenarios (i.e., V2S, S2V, and V2V). Second, to benchmark the three scenarios on our database, we review and experimentally compare a number of existing set-based methods. Third, we further propose a novel Point-to-Set Correlation Learning (PSCL) method and experimentally show that it serves as a promising baseline for V2S/S2V face recognition on COX Face DB. Extensive experimental results clearly demonstrate that video-based face recognition needs further effort, and that COX Face DB is a good benchmark database for its evaluation.
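The V2S/S2V setting above matches a still-image feature (a point) against a set of video-frame features. As a minimal illustrative sketch (not the authors' PSCL method), a point-to-set distance can be computed as the residual of a ridge-regularized projection of the still-image feature onto the span of the frame set; all names here are hypothetical:

```python
import numpy as np

def point_to_set_distance(x, S, lam=0.1):
    """Distance from a still-image feature x (d,) to a video frame set S (d, n).

    Approximates the set by the regularized linear span of its frames:
    minimize ||x - S a||^2 + lam * ||a||^2 over coefficients a (ridge
    regression), then return the residual norm as the point-to-set distance.
    """
    d, n = S.shape
    # Closed-form ridge solution: a = (S^T S + lam I)^{-1} S^T x
    a = np.linalg.solve(S.T @ S + lam * np.eye(n), S.T @ x)
    return np.linalg.norm(x - S @ a)

# Toy V2S matching: a frame set containing noisy copies of the still image
# should be closer to it than an unrelated frame set.
rng = np.random.default_rng(0)
x = rng.standard_normal(16)
same_id = np.stack([x + 0.05 * rng.standard_normal(16) for _ in range(5)], axis=1)
other_id = rng.standard_normal((16, 5))
print(point_to_set_distance(x, same_id) < point_to_set_distance(x, other_id))
```

In a real V2S pipeline, the query would be ranked against every gallery set by this distance; PSCL itself additionally learns the correlation between the point and the set, which this sketch omits.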
Traffic sign recognition systems have been applied in advanced driving assistance and automatic driving systems to help drivers obtain important road information accurately. The current mainstream detection methods achieve high accuracy on this task, but their models have many parameters and slow detection speed. Taking YOLOv5s as the basic framework, this paper proposes YOLOv5s-A2, which improves detection accuracy while keeping the model small and the detection speed high. Firstly, a data augmentation strategy combining various operations is proposed to alleviate the problem of unbalanced class instances. Secondly, we propose a path aggregation module for the Feature Pyramid Network (FPN) that adds new horizontal connections, enhancing the multi-scale feature representation capability and compensating for the loss of feature information. Thirdly, an attention detection head module is proposed to mitigate the aliasing effect of cross-scale fusion and enhance the representation of predictive features. Experiments on the Tsinghua-Tencent 100K (TT100K) dataset show that our method achieves more remarkable performance and faster inference than other advanced methods: it reaches 87.3% mean average precision (mAP), surpassing the original model by 7.9%, while maintaining 87.7 frames per second (FPS). To show generality, we tested it on the German Traffic Sign Detection Benchmark (GTSDB) without tuning and obtained an average precision of 94.1% at about 105.3 FPS. In addition, YOLOv5s-A2 has only about 7.9 M parameters.
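The abstract does not specify the internals of the attention detection head, but a common building block for such heads is channel attention. As a hedged numpy sketch of the generic squeeze-and-excitation idea (an assumption for illustration, not the paper's exact module):

```python
import numpy as np

def channel_attention(feat, w1, w2):
    """Squeeze-and-excitation style channel attention on a (C, H, W) feature
    map: global-average-pool each channel, pass the result through a two-layer
    bottleneck, and rescale channels by sigmoid gates in (0, 1)."""
    squeeze = feat.mean(axis=(1, 2))               # (C,) per-channel summary
    hidden = np.maximum(w1 @ squeeze, 0.0)         # ReLU bottleneck
    gates = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))   # sigmoid gates, (C,)
    return feat * gates[:, None, None]             # reweight channels

rng = np.random.default_rng(1)
feat = rng.standard_normal((8, 4, 4))     # 8 channels, 4x4 spatial map
w1 = rng.standard_normal((2, 8))          # squeeze to 2 hidden units
w2 = rng.standard_normal((8, 2))          # expand back to 8 gates
out = channel_attention(feat, w1, w2)
```

Because every gate lies in (0, 1), the module can only suppress channels, letting the head emphasize informative scales after cross-scale fusion.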
This article presents a preliminary discussion of, and attempt at, a frame-semantics description system and its content for the Uyghur language. It describes the composition of a FrameNet according to that description content, describes and classifies the semantic roles of the frame elements of a modern Uyghur FrameNet, and determines a semantic role labeling system, laying a sound foundation for syntactic and semantic recognition and analysis with the Uyghur FrameNet. It also explores a feasible method and approach for building an Uyghur FrameNet on a cognitive basis.
Object detection, which aims to automatically mark the coordinates of objects of interest in pictures or videos, is an extension of image classification. In recent years, it has been widely used in intelligent traffic management, intelligent monitoring systems, military object detection, surgical instrument positioning in medical navigation surgery, and other areas. COVID-19, caused by a novel coronavirus that emerged at the end of 2019, poses a serious threat to public health, and many countries require everyone to wear a mask in public to prevent the spread of the virus. To help enforce this effectively, we present an object detection method based on the single-shot detector (SSD) that focuses on accurate, real-time face mask detection in supermarkets. We make contributions in the following three aspects: 1) presenting a lightweight backbone network for feature extraction, based on SSD and spatially separable convolution, to improve detection speed and meet the requirements of real-time detection; 2) proposing a Feature Enhancement Module (FEM) to strengthen the deep features learned by CNN models and enhance the feature representation of small objects; 3) constructing COVID-19-Mask, a large-scale dataset for detecting whether shoppers are wearing masks, by collecting images in two supermarkets. The experimental results illustrate the high detection precision and real-time performance of the proposed algorithm.
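The spatially separable convolution mentioned in contribution 1) factorizes a 2-D kernel into a column and a row pass, reducing multiplies (e.g., from 9 to 6 per output pixel for a 3x3 kernel). A minimal numpy sketch, using the Sobel-x kernel as an example of a kernel that factors exactly:

```python
import numpy as np

def conv2d_valid(img, kernel):
    """Plain 'valid' 2-D correlation (no padding, stride 1)."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

# A spatially separable 3x3 kernel is the outer product of a 3x1 column
# and a 1x3 row; applying the two 1-D passes equals the full 2-D pass.
col = np.array([[1.0], [2.0], [1.0]])
row = np.array([[1.0, 0.0, -1.0]])
sobel = col @ row

rng = np.random.default_rng(2)
img = rng.standard_normal((8, 8))
direct = conv2d_valid(img, sobel)
separable = conv2d_valid(conv2d_valid(img, col), row)
print(np.allclose(direct, separable))  # prints True
```

Only kernels that are (approximately) rank-1 factor this way; in practice networks learn the two 1-D filters directly rather than factorizing a trained 2-D kernel.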
To accurately identify apples against the complex backgrounds of natural environments and help apple-harvesting robots pick apples accurately, an improved apple image segmentation algorithm based on the Deeplabv3 framework, named AppleDNet, is proposed. Building on the well-known Deeplabv3 algorithm and combining atrous convolution, depthwise separable convolution, and transfer learning, it not only achieves more accurate segmentation results but also improves segmentation speed. In addition, traditional image filtering is adopted to obtain a smoother segmentation image and effectively eliminate image stitching traces. The experimental results demonstrate that the proposed method is superior to the original Deeplabv3 and other popular mainstream image segmentation algorithms, with an overall accuracy of 97.90%, showing clear practical significance and advantages.
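Atrous (dilated) convolution, one of the components above, spaces the kernel taps apart to enlarge the receptive field without adding weights. A 1-D numpy sketch makes the effect concrete:

```python
import numpy as np

def atrous_conv1d(x, kernel, rate):
    """1-D 'valid' atrous (dilated) convolution: taps are spaced `rate`
    samples apart, enlarging the receptive field without extra weights."""
    k = len(kernel)
    span = (k - 1) * rate + 1  # effective receptive field in samples
    out = np.empty(len(x) - span + 1)
    for i in range(len(out)):
        out[i] = sum(kernel[t] * x[i + t * rate] for t in range(k))
    return out

x = np.arange(10, dtype=float)
k = np.array([1.0, 1.0, 1.0])
print(atrous_conv1d(x, k, rate=1))  # ordinary 3-tap convolution
print(atrous_conv1d(x, k, rate=2))  # same 3 weights, 5-sample receptive field
```

In Deeplab-style models the same idea is applied in 2-D, often at several rates in parallel, so deep layers see large context while keeping the feature map resolution.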
Mainstream algorithm models based on deep neural networks have relatively simple structures, easily lose effective information in the pooling layers, and suffer from low recognition accuracy and slow recognition speed. To solve these problems, we propose ATTAlexNet, a finger language recognition algorithm that fuses a convolutional attention mechanism with AlexNet. Introducing the convolutional attention mechanism into the AlexNet network enables effective feature learning and feature screening and enhances the representational ability of the network; introducing AdderNet to replace the multiplication operations of the convolutional layers in AlexNet enhances the robustness of the network and improves its computation speed. The experimental results show that ATTAlexNet is superior to the other compared algorithms: under the same experimental conditions, its recognition rate is increased by 2.0%, which proves that ATTAlexNet can effectively realize finger language recognition with fast computation and good robustness.
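The AdderNet substitution above replaces the multiply-accumulate of a convolution with an addition-only similarity. As a hedged one-filter sketch (the real AdderNet layer applies this over all sliding windows and channels, with its own normalization and gradient rules):

```python
import numpy as np

def adder_response(patch, weight):
    """AdderNet-style similarity: instead of the convolution score sum(x * w),
    score a patch by the negative L1 distance -sum(|x - w|), which requires
    only additions and subtractions."""
    return -np.abs(patch - weight).sum()

# The response peaks at zero when the patch matches the filter exactly,
# so the layer still behaves as a template detector.
w = np.array([[0.0, 1.0], [1.0, 0.0]])
match = np.array([[0.0, 1.0], [1.0, 0.0]])
miss = np.array([[1.0, 0.0], [0.0, 1.0]])
print(adder_response(match, w), adder_response(miss, w))  # prints 0.0 -4.0
```

Because additions are cheaper than multiplications on most hardware, stacking such layers is what yields the speed gain claimed for ATTAlexNet.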