Face recognition with still face images has been widely studied, while research on video-based face recognition remains relatively inadequate, especially in terms of benchmark datasets and comparisons. Real-world video-based face recognition applications require techniques for three distinct scenarios: 1) Video-to-Still (V2S); 2) Still-to-Video (S2V); and 3) Video-to-Video (V2V), which take a video or a still image as the query or the target, respectively. To the best of our knowledge, few datasets and evaluation protocols have been established to benchmark all three scenarios. To facilitate the study of this topic, this paper contributes a benchmarking and comparative study based on a newly collected still/video face database, named COX Face DB. Specifically, we make three contributions. First, we collect and release a large-scale still/video face database that simulates video surveillance under the three video-based face recognition scenarios (i.e., V2S, S2V, and V2V). Second, to benchmark the three scenarios on our database, we review and experimentally compare a number of existing set-based methods. Third, we further propose a novel Point-to-Set Correlation Learning (PSCL) method and experimentally show that it serves as a promising baseline for V2S/S2V face recognition on COX Face DB. Extensive experimental results clearly demonstrate that video-based face recognition needs further effort, and that COX Face DB is a good benchmark database for its evaluation.
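The V2S/S2V setting above matches a still-image feature (a point) against a set of video-frame features. As a minimal illustrative sketch (not the authors' PSCL method), a point-to-set distance can be computed as the residual of a ridge-regularized projection of the still-image feature onto the span of the frame set; all names here are hypothetical:

```python
import numpy as np

def point_to_set_distance(x, S, lam=0.1):
    """Distance from a still-image feature x (d,) to a video frame set S (d, n).

    Approximates the set by the regularized linear span of its frames:
    minimize ||x - S a||^2 + lam * ||a||^2 over coefficients a (ridge
    regression), then return the residual norm as the point-to-set distance.
    """
    d, n = S.shape
    # Closed-form ridge solution: a = (S^T S + lam I)^{-1} S^T x
    a = np.linalg.solve(S.T @ S + lam * np.eye(n), S.T @ x)
    return np.linalg.norm(x - S @ a)

# Toy V2S matching: a frame set containing noisy copies of the still image
# should be closer to it than an unrelated frame set.
rng = np.random.default_rng(0)
x = rng.standard_normal(16)
same_id = np.stack([x + 0.05 * rng.standard_normal(16) for _ in range(5)], axis=1)
other_id = rng.standard_normal((16, 5))
print(point_to_set_distance(x, same_id) < point_to_set_distance(x, other_id))
```

In a real V2S pipeline, the query would be ranked against every gallery set by this distance; PSCL itself additionally learns the correlation between the point and the set, which this sketch omits.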
Traffic sign recognition systems have been applied in advanced driving assistance and automatic driving systems to help drivers obtain important road information accurately. The current mainstream detection methods achieve high accuracy on this task, but their models have many parameters and slow detection speed. Taking YOLOv5s as the basic framework, this paper proposes YOLOv5s-A2, which improves detection accuracy while keeping the model small and the detection speed high. Firstly, a data augmentation strategy combining various operations is proposed to alleviate the problem of unbalanced class instances. Secondly, we propose a path aggregation module for the Feature Pyramid Network (FPN) that adds new horizontal connections, enhancing the multi-scale feature representation capability and compensating for the loss of feature information. Thirdly, an attention detection head module is proposed to mitigate the aliasing effect of cross-scale fusion and enhance the representation of predictive features. Experiments on the Tsinghua-Tencent 100K (TT100K) dataset show that our method achieves more remarkable performance and faster inference than other advanced methods: it reaches 87.3% mean average precision (mAP), surpassing the original model by 7.9%, while maintaining 87.7 frames per second (FPS). To show generality, we tested it on the German Traffic Sign Detection Benchmark (GTSDB) without tuning and obtained an average precision of 94.1% at about 105.3 FPS. In addition, YOLOv5s-A2 has only about 7.9 M parameters.
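The abstract does not specify the internals of the attention detection head, but a common building block for such heads is channel attention. As a hedged numpy sketch of the generic squeeze-and-excitation idea (an assumption for illustration, not the paper's exact module):

```python
import numpy as np

def channel_attention(feat, w1, w2):
    """Squeeze-and-excitation style channel attention on a (C, H, W) feature
    map: global-average-pool each channel, pass the result through a two-layer
    bottleneck, and rescale channels by sigmoid gates in (0, 1)."""
    squeeze = feat.mean(axis=(1, 2))               # (C,) per-channel summary
    hidden = np.maximum(w1 @ squeeze, 0.0)         # ReLU bottleneck
    gates = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))   # sigmoid gates, (C,)
    return feat * gates[:, None, None]             # reweight channels

rng = np.random.default_rng(1)
feat = rng.standard_normal((8, 4, 4))     # 8 channels, 4x4 spatial map
w1 = rng.standard_normal((2, 8))          # squeeze to 2 hidden units
w2 = rng.standard_normal((8, 2))          # expand back to 8 gates
out = channel_attention(feat, w1, w2)
```

Because every gate lies in (0, 1), the module can only suppress channels, letting the head emphasize informative scales after cross-scale fusion.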
This article presents a preliminary discussion of, and attempt at, a frame-semantics description system and its content for the Uyghur language. It describes the composition of a FrameNet according to that description content, describes and classifies the semantic roles of the frame elements of a modern Uyghur FrameNet, and determines a semantic role labeling system, laying a sound foundation for syntactic and semantic recognition and analysis with the Uyghur FrameNet. It also explores a feasible method and approach for building an Uyghur FrameNet on a cognitive basis.
Object detection, which aims to automatically mark the coordinates of objects of interest in pictures or videos, is an extension of image classification. In recent years, it has been widely used in intelligent traffic management, intelligent monitoring systems, military object detection, surgical instrument positioning in medical navigation surgery, and other areas. COVID-19, caused by a novel coronavirus that emerged at the end of 2019, poses a serious threat to public health, and many countries require everyone to wear a mask in public to prevent the spread of the virus. To help enforce this effectively, we present an object detection method based on the single-shot detector (SSD) that focuses on accurate, real-time face mask detection in supermarkets. We make contributions in the following three aspects: 1) presenting a lightweight backbone network for feature extraction, based on SSD and spatially separable convolution, to improve detection speed and meet the requirements of real-time detection; 2) proposing a Feature Enhancement Module (FEM) to strengthen the deep features learned by CNN models and enhance the feature representation of small objects; 3) constructing COVID-19-Mask, a large-scale dataset for detecting whether shoppers are wearing masks, by collecting images in two supermarkets. The experimental results illustrate the high detection precision and real-time performance of the proposed algorithm.
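The spatially separable convolution mentioned in contribution 1) factorizes a 2-D kernel into a column and a row pass, reducing multiplies (e.g., from 9 to 6 per output pixel for a 3x3 kernel). A minimal numpy sketch, using the Sobel-x kernel as an example of a kernel that factors exactly:

```python
import numpy as np

def conv2d_valid(img, kernel):
    """Plain 'valid' 2-D correlation (no padding, stride 1)."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

# A spatially separable 3x3 kernel is the outer product of a 3x1 column
# and a 1x3 row; applying the two 1-D passes equals the full 2-D pass.
col = np.array([[1.0], [2.0], [1.0]])
row = np.array([[1.0, 0.0, -1.0]])
sobel = col @ row

rng = np.random.default_rng(2)
img = rng.standard_normal((8, 8))
direct = conv2d_valid(img, sobel)
separable = conv2d_valid(conv2d_valid(img, col), row)
print(np.allclose(direct, separable))  # prints True
```

Only kernels that are (approximately) rank-1 factor this way; in practice networks learn the two 1-D filters directly rather than factorizing a trained 2-D kernel.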
To accurately identify apples against the complex backgrounds of natural environments and help apple-harvesting robots pick apples accurately, an improved apple image segmentation algorithm based on the Deeplabv3 framework, named AppleDNet, is proposed. Building on the well-known Deeplabv3 algorithm and combining atrous convolution, depthwise separable convolution, and transfer learning, it not only achieves more accurate segmentation results but also improves segmentation speed. In addition, traditional image filtering is adopted to obtain a smoother segmentation image and effectively eliminate image stitching traces. The experimental results demonstrate that the proposed method is superior to the original Deeplabv3 and other popular mainstream image segmentation algorithms, with an overall accuracy of 97.90%, showing clear practical significance and advantages.
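Atrous (dilated) convolution, one of the components above, spaces the kernel taps apart to enlarge the receptive field without adding weights. A 1-D numpy sketch makes the effect concrete:

```python
import numpy as np

def atrous_conv1d(x, kernel, rate):
    """1-D 'valid' atrous (dilated) convolution: taps are spaced `rate`
    samples apart, enlarging the receptive field without extra weights."""
    k = len(kernel)
    span = (k - 1) * rate + 1  # effective receptive field in samples
    out = np.empty(len(x) - span + 1)
    for i in range(len(out)):
        out[i] = sum(kernel[t] * x[i + t * rate] for t in range(k))
    return out

x = np.arange(10, dtype=float)
k = np.array([1.0, 1.0, 1.0])
print(atrous_conv1d(x, k, rate=1))  # ordinary 3-tap convolution
print(atrous_conv1d(x, k, rate=2))  # same 3 weights, 5-sample receptive field
```

In Deeplab-style models the same idea is applied in 2-D, often at several rates in parallel, so deep layers see large context while keeping the feature map resolution.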
Mainstream algorithm models based on deep neural networks have relatively simple structures, easily lose effective information in the pooling layers, and suffer from low recognition accuracy and slow recognition speed. To solve these problems, we propose ATTAlexNet, a finger language recognition algorithm that fuses a convolutional attention mechanism with AlexNet. Introducing the convolutional attention mechanism into the AlexNet network enables effective feature learning and feature screening and enhances the representational ability of the network; introducing AdderNet to replace the multiplication operations of the convolutional layers in AlexNet enhances the robustness of the network and improves its computation speed. The experimental results show that ATTAlexNet is superior to the other compared algorithms: under the same experimental conditions, its recognition rate is increased by 2.0%, which proves that ATTAlexNet can effectively realize finger language recognition with fast computation and good robustness.
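The AdderNet substitution above replaces the multiply-accumulate of a convolution with an addition-only similarity. As a hedged one-filter sketch (the real AdderNet layer applies this over all sliding windows and channels, with its own normalization and gradient rules):

```python
import numpy as np

def adder_response(patch, weight):
    """AdderNet-style similarity: instead of the convolution score sum(x * w),
    score a patch by the negative L1 distance -sum(|x - w|), which requires
    only additions and subtractions."""
    return -np.abs(patch - weight).sum()

# The response peaks at zero when the patch matches the filter exactly,
# so the layer still behaves as a template detector.
w = np.array([[0.0, 1.0], [1.0, 0.0]])
match = np.array([[0.0, 1.0], [1.0, 0.0]])
miss = np.array([[1.0, 0.0], [0.0, 1.0]])
print(adder_response(match, w), adder_response(miss, w))  # prints 0.0 -4.0
```

Because additions are cheaper than multiplications on most hardware, stacking such layers is what yields the speed gain claimed for ATTAlexNet.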