In this paper, we design a benchmark task and provide the associated datasets for recognizing face images and linking them to corresponding entity keys in a knowledge base. More specifically, we propose a benchmark task to recognize one million celebrities from their face images, using all the face images of each individual that can be collected on the web as training data. The rich information provided by the knowledge base helps to conduct disambiguation and improve the recognition accuracy, and contributes to various real-world applications, such as image captioning and news video analysis. Associated with this task, we design and provide a concrete measurement set, an evaluation protocol, and training data. We also present our experimental setup in detail and report promising baseline results. Our benchmark task could lead to one of the largest classification problems in computer vision. To the best of our knowledge, our training dataset, which contains 10M images in version 1, is the largest publicly available one in the world.
Pavement crack detection is a critical task for ensuring road safety. Manual crack detection is extremely time-consuming. Therefore, an automatic road crack detection method is required to speed up this process. However, it remains a challenging task due to the intensity inhomogeneity of cracks and the complexity of the background, e.g., the low contrast with surrounding pavement and possible shadows with similar intensity. Inspired by recent advances of deep learning in computer vision, we propose a novel network architecture, named Feature Pyramid and Hierarchical Boosting Network (FPHBN), for pavement crack detection. The proposed network integrates context information into low-level features for crack detection in a feature pyramid way. It also balances the contributions of easy and hard samples to the loss by nested sample reweighting in a hierarchical way during training. In addition, we propose a novel measurement for crack detection named average intersection over union (AIU). To demonstrate the superiority and generalizability of the proposed method, we evaluate it on five crack datasets and compare it with state-of-the-art crack detection, edge detection, and semantic segmentation methods. Extensive experiments show that the proposed method outperforms these methods in terms of accuracy and generalizability. Code and data can be found at https://github.com/fyangneil/pavement-crack-detection
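The abstract above names the AIU measurement but does not define it. As a rough illustration only, the sketch below assumes that AIU averages the intersection over union between a thresholded prediction map and the ground-truth mask across a range of binarization thresholds. The function name `average_iou`, the default threshold grid, and the convention of scoring an empty union as 1.0 are all assumptions made for this example, not details taken from the paper.

```python
import numpy as np

def average_iou(prob_map, gt_mask, thresholds=None):
    """Mean IoU between a thresholded probability map and a binary mask.

    Assumption: the average is taken over binarization thresholds;
    the default grid below is a hypothetical choice.
    """
    if thresholds is None:
        thresholds = np.arange(0.01, 1.0, 0.01)
    prob = np.asarray(prob_map, dtype=np.float64)
    gt = np.asarray(gt_mask, dtype=bool)
    ious = []
    for t in thresholds:
        pred = prob > t  # binarize the prediction at threshold t
        union = np.logical_or(pred, gt).sum()
        inter = np.logical_and(pred, gt).sum()
        # Assumed convention: an empty union counts as a perfect match.
        ious.append(inter / union if union > 0 else 1.0)
    return float(np.mean(ious))
```

Averaging over thresholds, if that is indeed what the measurement does, would make the score less sensitive to any single cutoff chosen for binarizing the network output.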
Recently, sparse representation based classification (SRC) has been proposed for robust face recognition (FR). In SRC, the testing image is coded as a sparse linear combination of the training samples, and the representation fidelity is measured by the l2-norm or l1-norm of the coding residual. Such a sparse coding model assumes that the coding residual follows a Gaussian or Laplacian distribution, which may not be effective enough to describe the coding residual in practical FR systems. Meanwhile, the sparsity constraint on the coding coefficients makes SRC's computational cost very high. In this paper, we propose a new face coding model, namely regularized robust coding (RRC), which can robustly regress a given signal with regularized regression coefficients. By assuming that the coding residual and the coding coefficient are respectively independent and identically distributed, the RRC seeks a maximum a posteriori solution of the coding problem. An iteratively reweighted regularized robust coding (IR3C) algorithm is proposed to solve the RRC model efficiently. Extensive experiments on representative face databases demonstrate that the RRC is much more effective and efficient than state-of-the-art sparse representation based methods in dealing with face occlusion, corruption, lighting and expression changes, etc.
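To make the idea of iteratively reweighted, regularized coding more concrete, here is a minimal sketch of a generic iteratively reweighted ridge regression. It is not the IR3C algorithm itself: the abstract does not give the weight function, so the Cauchy-style weight, the median-based scale, the l2 regularizer, and the names `reweighted_ridge_coding`, `lam`, and `n_iter` are all assumptions chosen for illustration.

```python
import numpy as np

def reweighted_ridge_coding(D, y, lam=1e-2, n_iter=20):
    """Code y over dictionary D with per-sample weights that shrink outliers.

    Illustrative stand-in for a robust coding loop; the weight function
    and l2 regularizer are assumptions, not the paper's choices.
    """
    D = np.asarray(D, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    n_atoms = D.shape[1]
    w = np.ones_like(y)          # start with uniform weights (plain ridge)
    alpha = np.zeros(n_atoms)
    for _ in range(n_iter):
        DW = D * w[:, None]      # rows of D scaled by their weights
        # Weighted ridge solution: (D^T W D + lam I) alpha = D^T W y
        alpha = np.linalg.solve(D.T @ DW + lam * np.eye(n_atoms), DW.T @ y)
        e2 = (y - D @ alpha) ** 2
        scale = np.median(e2) + 1e-12   # assumed robust scale estimate
        w = 1.0 / (1.0 + e2 / scale)    # large residuals get small weights
    return alpha, w
```

In a face recognition setting, the rows of `D` would correspond to pixels and the down-weighted rows to occluded or corrupted pixels; each iteration only needs a linear solve, which is one plausible reason a reweighting scheme can be cheaper than solving an l1-constrained problem.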
In this paper, a completed modeling of the LBP operator is proposed and an associated completed LBP (CLBP) scheme is developed for texture classification. A local region is represented by its center pixel and a local difference sign-magnitude transform (LDSMT). The center pixels represent the image gray level and they are converted into a binary code, namely CLBP-Center (CLBP_C), by global thresholding. LDSMT decomposes the image local differences into two complementary components: the signs and the magnitudes, and two operators, namely CLBP-Sign (CLBP_S) and CLBP-Magnitude (CLBP_M), are proposed to code them. The traditional LBP is equivalent to the CLBP_S part of CLBP, and we show that CLBP_S preserves more information of the local structure than CLBP_M, which explains why the simple LBP operator can extract the texture features reasonably well. By combining CLBP_S, CLBP_M, and CLBP_C features into joint or hybrid distributions, significant improvement can be made for rotation invariant texture classification.
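The sign-magnitude decomposition described above can be sketched in a few lines. The example below is a simplified illustration under stated assumptions: it uses the 8 nearest neighbours at radius 1 without interpolation, thresholds magnitudes at their mean over the image, and thresholds center pixels at the mean gray level of the image. The abstract only states that CLBP_C uses global thresholding, so these threshold choices and the function name `clbp_codes` are assumptions, and no rotation-invariant mapping is applied.

```python
import numpy as np

# Assumed neighbourhood: 8 neighbours at radius 1, listed clockwise from top-left.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def clbp_codes(image):
    """Return (CLBP_S, CLBP_M, CLBP_C) codes for the interior pixels of a 2D image."""
    img = np.asarray(image, dtype=np.float64)
    h, w = img.shape
    center = img[1:-1, 1:-1]
    # Local differences d_p = g_p - g_c, one plane per neighbour.
    diffs = np.stack([
        img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx] - center
        for dy, dx in OFFSETS
    ])
    signs = diffs >= 0          # sign component (same test as classic LBP)
    mags = np.abs(diffs)        # magnitude component
    weights = (1 << np.arange(len(OFFSETS))).reshape(-1, 1, 1)
    clbp_s = (signs * weights).sum(axis=0)
    # Assumed threshold: mean magnitude over the whole image.
    clbp_m = ((mags >= mags.mean()) * weights).sum(axis=0)
    # Assumed global threshold: mean gray level of the whole image.
    clbp_c = (center >= img.mean()).astype(np.int64)
    return clbp_s, clbp_m, clbp_c
```

A joint descriptor could then be formed, for instance, by building a 3D histogram over the three code maps, which is one way to read the "joint distributions" mentioned in the abstract.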
A novel object tracking algorithm is presented in this paper by using the joint color-texture histogram to represent a target and then applying it to the mean shift framework. Apart from the conventional color histogram features, the texture features of the object are also extracted by using the local binary pattern (LBP) technique to represent the object. The major uniform LBP patterns are exploited to form a mask for joint color-texture feature selection. Compared with the traditional color histogram based algorithms that use the whole target region for tracking, the proposed algorithm effectively extracts the edge and corner features in the target region, which characterize the target better and represent it more robustly. The experimental results validate that the proposed method greatly improves the tracking accuracy and efficiency, with fewer mean shift iterations than standard mean shift tracking. It can robustly track the target under complex scenes, such as similar target and background appearance, in which the traditional color based schemes may fail to track.
Face recognition (FR) is an active yet challenging topic in computer vision applications. As a powerful tool to represent high dimensional data, sparse representation based classification (SRC) has recently been successfully used for FR. This paper discusses the metaface learning (MFL) of face images under the framework of SRC. Although directly using the training samples as dictionary bases can achieve good FR performance, a well learned dictionary matrix can lead to a higher FR rate with fewer dictionary atoms. An SRC oriented unsupervised MFL algorithm is proposed in this paper, and the experimental results on benchmark face databases demonstrate the improvements brought by the proposed MFL algorithm over the original SRC.