Objects are often organized in a semantic hierarchy of categories, where fine-level categories are grouped into coarse-level categories according to their semantic relations. While previous works usually only classify objects into the leaf categories, we argue that generating hierarchical labels can describe how the leaf categories evolved from higher-level coarse-grained categories, and thus can provide a better understanding of the objects. In this paper, we propose to utilize the CNN-RNN framework to address the hierarchical image classification task. The CNN allows us to obtain discriminative features for the input images, and the RNN enables us to jointly optimize the classification of coarse and fine labels. This framework can not only generate hierarchical labels for images, but also improve the traditional leaf-level classification performance by incorporating the hierarchical information. Moreover, this framework can be built on top of any CNN architecture that is primarily designed for leaf-level classification. Accordingly, we build a high-performance network based on the CNN-RNN paradigm which outperforms the original CNN (wider-ResNet) and also the current state-of-the-art. In addition, we investigate how to utilize the CNN-RNN framework to improve fine category classification when a fraction of the training data is only annotated with coarse labels. Experimental results demonstrate that CNN-RNN can use the coarse-labeled training data to improve the classification of fine categories, and in some cases it even surpasses the performance achieved with fully annotated training data. This reveals that CNN-RNN can alleviate the challenge of specialized and expensive annotation of fine labels.
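The idea of emitting a coarse label first and a fine label second can be illustrated with a minimal sketch. This is not the CNN-RNN model described in the abstract: the hierarchy, the category names, and the score dictionaries are hypothetical, and the decoding is a simple greedy rule that restricts the fine label to the children of the chosen coarse label.

```python
# Hypothetical two-level hierarchy: coarse category -> its fine categories.
HIERARCHY = {
    "vehicle": ["car", "truck"],
    "animal": ["cat", "dog"],
}


def decode_hierarchical(coarse_scores, fine_scores, hierarchy=HIERARCHY):
    """Pick the best coarse label, then the best fine label among its children."""
    coarse = max(coarse_scores, key=coarse_scores.get)
    fine = max(hierarchy[coarse], key=lambda name: fine_scores[name])
    return coarse, fine


# Assumed classifier outputs for one image (illustrative numbers only).
coarse_scores = {"vehicle": 0.3, "animal": 0.7}
fine_scores = {"car": 0.4, "truck": 0.1, "cat": 0.2, "dog": 0.3}

path = decode_hierarchical(coarse_scores, fine_scores)
print(path)
```

In this toy input a flat argmax over the fine scores would select "car", whereas the coarse-to-fine decoding keeps the fine label consistent with the more confident coarse prediction.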
Image-based kinship recognition is an important problem in the reconstruction and analysis of social networks. Prior studies on image-based kinship recognition have focused solely on pairwise kinship verification, i.e. on the question of whether or not two people are kin. Such approaches fail to exploit the fact that many real-world photographs contain several family members; for instance, the probability of two people being brothers increases when both people are recognized to have the same father. In this work, we propose a graph-based approach that incorporates facial similarities between all family members in a photograph in order to improve the performance of kinship recognition. In addition, we introduce a database of group photographs with kinship annotations.
Wood anatomy is one of the most important methods for timber identification. However, training wood anatomy experts is time-consuming, while at the same time the number of senior wood anatomists with broad taxonomic expertise is declining. Therefore, we want to explore how a more automated, computer-assisted approach can support accurate wood identification based on microscopic wood anatomy. For our exploratory research, we used an available image dataset that has been applied in several computer vision studies, consisting of 112 — mainly neotropical — tree species representing 20 images of transverse sections for each species. Our study aims to review existing computer vision methods and compare the success of species identification based on (1) several image classifiers based on manually adjusted texture features, and (2) a state-of-the-art approach for image classification based on deep learning, more specifically Convolutional Neural Networks (CNNs). In support of previous studies, a considerable increase of the correct identification is accomplished using deep learning, leading to an accuracy rate up to 95.6%. This remarkably high success rate highlights the fundamental potential of wood anatomy in species identification and motivates us to expand the existing database to an extensive, worldwide reference database with transverse and tangential microscopic images from the most traded timber species and their look-a-likes. This global reference database could serve as a valuable future tool for stakeholders involved in combatting illegal logging and would boost the societal value of wood anatomy along with its collections and experts.
High-throughput imaging is applied to provide observations for accurate statements on phenomena in biology, and it has been successfully applied in the domain of cells, i.e. cytomics. In the domain of whole organisms, several hurdles must be overcome to ensure that the imaging can be accomplished with sufficient throughput and reproducibility. For vertebrate biology, zebrafish is a popular model system for high-throughput applications. The development of the Vertebrate Automated Screening Technology (VAST BioImager), a microscope-mounted system, enables zebrafish high-throughput screening. The VAST BioImager contains a capillary that holds a zebrafish for imaging. By rotating the capillary, multiple axial views of a specimen can be acquired. With the VAST BioImager, fluorescence and/or confocal microscopes are used. Quantitation of a specific signal derived from a label in one fluorescent channel requires insight into the zebrafish volume so that the quantitation can be normalized to volume units. However, a specimen volume cannot be straightforwardly derived from the setup of the VAST BioImager. We present a high-throughput axial-view imaging architecture based on the VAST BioImager. We propose profile-based 3D reconstruction to produce 3D volumetric representations of zebrafish larvae using the axial views. Volume and surface area can then be derived from the 3D reconstruction to obtain the shape characteristics in high-throughput measurements. In addition, we develop a calibration and a validation of our methodology. Our measurements show that, with a limited number of views, accurate measurements of volume and surface area for zebrafish larvae can be obtained. We have applied the proposed method to a range of developmental stages in zebrafish and produced metrical references for the volume and surface area at each stage.
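To give a feel for how a volume can be estimated from silhouette profiles, here is a deliberately simplified sketch. It is not the profile-based 3D reconstruction of the abstract: it assumes only two orthogonal views, treats every cross-section as an ellipse, and uses a hypothetical function name and input layout.

```python
import math


def volume_from_two_profiles(widths_top, widths_side, slice_thickness):
    """Approximate a volume by stacking elliptical slices.

    widths_top[i] and widths_side[i] are the full silhouette widths at slice i
    in two orthogonal views (an assumption of this sketch); each slice is
    modelled as an ellipse with those widths as its two diameters.
    """
    if len(widths_top) != len(widths_side):
        raise ValueError("both profiles must have the same number of slices")
    volume = 0.0
    for w_top, w_side in zip(widths_top, widths_side):
        area = math.pi * (w_top / 2.0) * (w_side / 2.0)  # ellipse area
        volume += area * slice_thickness
    return volume


# Sanity check on a cylinder of diameter 2 and length 10 (ten unit slices).
cylinder_volume = volume_from_two_profiles([2.0] * 10, [2.0] * 10, 1.0)
print(cylinder_volume)
```

With more views, the elliptical assumption could be replaced by a cross-section shape fitted to all profiles, which is closer in spirit to a multi-view reconstruction.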
To solve the oversampling problem of multi-class small samples and to improve their classification accuracy, we develop an oversampling method based on classification ranking and weight setting. The designed oversampling algorithm sorts the data within each class of the dataset according to the distance from the original data to the hyperplane. Furthermore, iterative sampling is performed within each class, and inter-class sampling is adopted at the boundaries of adjacent classes according to a sampling weight composed of data density and data sorting. Finally, information assignment is performed on all newly generated sampling data. The algorithm is trained and tested on the UCI imbalanced datasets, and the established composite metrics are used to evaluate the performance of the proposed algorithm and other algorithms with a comprehensive evaluation method. The results show that the proposed algorithm balances the multi-class imbalanced data in terms of quantity, and that the newly generated data maintain the distribution characteristics and information properties of the original samples. Moreover, compared with other algorithms such as SMOTE and SVMOM, the proposed algorithm reaches a higher classification accuracy of about 90%. It is concluded that this algorithm has high practicability and general characteristics for imbalanced multi-class samples.
The technology of Artificial Intelligence (AI) brings tremendous possibilities for autonomous vehicle applications. One of the essential tasks of an autonomous vehicle is environment perception using machine learning algorithms. Since cyclists are vulnerable road users, cyclist detection and tracking are important perception sub-tasks for autonomous vehicles to avoid vehicle-cyclist collisions. In this paper, a robust method for cyclist detection and tracking is presented based on a multi-layer laser scanner, i.e., the IBEO LUX 4L, which obtains a four-layer point cloud from the local environment. First, the laser points are partitioned into individual clusters using the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) method based on subarea. Then, a 37-dimensional feature set is optimized by the Relief algorithm and Principal Component Analysis (PCA) to produce two new feature sets. Support Vector Machine (SVM) and Decision Tree (DT) classifiers are further combined with the three feature sets, respectively. Moreover, the Multiple Hypothesis Tracking (MHT) algorithm and a Kalman filter based on the Current Statistical (CS) model are applied to track moving cyclists and estimate the motion state. The performance of the proposed cyclist detection and tracking method is validated in a real road environment.
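The clustering step can be sketched with a plain DBSCAN over 2D points. This is a minimal illustration only: it omits the subarea partitioning and the four-layer structure mentioned in the abstract, and the sample points, `eps`, and `min_pts` values are hypothetical.

```python
import math


def dbscan(points, eps, min_pts):
    """Return one cluster label per point; -1 marks noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:  # j is a core point: keep expanding
                queue.extend(j_neighbors)
    return labels


# Two tight groups of laser-like points plus one isolated point.
points = [(0, 0), (0, 0.5), (0.5, 0), (10, 10), (10, 10.5), (10.5, 10), (50, 50)]
labels = dbscan(points, eps=1.0, min_pts=3)
print(labels)
```

The neighbor search here is quadratic in the number of points, which is acceptable for a sketch; a real pipeline would typically use a spatial index.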
Background and Objective. Colorectal cancer (CRC) is a common gastrointestinal tumour with high morbidity and mortality. Endoscopic examination is an effective method for early detection of digestive system tumours. However, for various reasons, missed diagnoses and misdiagnoses are common. Our goal is to use deep learning methods to establish colorectal lesion detection, positioning, and classification models based on white light endoscopic images and to design a computer-aided diagnosis (CAD) system that helps physicians reduce the rate of missed diagnosis and improve detection accuracy. Methods. We collected and curated white light endoscopic images of patients undergoing colonoscopy. A convolutional neural network model is used to detect whether an image contains lesions: CRC, colorectal adenoma (CRA), or colorectal polyps. Accuracy, sensitivity, and specificity are used as indicators to evaluate the model. Then, an instance segmentation model is used to locate and classify the lesions in the images containing lesions, and mAP (mean average precision), AP50, and AP75 are used to evaluate the performance of the instance segmentation model. Results. For detecting whether an image contains lesions, we compared ResNet50 with four other models: AlexNet, VGG19, ResNet18, and GoogLeNet. ResNet50 performed better than the other models, with an accuracy of 93.0%, a sensitivity of 94.3%, and a specificity of 90.6%. For localization and classification of lesions in images containing lesions by Mask R-CNN, the mAP, AP50, and AP75 were 0.676, 0.903, and 0.833, respectively. Conclusion. We developed and compared five models for the detection of lesions in white light endoscopic images. ResNet50 showed the best performance, and the Mask R-CNN model could be used to locate and classify lesions in images containing lesions.
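The three image-level indicators named above follow directly from the confusion counts. The sketch below uses the standard definitions; the function name and the toy labels are hypothetical and are not the study's data.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity and specificity for 0/1 labels (1 = lesion present)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative example")
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # share of lesion images detected
        "specificity": tn / (tn + fp),  # share of lesion-free images cleared
    }


# Toy example: four lesion images and two lesion-free images.
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Sensitivity is the quantity tied to missed diagnoses, since every false negative is a lesion image that the model failed to flag.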