Abstract. Traffic light recognition (TLR) is an integral part of any intelligent vehicle, which must function in the existing infrastructure. Pedestrian and sign detection have recently seen great improvements due to the introduction of learning-based detectors using integral channel features. A similar push has not yet been seen for the detection sub-problem of TLR, where detection is dominated by methods based on heuristic models. Evaluation of existing systems is currently limited primarily to small local datasets. In order to provide a common basis for comparing future TLR research, an extensive public database is collected based on footage from US roads. The database consists of both test and training data, totaling 46,418 frames and 112,971 annotated traffic lights, captured in continuous sequences under varying light and weather conditions. The learning-based detector achieves an AUC of 0.4 and 0.32 for day sequence 1 and 2, respectively, which is more than an order of magnitude better than the two heuristic model-based detectors.
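The integral channel features mentioned above follow the general recipe of Dollár et al.: transform the image into a set of channels (LUV color, gradient magnitude, oriented gradients) and read rectangular feature sums in constant time via integral images. Below is a minimal Python sketch of that idea; it is illustrative only, not the authors' implementation, and the function names and six-bin orientation split are assumptions.

```python
import cv2
import numpy as np

def integral_channels(bgr):
    """Ten illustrative channels: L, U, V, gradient magnitude, and six
    orientation-binned gradient channels, each as an integral image."""
    luv = cv2.cvtColor(bgr, cv2.COLOR_BGR2LUV).astype(np.float32)
    gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY).astype(np.float32)
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1)
    mag = cv2.magnitude(gx, gy)
    ang = np.arctan2(gy, gx) % np.pi          # unsigned orientation, [0, pi)
    channels = [luv[..., i] for i in range(3)] + [mag]
    n_bins = 6
    for b in range(n_bins):
        lo, hi = b * np.pi / n_bins, (b + 1) * np.pi / n_bins
        channels.append(mag * ((ang >= lo) & (ang < hi)))
    # Integral images make any rectangular channel sum an O(1) lookup,
    # which is what lets a boosted sliding-window detector run fast.
    return [cv2.integral(c) for c in channels]

def rect_sum(ii, x, y, w, h):
    """Channel sum over the rectangle at (x, y) with size (w, h)."""
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]
```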
Abstract. Traffic light recognition (TLR) is an integral part of any intelligent vehicle, which must function both during the day and at night. However, the majority of TLR research focuses on daytime scenarios. In this paper we focus on the detection of traffic lights at night and evaluate the performance of three detectors based on heuristic models and one learning-based detector. Evaluation is done on night-time data from the public LISA Traffic Light Dataset. The learning-based detector outperforms the model-based detectors in both precision and recall, achieving an average AUC of 51.4 % for the two night test sequences, whereas the heuristic model-based detectors achieve AUCs ranging from 13.5 % to 15.0 %.
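Both of these abstracts report detector quality as AUC. In this TLR literature, AUC typically denotes the area under the precision-recall curve, which can be approximated from a set of sampled operating points. The sketch below assumes that interpretation; the numbers in the usage line are placeholders, not results from the papers.

```python
import numpy as np
from sklearn.metrics import auc

def pr_auc(precisions, recalls):
    """Area under a precision-recall curve sampled at a set of detector
    confidence thresholds (one precision/recall pair per threshold)."""
    order = np.argsort(recalls)               # auc() needs monotonic x
    return auc(np.asarray(recalls)[order], np.asarray(precisions)[order])

# Placeholder operating points, not the detectors' actual results.
print(pr_auc(precisions=[0.95, 0.80, 0.55], recalls=[0.10, 0.35, 0.60]))
```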
Automating inspection of critical infrastructure such as sewer systems will help utilities optimize maintenance and replacement schedules. The current inspection process consists of manual review of video as an operator remotely controls a sewer inspection vehicle. The process is slow, labor-intensive, and expensive, and thus presents great potential for automation. With this work, we address a central component of the next generation of robotic sewer inspection, namely the choice of 3D sensing technology. We investigate three prominent techniques for 3D vision: passive stereo, active stereo, and time-of-flight (ToF). The RealSense D435 camera is chosen as the representative of the first two techniques, whereas the PMD CamBoard pico flexx represents ToF. The 3D reconstruction performance of the sensors is assessed both in a laboratory setup and in an outdoor, above-ground setup. The acquired point clouds are compared with reference 3D models using the cloud-to-mesh metric, and reconstruction performance is tested at different illuminance levels and different levels of water in the pipes. The results show that the ToF-based point cloud from the pico flexx is superior to the output of the active and passive stereo cameras.
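The cloud-to-mesh (C2M) metric used here is the distance from each captured point to the nearest point on the reference mesh surface. One way to compute it is sketched below with Open3D's raycasting scene; this is a minimal illustration, the file names are placeholders, and the paper does not state which tool the authors actually used.

```python
import numpy as np
import open3d as o3d

def cloud_to_mesh_distances(cloud_path, mesh_path):
    """Unsigned distance from every captured point to the surface of the
    reference mesh, i.e. the per-point cloud-to-mesh (C2M) error."""
    pcd = o3d.io.read_point_cloud(cloud_path)
    mesh = o3d.io.read_triangle_mesh(mesh_path)
    scene = o3d.t.geometry.RaycastingScene()
    scene.add_triangles(o3d.t.geometry.TriangleMesh.from_legacy(mesh))
    pts = o3d.core.Tensor(np.asarray(pcd.points), dtype=o3d.core.Dtype.Float32)
    return scene.compute_distance(pts).numpy()

# Placeholder file names for one sensor scan and its reference pipe model.
d = cloud_to_mesh_distances("scan.ply", "reference_pipe.stl")
print(f"mean C2M: {d.mean():.4f} m, RMS: {np.sqrt((d ** 2).mean()):.4f} m")
```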
Abstract. This paper presents an approach for automatic visual inspection of chicken entrails in RGB-D data. The point cloud is first over-segmented into supervoxels based on color, spatial, and geometric information. Color, position, and texture features are extracted from each of the resulting supervoxels and passed to a Random Forest classifier, which labels each supervoxel as heart, lung, liver, or miscellaneous. The dataset consists of 150 individual entrails, with 30 of these reserved for evaluation. Segmentation performance is evaluated on a voxel-by-voxel basis, achieving an average Jaccard index of 61.5% across the four classes of organs, a 5.9% relative increase over the 58.1% achieved with features derived purely from 2D.
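The classification stage reduces to a standard multi-class Random Forest over one feature vector per supervoxel. The sketch below, using scikit-learn, shows that stage together with a macro-averaged Jaccard evaluation; the feature dimensionality and the random placeholder data are assumptions standing in for the real color, position, and texture descriptors.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import jaccard_score

CLASSES = ["heart", "lung", "liver", "misc"]

# Random placeholder data standing in for real per-supervoxel descriptors;
# the 40-dimensional feature size is an arbitrary choice for illustration.
rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(5000, 40)), rng.integers(0, 4, size=5000)
X_test, y_test = rng.normal(size=(1000, 40)), rng.integers(0, 4, size=1000)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
pred = clf.predict(X_test)                    # one organ label per supervoxel

# Jaccard index averaged over the four classes, as the paper reports
# (here computed per supervoxel rather than per voxel).
print(jaccard_score(y_test, pred, average="macro"))
```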
We present a pattern recognition framework for semantic segmentation of visual structures, that is, multi-class labelling at pixel level, and apply it to the task of segmenting organs in the eviscerated viscera from slaughtered poultry in RGB-D images. This is a step towards replacing the current strenuous manual inspection at poultry processing plants. Features are extracted from feature maps such as activation maps from a convolutional neural network (CNN). A random forest classifier assigns class probabilities, which are further refined by utilizing context in a conditional random field. The presented method is compatible with both 2D and 3D features, which allows us to explore the value of adding 3D and CNN-derived features. The dataset consists of 604 RGB-D images showing 151 unique sets of eviscerated viscera from four different perspectives. A mean Jaccard index of 78.11% is achieved across the four classes of organs by using features derived from 2D, 3D and a CNN, compared to 74.28% using only basic 2D image features.
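The CNN-derived features mentioned above can be obtained by reading an intermediate activation map and upsampling it back to the input resolution, so that every pixel receives a feature vector to pass to the random forest. The sketch below illustrates this with a VGG-16 backbone from torchvision; the paper does not specify the architecture or layer index, so both are assumptions.

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg16

# Assumed backbone and layer index; the paper does not specify either.
model = vgg16(weights="IMAGENET1K_V1").features.eval()

@torch.no_grad()
def per_pixel_cnn_features(rgb, layer=16):
    """rgb: (1, 3, H, W) float tensor. Returns an (H*W, C) matrix of CNN
    features, bilinearly upsampled back to the input resolution so each
    pixel can be classified individually (e.g. by a random forest)."""
    x = rgb
    for i, module in enumerate(model):
        x = module(x)
        if i == layer:
            break
    feat = F.interpolate(x, size=rgb.shape[2:], mode="bilinear",
                         align_corners=False)
    return feat.squeeze(0).permute(1, 2, 0).reshape(-1, feat.shape[1])

features = per_pixel_cnn_features(torch.rand(1, 3, 224, 224))
```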
Abstract. The focus of this paper is to count the number of people participating in a specific carnival, namely Aalborg Carnival in Denmark, which is believed to be the largest in Northern Europe. A carnival poses significant challenges from a computer vision viewpoint due to high density, occlusion, and non-human objects in the scene. To this end, we apply a passive stereo vision approach to create a depth image in which the heads of people are segmented, tracked, and counted in real time. The results from the parade demonstrate that the system is able to count the people passing by with an uncertainty of 5.8 %.
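The core of such a passive-stereo counting pipeline is a disparity map computed from a rectified image pair, followed by a disparity threshold that isolates heads, which sit closer to an overhead camera than the surrounding crowd. A minimal OpenCV sketch follows; the matcher settings, file names, and threshold are illustrative, not the paper's actual parameters.

```python
import cv2
import numpy as np

# Rectified stereo pair from an overhead rig (placeholder file names).
left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

matcher = cv2.StereoSGBM_create(minDisparity=0, numDisparities=64,
                                blockSize=9)
# StereoSGBM returns fixed-point disparities scaled by 16.
disparity = matcher.compute(left, right).astype(np.float32) / 16.0

# Larger disparity = closer to the camera, so a simple threshold isolates
# head candidates; connected components give one blob per candidate head,
# which can then be tracked and counted across frames.
head_mask = (disparity > 40).astype(np.uint8)
n_labels, labels = cv2.connectedComponents(head_mask)
print(f"{n_labels - 1} head candidates in this frame")
```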
Research in traffic light recognition (TLR) has stagnated compared to related computer vision areas such as pedestrian detection and traffic sign recognition. We focus on the detection sub-problem, since it is the most challenging part of TLR and solving it is key to a successful system. This is done by examining four detectors from different author groups and their reported results. From surveying existing work it is clear that evaluation is currently limited primarily to small local datasets. In order to provide a common basis for future comparison of TLR research, an extensive public database is collected based on footage from US roads. The database consists of continuous test and training video sequences, totaling 46,418 frames and 112,971 annotated traffic lights. The sequences are captured by a stereo camera mounted on the roof of a vehicle driving under both night and day conditions with varying light and weather.