The complex decision-making systems used for autonomous vehicles or advanced driver-assistance systems (ADAS) are being replaced by end-to-end (e2e) architectures based on deep-neural-networks (DNN). DNNs can learn complex driving actions from datasets containing thousands of images and data obtained from the vehicle perception system. This work presents the classification, design and implementation of six e2e architectures capable of generating the driving actions of speed and steering wheel angle directly on the vehicle control elements. The work details the design stages and optimization process of the convolutional networks to develop six e2e architectures. In the metric analysis the architectures have been tested with different data sources from the vehicle, such as images, XYZ accelerations and XYZ angular speeds. The best results were obtained with a mixed data e2e architecture that used front images from the vehicle and angular speeds to predict the speed and steering wheel angle with a mean error of 1.06%. An exhaustive optimization process of the convolutional blocks has demonstrated that it is possible to design lightweight e2e architectures with high performance more suitable for the final implementation in autonomous driving.
Current predefined architectures for deep learning are computationally very heavy and use tens of millions of parameters. Thus, computational costs may be prohibitive for many experimental or technological setups. We developed an ad hoc architecture for the classification of multispectral images using deep learning techniques. The architecture, called 3DeepM, is composed of 3D filter banks especially designed for the extraction of spatial-spectral features in multichannel images. The new architecture has been tested on a sample of 12210 multispectral images of seedless table grape varieties: Autumn Royal, Crimson Seedless, Itum4, Itum5 and Itum9. 3DeepM was able to classify 100% of the images and obtained the best overall results in terms of accuracy, number of classes, number of parameters and training time compared to similar work. In addition, this paper presents a flexible and reconfigurable computer vision system designed for the acquisition of multispectral images in the range of 400 nm to 1000 nm. The vision system enabled the creation of the first dataset consisting of 12210 37-channel multispectral images (12 VIS + 25 IR) of five seedless table grape varieties that have been used to validate the 3DeepM architecture. Compared to predefined classification architectures such as AlexNet, ResNet or ad hoc architectures with a very high number of parameters, 3DeepM shows the best classification performance despite using 130-fold fewer parameters than the architecture to which it was compared. 3DeepM can be used in a multitude of applications that use multispectral images, such as remote sensing or medical diagnosis. In addition, the small number of parameters of 3DeepM make it ideal for application in online classification systems aboard autonomous robots or unmanned vehicles.
Background The combination of computer vision devices such as multispectral cameras coupled with artificial intelligence has provided a major leap forward in image-based analysis of biological processes. Supervised artificial intelligence algorithms require large ground truth image datasets for model training, which allows to validate or refute research hypotheses and to carry out comparisons between models. However, public datasets of images are scarce and ground truth images are surprisingly few considering the numbers required for training algorithms. Results We created a dataset of 1,283 multidimensional arrays, using berries from five different grape varieties. Each array has 37 images of wavelengths between 488.38 and 952.76 nm obtained from single berries. Coupled to each multispectral image, we added a dataset with measurements including, weight, anthocyanin content, and Brix index for each independent grape. Thus, the images have paired measures, creating a ground truth dataset. We tested the dataset with 2 neural network algorithms: multilayer perceptron (MLP) and 3-dimensional convolutional neural network (3D-CNN). A perfect (100% accuracy) classification model was fit with either the MLP or 3D-CNN algorithms. Conclusions This is the first public dataset of grape ground truth multispectral images. Associated with each multispectral image, there are measures of the weight, anthocyanins, and Brix index. The dataset should be useful to develop deep learning algorithms for classification, dimensionality reduction, regression, and prediction analysis.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.