Abstract: In this paper, we propose a broad comparison between Fully Convolutional Networks (FCNs) and Mask Region-based Convolutional Neural Networks (Mask R-CNNs) applied in the Salient Object Detection (SOD) context. Studies in the SOD literature usually explore architectures based on FCNs to detect salient regions and objects in visual scenes. However, despite the promising results achieved, FCNs showed issues in some challenging scenarios. Fairly recent studies in the SOD literature proposed the use of a Mask R-CNN …
“…Additionally, our proposed framework eases the addition of other modules such as image processing, classification, object detection, semantic segmentation, and other novel deep learning methods that explore domain adaptation and data generation, which can run on the remote server and make use of hardware-accelerated Deep Neural Networks running on GPU [30], [31], [32], [33].…”
Civilian Unmanned Aerial Vehicles (UAVs) are becoming more accessible for domestic use. Currently, UAV manufacturer DJI dominates the market, and their drones have been used for a wide range of applications. Model lines such as the Phantom can be applied for autonomous navigation where Global Positioning System (GPS) signals are not reliable, with the aid of Simultaneous Localization and Mapping (SLAM), such as monocular Visual SLAM. In this work, we propose a bridge among different systems, such as Linux, Robot Operating System (ROS), Android, and UAVs, as an open-source framework in which the gimbal camera recording can be streamed to a remote server, supporting the implementation of an autopilot. Finally, we present experimental results showing the performance of the video streaming, validating the framework.
“…It is a pixel-level classification technique with three major tasks: classification, localization, and segmentation. Krinski et al. [9] conducted research showing that, on clear images, Mask region-based CNN (R-CNN) outperforms fully convolutional networks (FCNs). Valada et al. [5] demonstrated that ParseNet and AdapNet show high accuracy in detecting objects in images with severe driving conditions.…”
Section: Related Work
“…Valada et al. [5] demonstrated that ParseNet and AdapNet show high accuracy in detecting objects in images with severe driving conditions. Many studies with segmentation have improved object detection performance, but the accuracy still stays around 80% [5][6][7][8][9][10]. The accuracy of most segmentation algorithms is higher than that of YOLO algorithms, but the efficiency is much worse [11][12][13].…”
The field of autonomous driving leaves minimal margins for error. Ensuring that self-driving vehicles can accurately perceive their surroundings, even amidst conditions of limited visibility, is of utmost importance. We propose a novel approach to enhance the precision of object detection on the road under limited-visibility driving conditions. The initial step involves classifying the driving condition of an input image; then, the corresponding semantic segmentation model processes the image to distinguish objects. Our dataset consists of roadway images depicting 20 distinct objects amidst adverse limited-visibility conditions. The experimental results validate our approach, with the proposed method achieving high accuracy levels for training, validation, and testing data. Our classification model achieved 100% accuracy. In particular, the proposed methods achieved final mean IoU scores of 57.3%, 32.0%, 49.4%, and 47.8%, respectively, for FOG, NIGHT, RAIN, and SNOW conditions when using the U-NET model for segmentation. These mean IoU results are better than those of traditional non-hierarchical training methods that use the same U-NET structure.
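The mean IoU metric reported above can be stated concretely. The following is a minimal sketch (not the authors' evaluation code) of per-class Intersection-over-Union averaged over the classes present, given predicted and ground-truth label maps; the function name and signature are illustrative.

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean Intersection-over-Union over classes present in pred or target.

    pred, target: integer label maps of identical shape.
    """
    ious = []
    for c in range(num_classes):
        p = pred == c
        t = target == c
        union = np.logical_or(p, t).sum()
        if union == 0:
            continue  # class absent from both maps; skip it
        inter = np.logical_and(p, t).sum()
        ious.append(inter / union)
    return float(np.mean(ious))
```

A score of 57.3% for FOG, for instance, means the predicted masks overlap the ground truth by 57.3% on average across the 20 object classes under this metric.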
“…In recent decades, the SOD literature presented an impressive growth in the number of novel and promising approaches. Recent works, which are based on Deep Learning techniques, have shown remarkable results in the field [2], [3]. Due to its high precision and generalization abilities, Deep Learningbased methods can find the salient regions of images with higher reliability.…”
In this paper, we propose a novel data augmentation technique (ANDA) applied to the Salient Object Detection (SOD) context. Standard data augmentation techniques proposed in the literature, such as image cropping, rotation, flipping, and resizing, only generate variations of the existing examples, providing limited generalization. Our method has the novelty of creating new images by combining an object with a new background while retaining part of its salience in this new context. To do so, the ANDA technique relies on the linear combination between labeled salient objects and new backgrounds, generated by removing the original salient object in a process known as image inpainting. Our proposed technique allows for more precise control of the object's position and size while preserving background information. To evaluate our proposed method, we trained multiple deep neural networks and compared the effect that our technique has on each one. We also compared our method with other data augmentation techniques. Our findings show that, depending on the network, the improvement can be up to 14.1% in the F-measure, with a decrease of up to 2.6% in the Mean Absolute Error.
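The linear combination step described above can be sketched as an alpha blend of a labeled salient object onto an inpainted background. This is a minimal illustration under stated assumptions, not ANDA itself: the inpainting step and the salience-preserving placement strategy are omitted, and the function name and parameters are hypothetical.

```python
import numpy as np

def compose(background, obj_rgb, alpha, top, left):
    """Linearly combine a salient object with a new (inpainted) background.

    background: HxWx3 uint8 image (object already removed via inpainting).
    obj_rgb:    hxwx3 uint8 crop of the salient object.
    alpha:      hxw float mask in [0, 1] (the object's saliency label).
    top, left:  where to place the object, giving control of its position.
    """
    out = background.astype(np.float32).copy()
    h, w = alpha.shape
    a = alpha[..., None].astype(np.float32)          # broadcast over RGB
    region = out[top:top + h, left:left + w]
    # per-pixel linear combination: alpha * object + (1 - alpha) * background
    out[top:top + h, left:left + w] = a * obj_rgb + (1.0 - a) * region
    return out.astype(np.uint8)
```

Because the placement offsets and the object crop size are explicit inputs, position and scale can be varied per generated sample while the rest of the background stays intact.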