This paper proposes a computationally efficient approach to detecting objects natively in 3D point clouds using convolutional neural networks (CNNs). In particular, this is achieved by leveraging a feature-centric voting scheme to implement novel convolutional layers which explicitly exploit the sparsity encountered in the input. To this end, we examine the trade-off between accuracy and speed for different architectures and additionally propose to use an L1 penalty on the filter activations to further encourage sparsity in the intermediate representations. To the best of our knowledge, this is the first work to propose sparse convolutional layers and L1 regularisation for efficient large-scale processing of 3D data. We demonstrate the efficacy of our approach on the KITTI object detection benchmark and show that Vote3Deep models with as few as three layers outperform the previous state of the art in both laser-only and laser-vision-based approaches by margins of up to 40%, while remaining highly competitive in terms of processing time.
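The feature-centric voting idea can be illustrated with a minimal sketch (not the authors' implementation: the helper names are invented, the example is a single-filter 2D case, whereas the paper operates on 3D grids with many filters). Only occupied cells scatter ("vote") their weighted filter into the output, so the cost scales with occupancy rather than grid size; a dense reference is included to check equivalence.

```python
import numpy as np

def vote_conv2d(x, w):
    """Sparse 'voting' cross-correlation with zero padding ('same' size).
    Each non-zero input cell scatters its weighted filter into the output,
    so the cost scales with the number of occupied cells."""
    kh, kw = w.shape
    ph, pw = kh // 2, kw // 2
    out = np.zeros_like(x, dtype=float)
    for i, j in zip(*np.nonzero(x)):          # visit occupied cells only
        for di in range(kh):
            for dj in range(kw):
                oi, oj = i - di + ph, j - dj + pw
                if 0 <= oi < x.shape[0] and 0 <= oj < x.shape[1]:
                    out[oi, oj] += x[i, j] * w[di, dj]
    return out

def dense_conv2d(x, w):
    """Reference dense cross-correlation, for checking the voting version."""
    kh, kw = w.shape
    ph, pw = kh // 2, kw // 2
    H, W = x.shape
    out = np.zeros((H, W))
    for oi in range(H):
        for oj in range(W):
            s = 0.0
            for di in range(kh):
                for dj in range(kw):
                    ii, jj = oi + di - ph, oj + dj - pw
                    if 0 <= ii < H and 0 <= jj < W:
                        s += x[ii, jj] * w[di, dj]
            out[oi, oj] = s
    return out

# With a ReLU and a negative bias, an L1 penalty on the activations
# (lam * h.sum(), valid since h >= 0) pushes intermediate maps toward
# sparsity, which keeps later voting layers cheap as well.
```

The two functions compute identical outputs; the voting form simply reorders the summation around the non-zero inputs, which is what makes stacked sparse layers cheap.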
Artificial intelligence research has seen enormous progress over the past few decades, but it predominantly relies on fixed datasets and stationary environments. Continual learning is an increasingly relevant area of study that asks how artificial systems might learn sequentially, as biological systems do, from a continuous stream of correlated data. In the present review, we relate continual learning to the learning dynamics of neural networks, highlighting the potential it has to considerably improve data efficiency. We further consider the many new biologically inspired approaches that have emerged in recent years, focusing on those that utilize regularization, modularity, memory, and meta-learning, and highlight some of the most promising and impactful directions.

The World Is Not Stationary

A common benchmark for success in artificial intelligence is the ability to emulate human learning. We measure the abilities of humans to recognize images, play games, and drive a car, to name a few, and then develop machine learning models that can match or exceed these abilities given enough training data. This paradigm puts the emphasis on the end result rather than the learning process, and overlooks a critical characteristic of human learning: that it is robust to changing tasks and sequential experience. It is perhaps unsurprising that humans can learn this way; after all, time is irreversible and the world is non-stationary (see Glossary), so human learning has evolved to thrive in dynamic learning settings. However, this robustness is in stark contrast to the most powerful modern machine learning methods, which perform well only when presented with data that are carefully shuffled, balanced, and homogenized. Not only do these models underperform when presented with changing or incremental data regimes; in some cases they fail completely, or suffer from rapid performance degradation on earlier learned tasks, known as catastrophic forgetting.
What might be gained by developing neural network models that learn sequentially like humans? First of all, many applications could benefit from continual adaptation to a changing target specification: for example, visual recognition algorithms that need to learn a diverse, growing set of image classes, or household robots that need to incrementally add skills to their repertoire. Continual learning techniques could enable models to acquire specialized solutions without forgetting previous ones, potentially learning over a lifetime, as a human does. In fact, continual learning is generally considered one of the attributes necessary for human-level artificial general intelligence [1]. More fundamentally, continual learning methods could offer enormous advantages for deep neural networks even in stationary settings, by improving learning efficiency as well as by enabling knowledge transfer between related tasks. This article will first motivate a taxonomy of continual learning approaches by describing their connections with biological systems. Just as continual learning in humans cann...
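The regularization family of continual learning methods surveyed above can be sketched with an elastic-weight-consolidation-style penalty (a hedged illustration, not taken from the review itself; the toy losses and Fisher values below are invented): a quadratic term anchors the parameters that mattered for an earlier task while leaving the rest free to adapt.

```python
import numpy as np

def ewc_penalty(theta, theta_star, fisher, lam=1.0):
    """Quadratic penalty anchoring parameters to their post-task-A values
    theta_star, weighted by a diagonal Fisher-information estimate of
    how important each parameter was for task A."""
    return 0.5 * lam * np.sum(fisher * (theta - theta_star) ** 2)

def ewc_grad(theta, theta_star, fisher, lam=1.0):
    """Gradient of the penalty, added to the new task's loss gradient."""
    return lam * fisher * (theta - theta_star)
```

Training on a new task then minimizes the new-task loss plus this penalty: parameters the Fisher term marks as important are pulled back toward their consolidated values, mitigating catastrophic forgetting, while unimportant parameters remain free to move.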
We present an approach for learning spatial traversability maps for driving in complex, urban environments, based on an extensive dataset demonstrating the driving behaviour of human experts. The direct end-to-end mapping from raw input data to cost bypasses the effort of manually designing parts of the pipeline, exploits a large number of data samples, and can additionally be framed so as to refine handcrafted cost maps built on hand-engineered features. To achieve this, we introduce a maximum-entropy-based, non-linear inverse reinforcement learning (IRL) framework which exploits the capacity of fully convolutional neural networks (FCNs) to represent the cost model underlying driving behaviours. This high-capacity, deep, parametric approach scales to complex environments and driving behaviours, while its run time at deployment is independent of training dataset size. After benchmarking against state-of-the-art IRL approaches, we focus on demonstrating scalability and performance on an ambitious dataset collected over the course of one year, comprising more than 25,000 demonstration trajectories extracted from over 120 km of urban driving. We evaluate the resulting cost representations by showing their advantages over a carefully hand-designed cost map, and furthermore demonstrate robustness to systematic errors by learning accurate representations even in the presence of calibration perturbations. Importantly, we demonstrate that a manually designed cost map can be refined to more accurately handle corner cases that are rarely seen in the environment, such as stairs, slopes and underpasses, by further incorporating human priors into the training framework.
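The maximum-entropy IRL machinery can be sketched on a tiny tabular MDP (a deliberate simplification: the paper replaces the tabular reward with an FCN over the sensor grid, and the MDP below is invented for illustration). Soft value iteration yields a stochastic policy, expected state-visitation frequencies are rolled out under it, and the gradient fed back into the cost model is simply demonstration visitations minus model visitations.

```python
import numpy as np

def soft_value_iteration(r, P, gamma=0.9, n_iter=200):
    """MaxEnt soft Bellman backups. r: (S,) per-state reward;
    P: (A, S, S') transition matrix. Returns the soft policy pi: (S, A)."""
    S = r.shape[0]
    V = np.zeros(S)
    for _ in range(n_iter):
        Q = r[:, None] + gamma * np.einsum('asr,r->sa', P, V)
        m = Q.max(axis=1)
        V = m + np.log(np.exp(Q - m[:, None]).sum(axis=1))  # logsumexp
    return np.exp(Q - V[:, None])

def expected_svf(pi, P, p0, T):
    """Expected state-visitation frequencies over a horizon of T steps,
    starting from the state distribution p0."""
    mu, total = p0.copy(), p0.copy()
    for _ in range(T):
        mu = np.einsum('s,sa,asr->r', mu, pi, P)
        total += mu
    return total

def irl_grad(svf_demo, svf_model):
    """Per-state MaxEnt IRL gradient (demonstrations minus model); in the
    deep variant this is backpropagated into the network that produced
    the costs (reward = -cost)."""
    return svf_demo - svf_model
```

When the model's expected visitations match the demonstrations, the gradient vanishes, which is exactly the fixed point the training loop seeks.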
This paper presents an end-to-end approach for tracking static and dynamic objects for an autonomous vehicle driving through crowded urban environments. Unlike traditional approaches to tracking, this method is learned end-to-end, and is able to directly predict a full unoccluded occupancy grid map from raw laser input data. Inspired by the recently presented DeepTracking approach ([1], [2]), we employ a recurrent neural network (RNN) to capture the temporal evolution of the state of the environment, and propose to use Spatial Transformer modules to exploit estimates of the egomotion of the vehicle. Our results demonstrate the ability to track a range of objects, including cars, buses, pedestrians, and cyclists through occlusion, from both moving and stationary platforms, using a single learned model. Experimental results demonstrate that the model can also predict the future states of objects from current inputs, with greater accuracy than previous work.
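The role of the Spatial Transformer modules, shifting the network's internal occupancy estimate into the vehicle's new frame using the egomotion estimate, can be sketched as a rigid sampling-grid warp with bilinear interpolation (a minimal stand-in for the learned modules; the function and its interface are illustrative).

```python
import numpy as np

def se2_grid_sample(grid, dx, dy, dtheta):
    """Resample an (H, W) grid under a rigid 2D motion of (dx, dy) cells
    and dtheta radians: the sampling-grid plus bilinear-interpolation core
    of a spatial transformer, restricted to SE(2). Out-of-range samples
    read as 0 (unknown space)."""
    H, W = grid.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing='ij')
    cy, cx = (H - 1) / 2.0, (W - 1) / 2.0
    c, s = np.cos(dtheta), np.sin(dtheta)
    # source coordinates: rotate about the grid centre, then translate
    sx = c * (xs - cx) - s * (ys - cy) + cx + dx
    sy = s * (xs - cx) + c * (ys - cy) + cy + dy
    x0, y0 = np.floor(sx).astype(int), np.floor(sy).astype(int)
    out = np.zeros_like(grid, dtype=float)
    # accumulate the four bilinear corners, masking out-of-range samples
    for ox, oy, wgt in ((x0,     y0,     (x0 + 1 - sx) * (y0 + 1 - sy)),
                        (x0 + 1, y0,     (sx - x0)     * (y0 + 1 - sy)),
                        (x0,     y0 + 1, (x0 + 1 - sx) * (sy - y0)),
                        (x0 + 1, y0 + 1, (sx - x0)     * (sy - y0))):
        valid = (ox >= 0) & (ox < W) & (oy >= 0) & (oy < H)
        out[valid] += wgt[valid] * grid[oy[valid], ox[valid]]
    return out
```

Because the warp is differentiable in the sampled values, gradients can flow through it during end-to-end training, which is what lets the RNN keep its memory registered to the moving platform.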
Purpose – This paper aims to understand and identify the various barriers to adopting new telecom services in rural areas, with a view to improving the penetration and revenue of telecom companies. These barriers are modeled to study their inter-relationships and prioritized so that appropriate management action plans can be devised. Design/methodology/approach – The Delphi technique has been used to form a consensus with telecom managers working in rural areas to finalize the barriers. An integrated Interpretive Structural Modeling–Analytic Network Process (ISM–ANP) approach has been adopted to establish the complex relationships among the barriers, cluster them, and understand and prioritize them. Findings – The major contribution of this research is establishing the direction and dominance of the various barriers so as to promote better adoption of new telecom-based mobile services in rural areas. The proposed integrated method can aid decision making by providing a more informative, accurate and better choice than using either ISM or ANP in isolation. Research limitations/implications – The generalizability of these findings is limited, as they are specific to rural telecom service adoption barriers in the Indian context. Because decision-making problems are usually complex and ill-structured, every decision depends on the expertise, preferences and biases of the experts who agreed to participate in the research. Practical implications – This paper forms the basis for identifying the reasons for poor adoption of telecom-based mobile services in rural India. This study would help telecom companies and managers understand and develop strategies to target the rural audience by introducing action plans and innovative mobile services that overcome the identified barriers.
By applying the proposed methodology, telecom companies can classify and prioritize their action plans as short-, medium- and long-term plans to systematically overcome the identified barriers. Originality/value – This paper provides a base for understanding various factors that affect the adoption of telecom-based mobile services. It demonstrates the use of an innovative approach to develop an integrated model to understand the barriers.
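The ISM half of such a methodology rests on two mechanical steps that can be sketched directly (a generic illustration, not the authors' data; the example adjacency matrix in the test is invented): closing the direct-influence matrix under transitivity to obtain the reachability matrix, then iteratively peeling off levels.

```python
import numpy as np

def reachability_matrix(adj):
    """Final ISM reachability matrix: direct influences plus the identity,
    closed under transitivity (Warshall's algorithm on booleans)."""
    R = np.asarray(adj, dtype=bool) | np.eye(len(adj), dtype=bool)
    for k in range(len(R)):
        R = R | (R[:, k:k + 1] & R[k:k + 1, :])
    return R

def ism_levels(R):
    """Partition elements into ISM levels. An element belongs to the
    current (top) level when its reachability set, restricted to the
    remaining elements, is contained in its antecedent set; levelled
    elements are removed and the process repeats."""
    remaining = set(range(len(R)))
    levels = []
    while remaining:
        level = [i for i in sorted(remaining)
                 if {j for j in remaining if R[i, j]}
                 <= {j for j in remaining if R[j, i]}]
        levels.append(level)
        remaining -= set(level)
    return levels
```

Level I elements (the most dependent barriers) sit at the top of the ISM digraph; the final levels contain the driving barriers that long-term action plans should target first.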
Autonomous vehicles are often tasked to explore unseen environments, aiming to acquire and understand large amounts of visual image data and other sensory information. In such scenarios, remote sensing data may be available a priori, and can help to build a semantic model of the environment and plan future autonomous missions. In this paper, we introduce two multimodal learning algorithms to model the relationship between visual images taken by an autonomous underwater vehicle during a survey and remotely sensed acoustic bathymetry (ocean depth) data that are available prior to the survey. We present a multi-layer architecture to capture the joint distribution between the bathymetry and visual modalities. We then propose an extension based on gated feature learning models, which allows the model to cluster the input data in an unsupervised fashion and predict visual image features using just the ocean depth information. Our experiments demonstrate that multimodal learning improves semantic classification accuracy regardless of which modalities are available at classification time, allows for unsupervised clustering of either or both modalities, and can facilitate mission planning by enabling class-based or image-based queries.
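A drastically simplified stand-in for the gated multimodal model (illustrative only: joint k-means replaces the learned feature model, and all names below are invented) already exhibits the two advertised capabilities: unsupervised clustering of the joint data, and prediction of visual features from depth alone by matching against only the bathymetry dimensions of the joint centroids.

```python
import numpy as np

def kmeans_joint(X, k, n_iter=50):
    """Plain k-means on concatenated [bathymetry | visual] feature rows.
    Farthest-point initialisation keeps the toy example deterministic."""
    C = [X[0].astype(float)]
    for _ in range(k - 1):
        d = np.min([((X - c) ** 2).sum(1) for c in C], axis=0)
        C.append(X[d.argmax()].astype(float))
    C = np.array(C)
    for _ in range(n_iter):
        d = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1)
        z = d.argmin(1)                      # cluster assignments
        for j in range(k):
            if (z == j).any():
                C[j] = X[z == j].mean(0)
    return C, z

def predict_visual(bathy, C, d_bathy):
    """Assign a cluster using only the first d_bathy (bathymetry) dims of
    the joint centroids, then read off the visual dims of that centroid."""
    d = ((bathy[None, :] - C[:, :d_bathy]) ** 2).sum(-1)
    return C[d.argmin(), d_bathy:]
```

The same asymmetric-query trick, condition on whichever modality is present, is what makes the joint model useful both during a survey (both modalities) and when planning from bathymetry alone.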