Increasingly, the task of detecting and recognising human actions has been delegated to neural networks that process camera or wearable sensor data. Because cameras are sensitive to lighting conditions and wearable sensors provide only sparse coverage, no single modality can capture the data required to perform the task confidently. Range sensors such as Light Detection and Ranging (LiDAR) can therefore complement these modalities to perceive the environment more robustly. Recently, researchers have been exploring ways to apply convolutional neural networks to three-dimensional data. These methods typically rely on a single modality and cannot draw on information from complementary sensor streams to improve accuracy. This paper proposes a framework that tackles human activity recognition by leveraging the benefits of sensor fusion and multimodal machine learning. Given both RGB and point cloud data, our method describes the activities performed by subjects using a region-based convolutional neural network (R-CNN) and a three-dimensional modified Fisher vector network. Evaluation on a custom-captured multimodal dataset demonstrates that the model achieves accurate human activity classification (90%). Furthermore, this framework can be applied to sports analytics, understanding social behaviour, surveillance, and, perhaps most notably, data-driven decision-making policies for autonomous vehicles (AVs) in urban areas and indoor environments.
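The abstract does not spell out how the RGB and point cloud branches are combined, so the following is only a minimal late-fusion sketch in PyTorch. The class name, feature dimensions, and class count are illustrative assumptions; the R-CNN image features and 3D modified Fisher vector descriptors are represented here as precomputed input vectors.

```python
# Hypothetical late-fusion sketch: an RGB branch and a point-cloud branch
# whose features are concatenated and classified. All sizes are illustrative.
import torch
import torch.nn as nn

class MultimodalActivityClassifier(nn.Module):
    def __init__(self, rgb_dim=512, pc_dim=640, num_classes=10):
        super().__init__()
        # Stand-ins for an R-CNN-style image feature (rgb_dim) and a
        # 3D modified Fisher vector point-cloud descriptor (pc_dim).
        self.rgb_head = nn.Sequential(nn.Linear(rgb_dim, 256), nn.ReLU())
        self.pc_head = nn.Sequential(nn.Linear(pc_dim, 256), nn.ReLU())
        self.classifier = nn.Linear(512, num_classes)

    def forward(self, rgb_feat, pc_feat):
        # Late fusion: concatenate per-modality embeddings, then classify.
        fused = torch.cat([self.rgb_head(rgb_feat), self.pc_head(pc_feat)], dim=1)
        return self.classifier(fused)

model = MultimodalActivityClassifier()
logits = model(torch.randn(4, 512), torch.randn(4, 640))  # batch of 4 samples
print(logits.shape)  # torch.Size([4, 10])
```

Late fusion is only one plausible design; the paper's actual network may fuse the modalities earlier or differently.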
Increasingly complex automated driving functions, specifically those associated with free space detection (FSD), are delegated to convolutional neural networks (CNNs). If the dataset used to train the network lacks diversity, modalities, or sufficient quantity, the driving policy that controls the vehicle may induce safety risks. Although most autonomous ground vehicles (AGVs) perform well in structured surroundings, the need for human intervention rises significantly when they are presented with unstructured niche environments. To this end, we developed an AGV for seamless indoor and outdoor navigation to collect realistic multimodal data streams. We demonstrate one application of the AGV in a self-evolving FSD framework that leverages online active machine learning (ML) paradigms and sensor data fusion. In essence, the self-evolving AGV queries image data against a reliable data stream, ultrasound, before fusing the sensor data to improve robustness. We compare the proposed framework against DeepLabV3+ [1], one of the most prominent free space segmentation methods: a state-of-the-art semantic segmentation model built on a CNN encoder-decoder architecture. The results show that the proposed framework outperforms DeepLabV3+ [1], a performance we attribute to its ability to self-learn free space. This combination of online and active ML removes the need for the large datasets typically required by a CNN, and it provides case-specific free space classifications based on information gathered from the scenario at hand.
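The abstract describes an online active-learning loop in which a trusted ultrasound stream supplies labels for an image-based free space classifier. A minimal sketch of that idea follows, using scikit-learn's incremental SGDClassifier; the feature extraction, the ultrasound_label oracle, and the uncertainty thresholds are all hypothetical stand-ins, not the paper's method.

```python
# Sketch of online active learning: an incremental image-patch classifier
# queries a trusted ultrasound "oracle" only when its prediction is uncertain.
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
clf = SGDClassifier(loss="log_loss")   # log loss enables predict_proba
classes = np.array([0, 1])             # 0 = obstacle, 1 = free space

def ultrasound_label(patch_feature):
    # Stand-in for the reliable range-sensor stream described in the abstract.
    return int(patch_feature.sum() > 0)

initialized = False
for step in range(1000):
    x = rng.normal(size=(1, 16))       # hypothetical image-patch feature vector
    if not initialized:
        clf.partial_fit(x, [ultrasound_label(x[0])], classes=classes)
        initialized = True
        continue
    p_free = clf.predict_proba(x)[0, 1]
    if 0.3 < p_free < 0.7:             # uncertain: query the oracle and update
        clf.partial_fit(x, [ultrasound_label(x[0])])
```

The uncertainty-sampling rule shown here is one common active-learning query strategy; the framework in the paper may select queries differently.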
The way we drive and the transport systems of today are undergoing radical change. Intelligent mobility aims to improve the efficiency of traditional transportation through advanced digital technologies such as robotics, artificial intelligence, and the Internet of Things. Central to the development of intelligent mobility is the emergence of connected autonomous vehicles (CAVs), which navigate environments autonomously. For this to be achieved, autonomous vehicles must be safe and trusted by passengers and other drivers. However, it is practically impossible to train autonomous vehicles on all the traffic conditions they may encounter. This paper presents an alternative solution that uses infrastructure to help CAVs learn driving policies, specifically for complex junctions, which require local experience and knowledge to handle. The proposal is to learn safe driving policies through data-driven imitation of human-driven vehicles at a junction, utilizing vehicle movement data captured by surveillance devices. The proposed framework is demonstrated on video datasets captured by uncrewed aerial vehicles (UAVs) at three intersections around Europe, which contain vehicle trajectories. An imitation learning algorithm based on a long short-term memory (LSTM) neural network is proposed to learn and predict safe vehicle trajectories. The framework can serve many purposes in intelligent mobility, such as augmenting the control algorithms of driverless vehicles, benchmarking driver behavior for insurance purposes, and providing insights for city planning.
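To make the LSTM-based trajectory learner concrete, here is a minimal PyTorch sketch that maps a history of (x, y) vehicle positions to the next position. The hidden size, history length, and single-step horizon are assumptions for illustration; the paper's exact architecture and training setup are not given in the abstract.

```python
# Illustrative LSTM trajectory predictor in the spirit of the described
# imitation learner: given past (x, y) positions, predict the next position.
import torch
import torch.nn as nn

class TrajectoryLSTM(nn.Module):
    def __init__(self, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(input_size=2, hidden_size=hidden, batch_first=True)
        self.out = nn.Linear(hidden, 2)

    def forward(self, history):              # history: (batch, T, 2)
        feats, _ = self.lstm(history)
        return self.out(feats[:, -1, :])     # predicted next (x, y) position

model = TrajectoryLSTM()
past = torch.randn(8, 20, 2)                # 8 tracks, 20 past timesteps each
next_xy = model(past)                       # (8, 2)
# Imitation-style supervision: regress toward the human-driven next position.
target = torch.randn(8, 2)                  # placeholder ground-truth positions
loss = nn.functional.mse_loss(next_xy, target)
```

In an imitation-learning setup, the targets would come from the UAV-observed human trajectories rather than the random placeholders used here.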