This paper introduces an end-to-end feedforward convolutional neural network that reliably classifies the source and type of animal calls in a noisy environment using two streams of audio data, after being trained on a dataset of modest size with imperfect labels. The data consists of audio recordings from captive marmoset monkeys housed in pairs, with several other cages nearby. The network classifies both the call type and which animal made it in a single pass through a single network, using raw spectrogram images as input. The network vastly increases data analysis capacity for researchers interested in studying marmoset vocalizations, and allows data collection in the home cage with group-housed animals.
Deep neural networks, including reinforcement learning agents, have been shown to be vulnerable to small adversarial changes in the input, which makes deploying such networks in the real world problematic. In this paper, we propose RADIAL-RL, a method to train reinforcement learning agents with improved robustness against any l_p-bounded adversarial attack. By minimizing an upper bound of the loss functions under worst-case adversarial perturbation, derived from efficient robustness verification methods, we significantly improve the robustness of RL agents trained on Atari-2600 games and show that RADIAL-RL can beat state-of-the-art robust training algorithms when evaluated against PGD attacks. We also propose a new evaluation method, Greedy Worst-Case Reward (GWC), for measuring attack-agnostic robustness of RL agents. GWC can be evaluated efficiently and serves as a good estimate of the reward under the worst possible sequence of adversarial attacks; in particular, GWC accounts for the importance of each action and their temporal dependency, improving upon previous approaches that only evaluate whether each single action can change under input perturbations. Our code is available at https://github.com/tuomaso/radial_rl. Preprint. Under review.
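To make the idea of a worst-case evaluation more concrete, the following is a minimal sketch, not the released RADIAL-RL code. It assumes that some verification method has already produced lower and upper bounds on each action's Q-value under the allowed perturbation. The function names, the example bounds, and the rule of picking the lowest nominal Q-value among the actions that could become greedy are all illustrative assumptions.

```python
from typing import List, Sequence


def possible_greedy_actions(lower: Sequence[float], upper: Sequence[float]) -> List[int]:
    """Return actions that could become the argmax under some admissible perturbation.

    An action is ruled out only when another action's lower bound
    already exceeds its upper bound (a sound but conservative test).
    """
    actions = []
    for a in range(len(lower)):
        best_other_lower = max(
            (lower[b] for b in range(len(lower)) if b != a),
            default=float("-inf"),
        )
        if upper[a] >= best_other_lower:
            actions.append(a)
    return actions


def greedy_worst_action(
    nominal_q: Sequence[float], lower: Sequence[float], upper: Sequence[float]
) -> int:
    """Hypothetical greedy adversary: among the actions the attacker might be
    able to induce, pick the one with the lowest nominal Q-value."""
    candidates = possible_greedy_actions(lower, upper)
    return min(candidates, key=lambda a: nominal_q[a])


# Illustrative numbers only (not from the paper).
nominal_q = [1.0, 0.8, 0.2]
lower = [0.9, 0.6, 0.1]
upper = [1.1, 1.0, 0.3]
candidates = possible_greedy_actions(lower, upper)
worst = greedy_worst_action(nominal_q, lower, upper)
```

In this toy case action 2 can never become greedy because its upper bound lies below action 0's lower bound, so the assumed adversary can at most push the agent to action 1. Repeating such a choice at every step of an episode is one way to read the "greedy" aspect of a worst-case reward estimate.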
In this paper, we aim to combine state-of-the-art image recognition models with the best publicly available satellite images to create a system for landslide risk mitigation. We focus first on landslide detection and then propose a similar system for prediction. Such models are valuable because they could easily be scaled up to provide data for hazard evaluation as satellite imagery becomes increasingly available. The goal is to use satellite images and correlated data to enrich the public repository of data and to guide disaster relief efforts by locating the precise areas where landslides have occurred. Different image augmentation methods are used to increase diversity in the chosen dataset and make classification more robust. The resulting outputs are then fed into variants of 3-D convolutional neural networks (CNNs). A review of the current literature indicates that no prior research has used CNNs and freely available satellite imagery for classifying landslide risk. The model ultimately achieves accuracy significantly better than the baseline.
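The abstract does not say which augmentation methods were used. As a hedged illustration of how augmentation increases dataset diversity, the sketch below assumes simple rotations and horizontal flips applied to a small image tile stored as nested lists; the function names and the choice of transforms are assumptions, not the authors' pipeline.

```python
from typing import List

Image = List[List[int]]


def rotate90(img: Image) -> Image:
    """Rotate a tile 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def flip_horizontal(img: Image) -> Image:
    """Mirror a tile left to right."""
    return [row[::-1] for row in img]


def augment(img: Image) -> List[Image]:
    """Return the eight rotation/flip variants of a tile (original included)."""
    variants = []
    current = [list(row) for row in img]  # copy so the input is left untouched
    for _ in range(4):
        variants.append(current)
        variants.append(flip_horizontal(current))
        current = rotate90(current)
    return variants


tile = [[1, 2], [3, 4]]
variants = augment(tile)
```

Each labelled tile yields eight training samples here. Whether orientation-changing transforms are appropriate depends on the imagery, since terrain features such as slope direction may carry information that flips and rotations alter.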
The recent introduction of Graph Neural Networks (GNNs) and their growing popularity in the past few years have enabled the application of deep learning algorithms to non-Euclidean, graph-structured data. GNNs have achieved state-of-the-art results across an impressive array of graph-based machine learning problems. Nevertheless, despite their rapid pace of development, much of the work on GNNs has focused on graph classification and embedding techniques, largely ignoring regression tasks over graph data. In this paper, we develop a Graph Mixture Density Network (GraphMDN), which combines graph neural networks with mixture density network (MDN) outputs. By combining these techniques, GraphMDNs can naturally incorporate graph-structured information into a neural architecture and can model multi-modal regression targets. As such, GraphMDNs are designed to excel on regression tasks in which the data are graph-structured and target statistics are better represented by mixtures of densities than by single values (so-called "inverse problems"). To demonstrate this, we extend an existing GNN architecture known as Semantic GCN (SemGCN) to a GraphMDN structure, and show results from the Human3.6M pose estimation task. The extended model consistently outperforms both GCN and MDN architectures on their own, with a comparable number of parameters.
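To illustrate why a mixture density output suits multi-modal targets, here is a minimal sketch of the negative log-likelihood of a one-dimensional Gaussian mixture, the standard MDN training objective. It is not the GraphMDN implementation: the function names and the example numbers are assumptions, and the mixture parameters are supplied by hand rather than predicted by a network.

```python
import math
from typing import Sequence


def gaussian_pdf(y: float, mu: float, sigma: float) -> float:
    """Density of a univariate normal distribution at y."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def mixture_nll(
    y: float, weights: Sequence[float], means: Sequence[float], sigmas: Sequence[float]
) -> float:
    """Negative log-likelihood of y under a Gaussian mixture.

    Real implementations usually work in log space (log-sum-exp) for
    numerical stability; this direct form is kept for readability.
    """
    density = sum(
        w * gaussian_pdf(y, m, s) for w, m, s in zip(weights, means, sigmas)
    )
    return -math.log(density)


# Illustrative target that is equally likely to be near -1 or +1.
y = 1.0
two_modes = mixture_nll(y, [0.5, 0.5], [-1.0, 1.0], [0.1, 0.1])
one_mode = mixture_nll(y, [1.0], [0.0], [0.1])  # single Gaussian at the average
```

With these numbers the two-component mixture assigns the observation a far lower loss than a single Gaussian centred on the average of the two modes, which is the failure that single-value regression exhibits on inverse problems.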
Humans are remarkably efficient at decision-making, even in "open-ended" problems where the set of possible actions is too large for exhaustive evaluation. Our success relies, in part, on efficient processes of calling to mind and considering the right candidate actions for evaluation. When this process fails, however, the result is a kind of cognitive puzzle in which the value of a solution or action would be obvious as soon as it is considered, but it never gets considered in the first place. Recently, machine learning (ML) architectures have attained or even exceeded human performance on certain kinds of open-ended tasks such as the games of chess and Go. We ask whether the broad architectural principles that underlie ML success in these domains tend to generate consideration failures similar to those observed in humans. We demonstrate a case in which they do, illuminating how humans make open-ended decisions, how this relates to ML approaches to similar problems, and how both architectures lead to characteristic patterns of success and failure.
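A toy sketch can make the notion of a consideration failure concrete. It is not the authors' model: it assumes a two-stage decision process in which a prior nominates a small set of candidate actions and only those candidates are evaluated. The action names, priors, and values are invented for illustration.

```python
from typing import Dict, List


def consider(prior: Dict[str, float], k: int) -> List[str]:
    """Return the k actions with the highest prior (the consideration set)."""
    return sorted(prior, key=lambda a: prior[a], reverse=True)[:k]


def choose(prior: Dict[str, float], value: Dict[str, float], k: int) -> str:
    """Evaluate only the considered actions and pick the most valuable one."""
    candidates = consider(prior, k)
    return max(candidates, key=lambda a: value[a])


# Hypothetical numbers: action "d" is by far the best but has a low prior.
prior = {"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05}
value = {"a": 1.0, "b": 2.0, "c": 0.5, "d": 10.0}

narrow_choice = choose(prior, value, k=2)
exhaustive_choice = choose(prior, value, k=4)
```

With a small consideration set the process settles on "b", even though "d" would be chosen immediately if it were ever evaluated. Widening the set recovers "d", at the cost of more evaluations.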