Recognition of defects in concrete infrastructure, especially in bridges, is a crucial first step in assessing structural integrity, yet it remains costly and time consuming. Large variation in the appearance of the concrete material, changing illumination and weather conditions, a variety of possible surface markings, and the possibility of different defect types overlapping make it a challenging real-world task. In this work we introduce the novel COncrete DEfect BRidge IMage dataset (CODEBRIM) for multi-target classification of five commonly appearing concrete defects. We investigate and compare two reinforcement-learning-based meta-learning approaches, MetaQNN and efficient neural architecture search (ENAS), to find suitable convolutional neural network architectures for this challenging multi-class multi-target task. We show that the learned architectures have fewer overall parameters and yield better multi-target accuracy than popular neural architectures from the literature evaluated in the context of our application.
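The MetaQNN idea mentioned above can be illustrated in miniature: a Q-learning agent sequentially chooses layers and is rewarded by the resulting network's validation accuracy. The sketch below is a heavily simplified, self-contained toy (all names, the layer vocabulary, and the `toy_reward` stand-in for actual CNN training are illustrative assumptions, not the paper's implementation).

```python
import random

# Toy MetaQNN-style search: an epsilon-greedy Q-learning agent picks one
# layer per depth step; toy_reward stands in for validation accuracy that
# would normally come from training the sampled CNN.
LAYER_CHOICES = ["conv3x3-32", "conv3x3-64", "conv5x5-64", "pool", "terminate"]

def toy_reward(arch):
    # Stand-in for validation accuracy: mildly favors depth and pooling.
    return min(1.0, 0.1 * len(arch) + (0.2 if "pool" in arch else 0.0))

def metaqnn_search(episodes=200, eps=0.3, lr=0.1, max_depth=4, seed=0):
    rng = random.Random(seed)
    q = {}                       # Q[(depth, layer)] -> estimated reward
    best_arch, best_r = [], -1.0
    for _ in range(episodes):
        arch = []
        for depth in range(max_depth):
            if rng.random() < eps:       # explore
                layer = rng.choice(LAYER_CHOICES)
            else:                        # exploit: best known layer at depth
                layer = max(LAYER_CHOICES, key=lambda c: q.get((depth, c), 0.0))
            if layer == "terminate":
                break
            arch.append(layer)
        r = toy_reward(arch)
        for depth, layer in enumerate(arch):
            old = q.get((depth, layer), 0.0)
            q[(depth, layer)] = old + lr * (r - old)  # tabular Q update
        if r > best_r:
            best_arch, best_r = arch, r
    return best_arch, best_r
```

In the real method, sampling an architecture triggers a full (or early-stopped) training run, and the Q-table guides the search toward architectures that trade accuracy against parameter count.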
We present an analysis of out-of-distribution detection based on predictive uncertainty, comparing different approaches to estimating a model's epistemic uncertainty, and contrast it with open set recognition based on extreme value theory. While predictive uncertainty alone does not seem sufficient to overcome this challenge, we demonstrate that it goes hand in hand with the latter method. This is particularly reflected in a generative model approach, where we show that posterior-based open set recognition outperforms discriminative models and predictive-uncertainty-based outlier rejection, raising the question of whether classifiers need to be generative in order to know what they have not seen.
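The predictive-uncertainty baseline contrasted above can be sketched concretely: average the softmax outputs of several stochastic forward passes or ensemble members, and reject an input as unknown when the entropy of that mean prediction is high. This is a minimal illustrative sketch of the general recipe (function names and the threshold are assumptions), not the paper's exact procedure.

```python
import math

def predictive_entropy(member_probs):
    """Entropy of the mean softmax over ensemble members (or MC-dropout
    passes) for a single input; higher entropy = more uncertain."""
    n = len(member_probs)
    k = len(member_probs[0])
    mean = [sum(p[i] for p in member_probs) / n for i in range(k)]
    return -sum(p * math.log(p) for p in mean if p > 0.0)

def reject_as_unknown(member_probs, threshold):
    # Flag the input as out-of-distribution if uncertainty is too high.
    return predictive_entropy(member_probs) > threshold
```

When members agree on one class the mean stays peaked and entropy is low; when they disagree the mean flattens toward uniform and entropy approaches log(k), which is exactly the disagreement signal that epistemic-uncertainty-based rejection exploits.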
We introduce the active audio-visual source separation problem, where an agent must move intelligently in order to better isolate the sounds coming from an object of interest in its environment. The agent hears multiple audio sources simultaneously (e.g., a person speaking down the hall in a noisy household) and must use its eyes and ears to automatically separate out the sounds originating from the target object within a limited time budget. Towards this goal, we introduce a reinforcement learning approach that trains movement policies controlling the agent's camera and microphone placement over time, guided by the improvement in predicted audio separation quality. We demonstrate our approach in scenarios motivated by both augmented reality (the system is already co-located with the target object) and mobile robotics (the agent begins arbitrarily far from the target object). Using state-of-the-art realistic audio-visual simulations in 3D environments, we demonstrate our model's ability to find minimal movement sequences with maximal payoff for audio source separation. Project: http://vision.cs.utexas.edu/projects/move2hear.
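The core signal driving the policy above is the per-step improvement in predicted separation quality, traded off against the cost of moving. A deliberately simplified greedy sketch of that idea follows; the real method learns a policy with reinforcement learning rather than greedily querying a quality model, and every name here is an illustrative assumption.

```python
def greedy_move2hear(quality_fn, moves, state, budget, move_cost=0.01):
    """At each step, take the move with the largest predicted gain in
    separation quality; stop when no move is worth its cost or the
    time budget runs out. Returns the final state and the moves taken."""
    trajectory = []
    for _ in range(budget):
        current = quality_fn(state)
        best_move, best_gain = None, 0.0
        for m in moves:
            # Net payoff of a candidate move: quality gain minus move cost.
            gain = quality_fn(m(state)) - current - move_cost
            if gain > best_gain:
                best_move, best_gain = m, gain
        if best_move is None:
            break  # no move improves separation enough to justify moving
        state = best_move(state)
        trajectory.append(best_move)
    return state, trajectory
```

The small `move_cost` penalty is what pushes the agent toward "minimal movement sequences with maximal payoff", mirroring the limited time budget in the problem statement.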
Summary: The paper presents modeling and simulation of an ion-sensitive field-effect transistor (ISFET)-based pH sensor with a temperature-dependent behavioral macromodel and proposes to compensate the temperature drift in the sensor using machine learning (ML) models. The macromodel is built in SPICE by introducing electrochemical parameters into a metal-oxide-semiconductor field-effect transistor (MOSFET) model to simulate ISFET characteristics. We account for the temperature dependence of electrochemical and semiconductor parameters in our macromodel to increase its robustness. The macromodel is then exported as a subcircuit element, which is used to design the readout interface circuit. A simple constant-voltage, constant-current (CVCC) topology is utilized to generate data for the temperature drift in the ISFET pH sensor, which are used to train and test state-of-the-art ML-based regression models to compensate the drift behavior. The experimental results demonstrate that the random forest (RF) technique achieves the best performance, with very high correlation and a low error rate. The corresponding output-signal curves of the trained models show highly temperature-independent characteristics when tested for pH 2, 4, 7, 10, and 12, and we obtain a root mean squared error (RMSE) variation of ΔpH ≤ 0.024 over a temperature range of 15°C to 55°C, compared with ΔpH ≤ 1.346 for the uncompensated output signal. This work establishes a framework for integrating ML techniques for drift compensation of the ISFET chemical sensor to improve its performance.
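The compensation task described above, mapping a temperature-drifting raw reading back to a temperature-independent pH, can be illustrated with a deliberately simple stand-in: a linear drift coefficient fitted by least squares from calibration readings at a known pH, with all readings referenced to 25 °C. The paper's actual compensator is a random forest regressor; the linear form, function names, and reference temperature below are illustrative assumptions.

```python
def fit_drift_coeff(readings):
    """readings: list of (temperature_C, raw_pH) pairs taken at a known,
    fixed true pH. Returns the least-squares slope d(reading)/d(T)."""
    n = len(readings)
    mean_t = sum(t for t, _ in readings) / n
    mean_p = sum(p for _, p in readings) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in readings)
    var = sum((t - mean_t) ** 2 for t, _ in readings)
    return cov / var

def compensate(raw_ph, temp_c, coeff, ref_temp=25.0):
    # Subtract the estimated linear drift, referencing the reading to 25 C.
    return raw_ph - coeff * (temp_c - ref_temp)
```

A nonlinear regressor such as a random forest generalizes this idea by learning the full mapping from (raw output, temperature) to compensated pH instead of assuming a single linear slope.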
We propose a self-supervised method for learning representations based on spatial audio-visual correspondences in egocentric videos. In particular, our method leverages a masked auto-encoding framework to synthesize masked binaural audio through the synergy of audio and vision, thereby learning useful spatial relationships between the two modalities. We use our pretrained features to tackle two downstream video tasks requiring spatial understanding in social scenarios: active speaker detection and spatial audio denoising. We show through extensive experiments that our features are generic enough to improve over multiple state-of-the-art baselines on two challenging public egocentric video datasets, EgoCom and EasyCom. Project: http://vision.cs.utexas.edu/projects/ego_av_corr.
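The masked auto-encoding pretext task above hinges on one simple operation: hide a large random fraction of the binaural-audio patches and ask the model to reconstruct them from the remaining audio and video patches. A minimal sketch of that masking step is shown below (the function name, mask ratio, and patch representation are illustrative assumptions, not the paper's implementation).

```python
import random

def mask_patches(patches, mask_ratio=0.8, seed=0):
    """Randomly hide a fraction of binaural-audio patches; the encoder sees
    only the visible patches, and the decoder must reconstruct the masked
    ones (here we just return the two index sets)."""
    rng = random.Random(seed)
    n_mask = int(round(mask_ratio * len(patches)))
    masked_idx = set(rng.sample(range(len(patches)), n_mask))
    visible = [p for i, p in enumerate(patches) if i not in masked_idx]
    return visible, sorted(masked_idx)
```

A high mask ratio makes the reconstruction impossible from audio context alone, which is what forces the model to exploit the visual stream and thereby learn the cross-modal spatial relationships used by the downstream tasks.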