Currently successful methods for video description are based on encoder-decoder sentence generation using recurrent neural networks (RNNs). Recent work has shown the advantage of integrating temporal and/or spatial attention mechanisms into these models, in which the decoder network predicts each word in the description by selectively giving more weight to encoded features from specific time frames (temporal attention) or to features from specific spatial regions (spatial attention). In this paper, we propose to expand the attention model to selectively attend not just to specific times or spatial regions, but to specific modalities of input such as image features, motion features, and audio features. Our new modality-dependent attention mechanism, which we call multimodal attention, provides a natural way to fuse multimodal information for video description. We evaluate our method on the Youtube2Text dataset, achieving results that are competitive with current state of the art. More importantly, we demonstrate that our model incorporating multimodal attention as well as temporal attention significantly outperforms the model that uses temporal attention alone.
Automatic detection and classification of defects in infrastructure surface images can largely boost its maintenance efficiency. Given enough labeled images, various supervised learning methods have been investigated for this task, including decision trees and support vector machines in previous studies, and deep neural networks more recently. However, in real world applications, labels are harder to obtain than images, due to the limited labeling resources (i.e., experts). Thus we propose a deep active learning system to maximize the performance. A deep residual network is firstly designed for defect detection and classification in an image. Following our active learning strategy, this network is trained as soon as an initial batch of labeled images becomes available. It is then used to select a most informative subset of new images and query labels from experts to retrain the network. Experiments demonstrate more efficient performance improvements of our method than baselines, achieving 87.5% detection accuracy. International Workshop on Computing in Civil Engineering (IWCCE) This work may not be copied or reproduced in whole or in part for any commercial purpose. Permission to copy in whole or in part without payment of fee is granted for nonprofit educational and research purposes provided that all such whole or partial copies include the following: a notice that such copying is by permission of Mitsubishi Electric Research Laboratories, Inc.; an acknowledgment of the authors and individual contributions to the work; and all applicable portions of the copyright notice. Copying, reproduction, or republishing for any other purpose shall require a license with payment of fee to Mitsubishi Electric Research Laboratories, Inc. All rights reserved.
Active learning-a class of algorithms that iteratively searches for the most informative samples to include in a training dataset-has been shown to be effective at annotating data for image classification. However, the use of active learning for object detection is still largely unexplored as determining informativeness of an object-location hypothesis is more difficult. In this paper, we address this issue and present two metrics for measuring the informativeness of an object hypothesis, which allow us to leverage active learning to reduce the amount of annotated data needed to achieve a target object detection performance. Our first metric measures "localization tightness" of an object hypothesis, which is based on the overlapping ratio between the region proposal and the final prediction. Our second metric measures "localization stability" of an object hypothesis, which is based on the variation of predicted object locations when input images are corrupted by noise. Our experimental results show that by augmenting a conventional active-learning algorithm designed for classification with the proposed metrics, the amount of labeled training data required can be reduced up to 25%. Moreover, on PASCAL 2007 and 2012 datasets our localization-stability method has an average relative improvement of 96.5% and 81.9% over the baseline method using classification only.
Particle tracing for streamline and pathline generation is a common method of visualizing vector fields in scientific data, but it is difficult to parallelize efficiently because of demanding and widely varying computational and communication loads. In this paper we scale parallel particle tracing for visualizing steady and unsteady flow fields well beyond previously published results. We configure the 4D domain decomposition into spatial and temporal blocks that combine in-core and out-of-core execution in a flexible way that favors faster run time or smaller memory. We also compare static and dynamic partitioning approaches. Strong and weak scaling curves are presented for tests conducted on an IBM Blue Gene/P machine at up to 32 K processes using a parallel flow visualization library that we are developing. Datasets are derived from computational fluid dynamics simulations of thermal hydraulics, liquid mixing, and combustion.
Because of the ever increasing size of output data from scientific simulations, supercomputers are increasingly relied upon to generate visualizations. One use of supercomputers is to generate field lines from large scale flow fields. When generating field lines in parallel, the vector field is generally decomposed into blocks, which are then assigned to processors. Since various regions of the vector field can have different flow complexity, processors will require varying amounts of computation time to trace their particles, causing load imbalance, and thus limiting the performance speedup. To achieve load-balanced streamline generation, we propose a workload-aware partitioning algorithm to decompose the vector field into partitions with near equal workloads. Since actual workloads are unknown beforehand, we propose a workload estimation algorithm to predict the workload in the local vector field. A graph-based representation of the vector field is employed to generate these estimates. Once the workloads have been estimated, our partitioning algorithm is hierarchically applied to distribute the workload to all partitions. We examine the performance of our workload estimation and workload-aware partitioning algorithm in several timings studies, which demonstrates that by employing these methods, better scalability can be achieved with little overhead.
We present a set of building blocks that provide scalable data movement capability to computational scientists and visualization researchers for writing their own parallel analysis. The set includes scalable tools for domain decomposition, process assignment, parallel I/O, global reduction, and local neighborhood communication-tasks that are common across many analysis applications. The global reduction is performed with a new algorithm, described in this paper, that efficiently merges blocks of analysis results into a smaller number of larger blocks. The merging is configurable in the number of blocks that are reduced in each round, the number of rounds, and the total number of resulting blocks. We highlight the use of our library in two analysis applications: parallel streamline generation and parallel Morse-Smale topological analysis. The first case uses an existing local neighborhood communication algorithm, whereas the latter uses the new merge algorithm.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.