A new theory is presented for calculating the entropy of a liquid of flexible molecules from a molecular dynamics simulation. Entropy is expressed as the sum of two terms: a vibrational term, representing the average number of configurations and momentum states in an energy well, and a topographical term, representing the effective number of energy wells. The vibrational term is derived hierarchically from two force-torque covariance matrices, one at the molecular level and one at the united-atom level. The topographical term comprises conformations and orientations, derived from the dihedral distributions and coordination numbers, respectively. The method is tested on fourteen liquids, ranging from argon to cyclohexane. For most molecules our results lie within the experimental range and are slightly higher than those from the 2PT method, the only other method currently capable of directly calculating entropy for such systems. As well as providing an efficient and practical way to calculate entropy, the theory gives a comprehensive characterization and quantification of molecular structure.
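As a rough illustration of the decomposition described above (the notation below is ours, not the paper's, and the exact working formulas differ), the reported entropy splits into a vibrational and a topographical contribution, with the topographical part expressible as a Shannon-type sum over the discrete conformational and orientational states observed in the simulation:

    % Hedged sketch; S_vib, S_topo and p_i are illustrative symbols only.
    S_\mathrm{total} \approx S_\mathrm{vib} + S_\mathrm{topo}, \qquad
    S_\mathrm{topo} = -k_\mathrm{B} \sum_i p_i \ln p_i

Here p_i would be the observed probability of each conformational (dihedral) or orientational (coordination) state, and S_vib would follow from the effective harmonic frequencies obtained by diagonalizing the molecular-level and united-atom-level force-torque covariance matrices.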
In this paper, we introduce a novel attentional similarity module for the problem of few-shot sound recognition. Given a few examples of an unseen sound event, a classifier must be quickly adapted to recognize the new sound event without much fine-tuning. The proposed attentional similarity module can be plugged into any metric-based learning method for few-shot learning, allowing the resulting model to better match short, transient sound events. Extensive experiments on two datasets show that the proposed module consistently improves the performance of five different metric-based learning methods for few-shot sound recognition. The relative improvement in 5-shot, 5-way accuracy ranges from +4.1% to +7.7% on the ESC-50 dataset, and from +2.1% to +6.5% on noiseESC-50. Qualitative results demonstrate that our method contributes in particular to the recognition of transient sound events.
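As a rough sketch of how such a module could be plugged into a metric-based few-shot learner (the class name, tensor shapes, and pooling choices below are our assumptions, not the authors' implementation):

    # Hypothetical sketch of an attentional similarity module (not the authors' code).
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class AttentionalSimilarity(nn.Module):
        """Compare two clips frame by frame and let a small attention branch
        decide which (possibly short) query segments dominate the score."""
        def __init__(self, feat_dim: int):
            super().__init__()
            # 1-D conv that scores how informative each query frame is.
            self.attn = nn.Conv1d(feat_dim, 1, kernel_size=1)

        def forward(self, support: torch.Tensor, query: torch.Tensor) -> torch.Tensor:
            # support, query: (batch, time, feat_dim) frame-level embeddings.
            sim = torch.bmm(F.normalize(query, dim=-1),
                            F.normalize(support, dim=-1).transpose(1, 2))  # (B, Tq, Ts)
            per_frame = sim.max(dim=-1).values                # best match per query frame
            weights = torch.softmax(
                self.attn(query.transpose(1, 2)).squeeze(1), dim=-1)       # (B, Tq)
            return (weights * per_frame).sum(dim=-1)          # one similarity per pair

The resulting scalar could then stand in for the plain cosine or Euclidean distance inside, for example, a prototypical or matching network.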
Making sense of the surrounding context and ongoing events through not only the visual inputs but also acoustic cues is critical for various AI applications. This paper presents an attempt to learn a neural network model that recognizes more than 500 different sound events from the audio part of user generated videos (UGV). Aside from the large number of categories and the diverse recording conditions found in UGV, the task is challenging because a sound event may occur only for a short period of time in a video clip. Our model specifically tackles this issue by combining a main subnet that aggregates information from the entire clip to make clip-level predictions, and a supplementary subnet that examines each short segment of the clip for segment-level predictions. As the labeled data available for model training are typically on the clip level, the latter subnet learns to attend to segments selectively to facilitate attentional segment-level supervision. We call our model the M&mnet, for it leverages both “M”acro (clip-level) supervision and “m”icro (segment-level) supervision derived from the macro one. Our experiments show that M&mnet works remarkably well for recognizing sound events, establishing a new state-of-the-art for the DCASE17 and AudioSet datasets. Qualitative analysis suggests that our model exhibits strong gains for short events. In addition, we show that the micro subnet is computationally light, and that multiple micro subnets can be used to better exploit information at different temporal scales.
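A minimal sketch of the macro/micro idea, assuming segment-level features have already been extracted (all names, shapes, and the way the two paths are combined are illustrative assumptions, not the authors' code):

    # Hypothetical sketch of a macro (clip-level) plus micro (segment-level) model.
    import torch
    import torch.nn as nn

    class MacroMicroNet(nn.Module):
        def __init__(self, feat_dim: int, n_classes: int):
            super().__init__()
            # Macro subnet: pools the whole clip before classifying.
            self.macro = nn.Linear(feat_dim, n_classes)
            # Micro subnet: classifies every short segment, with an attention
            # head that learns which segments matter for the clip label.
            self.micro = nn.Linear(feat_dim, n_classes)
            self.attn = nn.Linear(feat_dim, 1)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (batch, segments, feat_dim) segment-level features of one clip.
            clip_logits = self.macro(x.mean(dim=1))        # macro (clip-level) path
            seg_logits = self.micro(x)                     # micro (segment-level) path
            attn = torch.softmax(self.attn(x), dim=1)      # (B, S, 1) segment weights
            micro_logits = (attn * seg_logits).sum(dim=1)  # attention-pooled segments
            return clip_logits + micro_logits

Both paths would be supervised with clip-level labels only, which is what lets the learned attention weights act as the “micro” supervision derived from the “macro” one.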