Neural-network (NN)-based methods show high performance in sound event localization and detection (SELD). Conventional NNbased methods use two branches for a sound event detection (SED) target and a direction-of-arrival (DOA) target. The two-branch representation with a single network has to decide how to balance the two objectives during optimization. Using two networks dedicated to each task increases system complexity and network size. To address these problems, we propose an activity-coupled Cartesian DOA (ACCDOA) representation, which assigns a sound event activity to the length of a corresponding Cartesian DOA vector. The ACCDOA representation enables us to solve a SELD task with a single target and has two advantages: avoiding the necessity of balancing the objectives and model size increase. In experimental evaluations with the DCASE 2020 Task 3 dataset, the ACCDOA representation outperformed the two-branch representation in SELD metrics with a smaller network size. The ACCDOA-based SELD system also performed better than state-of-the-art SELD systems in terms of localization and location-dependent detection.
: A method for measuring motorcycle trajectory using GPS is needed for simulating motorcycle dynamics. In GPS measurements of a motorcycle, both the declination of the motorcycle and obstacles near the course can cause problems. Therefore, we propose a new algorithm for GPS measurement of motorcycle trajectory. We interpolate the missing observation data within a few seconds using polynomial curves, and use a Kalman filter to smoothen position calculations. This results in obtaining trajectory with high accuracy and with sufficient continuity. The precision is equal to that of fixed point positioning, given a sufficient number of available satellites.
Few-shot learning systems for sound event recognition gain interests since they require only a few examples to adapt to new target classes without fine-tuning. However, such systems have only been applied to chunks of sounds for classification or verification. In this paper, we aim to achieve few-shot detection of rare sound events, from long query sequence that contain not only the target events but also the other events and background noise. Therefore, it is required to prevent false positive reactions to both the other events and background noise. We propose metric learning with background noise class for the few-shot detection. The contribution is to present the explicit inclusion of background noise as a independent class, a suitable loss function that emphasizes this additional class, and a corresponding sampling strategy that assists training. It provides a feature space where the event classes and the background noise class are sufficiently separated. Evaluations on few-shot detection tasks, using DCASE 2017 task2 and ESC-50, show that our proposed method outperforms metric learning without considering the background noise class. The few-shot detection performance is also comparable to that of the DCASE 2017 task2 baseline system, which requires huge amount of annotated audio data.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.