2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp.2018.8461524

Content-Based Representations of Audio Using Siamese Neural Networks

Abstract: In this paper, we focus on the problem of content-based retrieval for audio, which aims to retrieve all semantically similar audio recordings for a given audio clip query. This problem is similar to the problem of query by example of audio, which aims to retrieve media samples from a database, which are similar to the user-provided example. We propose a novel approach which encodes the audio into a vector representation using Siamese Neural Networks. The goal is to obtain an encoding similar for files belongin…
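The retrieval step the abstract describes can be illustrated with a minimal sketch: once each clip has been encoded as a vector, a query is answered by ranking the database clips by their similarity to the query vector. Everything below is an assumption made for illustration. The abstract is truncated and does not state which similarity measure the authors use, so cosine similarity is a stand-in, and the names `cosine_similarity`, `retrieve`, and the clip identifiers are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def retrieve(query_emb, database, top_k=3):
    """Return the ids of the top_k clips most similar to the query.

    database maps a clip id to its embedding vector. In a real system the
    embeddings would come from the trained encoder; here they are toy values.
    """
    scored = [
        (cosine_similarity(query_emb, emb), clip_id)
        for clip_id, emb in database.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [clip_id for _, clip_id in scored[:top_k]]


# Toy embeddings (hypothetical): two "dog" clips and one "rain" clip.
database = {
    "dog_1": [1.0, 0.0],
    "dog_2": [0.9, 0.1],
    "rain_1": [0.0, 1.0],
}
results = retrieve([1.0, 0.05], database, top_k=2)
```

With these toy values the two "dog" clips rank ahead of the "rain" clip, which is the behavior a class-aware embedding is meant to produce.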

Cited by 31 publications (22 citation statements)
References 19 publications
“…In the inference step, by using the feature space, the input is classified to one of the target classes. Regarding deep metric learning in acoustic signal processing [17][18][19][20][21][22][23][24][25][26], we summarize an overview of tasks, loss functions, and sampling strategies, in Table 1. Manocha et al have worked on sound clip search task and used contrastive loss, where a feature space is learned based on a pair type that consists of the same class or different classes and a feature space distance [19].…”
Section: Related Work (mentioning)
confidence: 99%
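The contrastive loss mentioned in the statement above can be sketched as follows. This is a minimal illustration, assuming the common formulation in which same-class pairs are penalized by their squared distance and different-class pairs are penalized only when they are closer than a margin. The cited papers may use a different scaling or distance, and the names `euclidean`, `contrastive_loss`, and `margin` are chosen here for illustration.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def contrastive_loss(pairs, margin=1.0):
    """Mean contrastive loss over (emb_a, emb_b, same_class) triples.

    Same-class pairs contribute d^2, pulling their embeddings together.
    Different-class pairs contribute max(0, margin - d)^2, pushing them
    apart until they are at least `margin` away from each other.
    """
    total = 0.0
    count = 0
    for emb_a, emb_b, same_class in pairs:
        d = euclidean(emb_a, emb_b)
        if same_class:
            total += d ** 2
        else:
            total += max(0.0, margin - d) ** 2
        count += 1
    return total / count


# Two toy pairs at distance 5: one same-class, one different-class.
pairs = [
    ([0.0, 0.0], [3.0, 4.0], True),
    ([0.0, 0.0], [3.0, 4.0], False),
]
loss = contrastive_loss(pairs, margin=6.0)
```

Note that a different-class pair already farther apart than the margin contributes nothing, so training effort concentrates on pairs that are still confusable.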
“…The superiority of combining the deep learning approach with fingerprinting is demonstrated in [37], where a Siamese Neural Network (SNN) produced semantic representations of audio signals. SNNs have been applied to sound classification in [37], [38], and [39] and have the advantage over the canonical CNN in their ability to generalize.…”
Section: Introduction (mentioning)
confidence: 99%
“…Previous studies mainly focus on sound event detection (SED), investigating which sound events happen in an audio recording and when they occur [2]. In contrast, Sound event retrieval (SER) is retrieving audio recordings that are similar to a given input audio query [3,4]. This similarity can be based on acoustic and/or semantic (symbolic) characterization [5].…”
Section: Introduction (mentioning)
confidence: 99%
“…Previous audio retrieval research mainly focuses on either acoustic similarity or categorization [4][5][6][7][8]. We neither simply use SED techniques to classify sound and retrieve the label, nor simply adopt audio fingerprinting to measure similarity.…”
Section: Introduction (mentioning)
confidence: 99%