(3 citation statements)


“…Since not all annotators annotated every audio clip and they may not be equally reliable, truth discovery analysis [34] of this crowdsourced data is required to determine the estimated consensus. Instead of taking the average VA value for each audio clip from the crowdsourced annotated data, the EM algorithm proposed in [35] is used to consider the reliability of each annotator and compute the estimated consensus ground truth y ∈ ℝ². This estimated consensus algorithm also tackles the long-tail phenomenon commonly observed in crowdsourced data by using the upper bound of the confidence interval of the χ² distribution.…”

confidence: 99%
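The consensus step quoted above can be illustrated with a confidence-aware iterative truth discovery loop. This is a minimal sketch under stated assumptions, not necessarily the EM algorithm of [35]. It alternates two steps. First, it weights each annotator by the inverse of an upper confidence bound on their error variance, which is where the χ² term enters. Second, it recomputes each clip's (valence, arousal) value as a weighted mean. The function names, α = 0.05, the iteration count, and the Wilson-Hilferty approximation to the χ² quantile (used to avoid a SciPy dependency) are illustrative choices of this sketch.

```python
from statistics import NormalDist


def chi2_quantile(p: float, k: int) -> float:
    """Wilson-Hilferty approximation to the p-quantile of a chi-square
    distribution with k degrees of freedom (stdlib only, no SciPy)."""
    z = NormalDist().inv_cdf(p)
    c = 2.0 / (9.0 * k)
    q = k * (1.0 - c + z * c ** 0.5) ** 3
    return max(q, 1e-6)  # the approximation can dip below 0 for very small k


def confidence_aware_consensus(labels, alpha=0.05, n_iter=20):
    """Reliability-weighted consensus for 2-D valence-arousal labels.

    labels: {annotator_id: {clip_id: (valence, arousal)}}; annotators may
    skip clips. Returns (truth, weights). Hypothetical helper, not from [35].
    """
    clips = sorted({clip for ann in labels.values() for clip in ann})

    # Initialise each clip's consensus with the plain per-clip mean.
    truth = {}
    for clip in clips:
        vals = [ann[clip] for ann in labels.values() if clip in ann]
        truth[clip] = tuple(sum(v[d] for v in vals) / len(vals) for d in range(2))

    weights = {}
    for _ in range(n_iter):
        # Weight step: weight = 1 / (upper bound of the (1 - alpha) confidence
        # interval of the annotator's error variance) = chi2_{alpha/2}(dof) / SSE.
        # Annotators with few labels (the long tail) get a smaller chi-square
        # factor relative to dof, so they are trusted less.
        for ann_id, ann in labels.items():
            sse = sum((va[d] - truth[clip][d]) ** 2
                      for clip, va in ann.items() for d in range(2))
            dof = 2 * len(ann)  # one residual per labelled dimension
            weights[ann_id] = chi2_quantile(alpha / 2, dof) / max(sse, 1e-12)

        # Truth step: weighted mean of the available labels for each clip.
        for clip in clips:
            num, den = [0.0, 0.0], 0.0
            for ann_id, ann in labels.items():
                if clip in ann:
                    w = weights[ann_id]
                    num[0] += w * ann[clip][0]
                    num[1] += w * ann[clip][1]
                    den += w
            truth[clip] = (num[0] / den, num[1] / den)

    return truth, weights
```

Compared with plain averaging, a consistently deviating annotator accumulates a large squared error and therefore a small weight. As a result, the consensus for each clip drifts toward the agreeing annotators rather than the per-clip mean.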

“…For each audio clip, standard acoustic features are extracted using MIRToolbox [36] across four categories (dynamics, spectral, timbral and tonal), resulting in a 70-dimensional feature vector per frame of 50 ms duration with 50% overlap. For an effective prototypical feature representation, the variational Bayesian inference algorithm is used to compute the Bayesian Acoustic Gaussian Mixture Model (BAGMM) posterior probability feature vector x ∈ ℝ^{K_opt} for each audio clip, where K_opt = 117 [35]. The advantage of Bayesian inference is that the number of latent audio topics K can be determined from the data automatically, thus avoiding the problems of singularity and over/underfitting with ad-hoc values of K [9].…”

confidence: 99%
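The framing and posterior-feature steps in the passage above can be sketched as follows, under several assumptions. The 16 kHz sample rate is illustrative. The 70 MIRToolbox descriptors are not reproduced, so frames are simply passed in as feature vectors. Fitting the mixture by variational Bayesian inference is omitted, and the GMM weights, means and diagonal variances are assumed to be already fitted. Averaging frame-level component posteriors into one K-dimensional clip vector is a plausible pooling choice, not one confirmed by [35]. All function names are hypothetical.

```python
import math


def frame_signal(samples, sample_rate, frame_ms=50.0, overlap=0.5):
    """Split a mono signal into frames of frame_ms milliseconds with overlap."""
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    hop = max(1, int(round(frame_len * (1.0 - overlap))))
    if len(samples) < frame_len:
        return []
    n_frames = 1 + (len(samples) - frame_len) // hop
    return [samples[i * hop: i * hop + frame_len] for i in range(n_frames)]


def log_gaussian_diag(x, mean, var):
    """Log density of x under a Gaussian with diagonal covariance."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def posterior_feature_vector(frames, weights, means, variances):
    """Average per-frame component posteriors into one K-dim clip vector.

    frames: per-frame feature vectors (e.g. 70-D); weights/means/variances:
    parameters of an already-fitted K-component diagonal GMM (assumed given).
    """
    K = len(weights)
    acc = [0.0] * K
    for x in frames:
        logp = [math.log(weights[k]) + log_gaussian_diag(x, means[k], variances[k])
                for k in range(K)]
        mx = max(logp)  # log-sum-exp trick for numerical stability
        norm = mx + math.log(sum(math.exp(lp - mx) for lp in logp))
        for k in range(K):
            acc[k] += math.exp(logp[k] - norm)  # responsibility of component k
    return [a / len(frames) for a in acc]
```

At the assumed 16 kHz, a 50 ms frame is 800 samples and 50% overlap gives a hop of 400 samples. A 30 s clip therefore yields 1 + (480000 − 800) / 400 = 1199 frames, each of which would carry a 70-dimensional feature vector before pooling into the K_opt-dimensional posterior vector.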

“…As for speech, we found publications related to the prediction of speaker likability (Hantke et al, ) and the evaluation of dysarthric speech (Tu et al, ). In the case of music, we found applications related to chord estimation (Ni et al, ) and music mood prediction (Chapaneri & Jayaswal, ). Lastly, we found applications dealing with emotion recognition from utterances (Hantke et al, ) and acoustic classification of animal species (Zhang et al, ).…”

confidence: 99%