2005 IEEE Workshop on Machine Learning for Signal Processing
DOI: 10.1109/mlsp.2005.1532913
Automatic Song-Type Classification and Speaker Identification of Norwegian Ortolan Bunting (Emberiza Hortulana) Vocalizations

Abstract: This paper presents an approach to song-type classification and speaker identification of Norwegian Ortolan Bunting (Emberiza Hortulana) vocalizations using traditional human speech processing methods. Hidden Markov Models (HMMs) are used for both tasks, with features including Mel-Frequency Cepstral Coefficients (MFCCs), log energy, and delta (velocity) and delta-delta (acceleration) coefficients. Vocalizations were tested using leave-one-out cross-validation. Classification accuracy for 5 song-types is 92.4%…
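The feature pipeline named in the abstract (MFCCs plus log energy, later extended with delta and delta-delta coefficients) can be sketched in outline. The snippet below is a minimal NumPy illustration, not the authors' implementation: the sample rate, frame length, filter count, and cepstral order are assumed demonstration values, and the delta formula is a simple two-frame regression.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Slice a 1-D signal into overlapping frames."""
    n = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n)])

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular mel-spaced filterbank (standard HTK-style mel formula)."""
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    imel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    pts = imel(np.linspace(mel(0.0), mel(sr / 2.0), n_filters + 2))
    bins = np.floor((n_fft + 1) * pts / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for k in range(l, c):
            fb[i - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):
            fb[i - 1, k] = (r - k) / max(r - c, 1)
    return fb

def mfcc_features(x, sr=44100, frame_ms=5, n_filters=26, n_ceps=12):
    """MFCC-style cepstra plus log energy per frame (assumed parameters)."""
    frame_len = int(sr * frame_ms / 1000)
    hop = frame_len // 2                      # 50% overlap
    frames = frame_signal(x, frame_len, hop) * np.hamming(frame_len)
    spec = np.abs(np.fft.rfft(frames, frame_len)) ** 2
    fbank = np.log(spec @ mel_filterbank(n_filters, frame_len, sr).T + 1e-10)
    # DCT-II to decorrelate the log filterbank -> cepstral coefficients
    k = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps) + 1, k + 0.5) / n_filters)
    ceps = fbank @ dct.T
    log_e = np.log(np.sum(spec, axis=1) + 1e-10)[:, None]
    return np.hstack([ceps, log_e])

def deltas(feat):
    """First-order regression (velocity) coefficients over adjacent frames."""
    padded = np.pad(feat, ((1, 1), (0, 0)), mode="edge")
    return (padded[2:] - padded[:-2]) / 2.0
```

Stacking the static features with `deltas(f)` and `deltas(deltas(f))` yields the velocity/acceleration-augmented observation vectors the abstract describes; in the leave-one-out protocol, each vocalization is scored by models trained on all the others.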

Cited by 29 publications (26 citation statements).
References 7 publications (9 reference statements).
“…Indeed the HMM system based on the feature set in Table III presented an average identification rate of 95.5%. This method reached values at least similar to or even higher than the identification rates observed in automatic recognition approaches with mammals' (Campbell et al., 2000; Clemins, 2005; Reby et al., 1997, 2006) and birds' (Terry and McGregor, 2002; Trawicki, 2005) vocalizations. The comparison, however, is not straightforward.…”
Section: Discussion
confidence: 51%
“…Indeed the identification score of these sound types was too low (<10%) to consider automatic monitoring. Some heterogeneity in the recognition rates of different sound types of a species is commonly reported (Chesmore and Ohya, 2004; Jahns, 2008; Kogan and Margoliash, 1998; Parsons and Jones, 2000; Schön et al., 2001; Trawicki, 2005). Several reasons may be responsible for the low identification rate of sounds (Young et al., 2006).…”
Section: Discussion
confidence: 99%
“…Based on results of previous work with song-type classification on single-channel Ortolan Bunting vocalizations [14], the analysis conditions for the distributed microphone corpus were frames of 5 ms with 50% overlap, with 12 Generalized Cepstral Coefficient (GFCC) [15] features computed from the 26-channel filterbanks [16] and appended with the delta and delta-delta coefficients. The left-to-right song-type Hidden Markov Models (HMMs) [17] consisted of 18 states with a single diagonal-covariance Gaussian Mixture Model (GMM) underlying each state, with approximately an equal split of the four song-types across each of the 8-channel microphones under matched training and testing conditions.…”
Section: Results
confidence: 99%
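The left-to-right, single-Gaussian-per-state topology described in the citation above can be illustrated with a stripped-down scorer: a banded transition matrix, a diagonal-Gaussian emission log-likelihood, and the forward algorithm in log space. This is a hand-rolled sketch under assumed values (state count, self-loop probability, forced start in state 0), not the cited system, and classification would compare such scores across per-song-type models.

```python
import numpy as np

def left_to_right_transitions(n_states, p_stay=0.6):
    """Left-to-right topology: each state may only self-loop or advance one."""
    A = np.zeros((n_states, n_states))
    for i in range(n_states - 1):
        A[i, i] = p_stay
        A[i, i + 1] = 1.0 - p_stay
    A[-1, -1] = 1.0
    return A

def logsumexp_cols(M):
    """Numerically stable log-sum-exp down each column of M."""
    m = np.max(M, axis=0)
    return m + np.log(np.sum(np.exp(M - m), axis=0))

def diag_gauss_loglik(X, means, variances):
    """Per-frame log-likelihood under each state's diagonal Gaussian.

    X: (T, D); means, variances: (S, D) -> returns (T, S).
    """
    diff = X[:, None, :] - means[None, :, :]
    return -0.5 * np.sum(diff ** 2 / variances
                         + np.log(2.0 * np.pi * variances), axis=2)

def forward_logprob(X, A, means, variances):
    """log P(X | model) via the forward algorithm, starting in state 0."""
    logB = diag_gauss_loglik(X, means, variances)
    logA = np.log(A + 1e-300)                 # avoid log(0)
    alpha = np.full(A.shape[0], -np.inf)
    alpha[0] = logB[0, 0]
    for t in range(1, len(X)):
        alpha = logB[t] + logsumexp_cols(alpha[:, None] + logA)
    return logsumexp_cols(alpha[:, None]).item()
```

A classifier in this sketch would evaluate `forward_logprob` for each song-type's `(A, means, variances)` and take the arg-max; the single diagonal Gaussian per state corresponds to the one-mixture GMM the citation mentions.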
“…Song-type classification and speaker identification experiments [14] were performed on the Ortolan Bunting dataset, using MFCCs, GFCCs, and GPLP-derived cepstral coefficients.…”
Section: Ortolan Bunting
confidence: 99%