Interspeech 2019
DOI: 10.21437/interspeech.2019-2174
Speaker Recognition Benchmark Using the CHiME-5 Corpus

Abstract: In this paper, we introduce a speaker recognition benchmark derived from the publicly-available CHiME-5 corpus. Our goal is to foster research that tackles the challenging artifacts introduced by far-field multi-speaker recordings of naturally occurring spoken interactions. The benchmark comprises four tasks that involve enrollment and test conditions with single-speaker and/or multi-speaker recordings. Additionally, it supports performance comparisons between close-talking vs distant/far-field microphone reco…

Cited by 7 publications (2 citation statements); references 13 publications.
“…MFCC (Davis & Mermelstein, 1980) represents the audio signal in fine detail in the low-frequency band and relatively coarsely in the high-frequency band. MFCC is used not only for dialect classification (Chowdhury et al., 2020; Khurana et al., 2017; Mukherjee et al., 2020; Tawaqal & Suyanto, 2021; Wan et al., 2022; Wang et al., 2021; Zhang & Hansen, 2018) but also in many other areas of speech analysis, such as speech recognition (Shahnawazuddin et al., 2016; Tüske et al., 2014; Wallington et al., 2021), speaker recognition (Fenu et al., 2020; Garcia-Romero et al., 2019; Lin & Mak, 2020; Pappagari et al., 2020), and emotion recognition (Keesing et al., 2021; Likitha et al., 2017; Sarma et al., 2018; Saste & Jagdale, 2017; Seo & Lee, 2022).…”
Section: Mel-Frequency Cepstral Coefficients (MFCC) (unclassified)
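The nonuniform frequency resolution described above comes from the mel scale used in MFCC extraction: filterbank bands are spaced uniformly in mels, which makes them narrow in Hz at low frequencies and wide at high frequencies. A minimal sketch of this spacing, using the standard O'Shaughnessy mel formula (the helper names and parameter values here are illustrative, not taken from any of the cited works):

```python
import math

def hz_to_mel(f_hz: float) -> float:
    # Standard mel-scale conversion used in most MFCC pipelines
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m: float) -> float:
    # Inverse of hz_to_mel
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_band_edges(f_min: float, f_max: float, n_bands: int) -> list:
    """Filterbank band edges equally spaced on the mel scale.

    Returns n_bands + 2 edge frequencies in Hz. Equal spacing in mels
    yields narrow bands (in Hz) at low frequencies and wide bands at
    high frequencies, which is the fine-low / coarse-high behavior
    the MFCC representation exhibits.
    """
    m_min, m_max = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (m_max - m_min) / (n_bands + 1)
    return [mel_to_hz(m_min + i * step) for i in range(n_bands + 2)]

# 26 bands over 0-8 kHz (a common configuration for 16 kHz audio)
edges = mel_band_edges(0.0, 8000.0, 26)
low_width = edges[1] - edges[0]     # lowest band edge spacing, in Hz
high_width = edges[-1] - edges[-2]  # highest band edge spacing, in Hz
print(f"lowest spacing: {low_width:.1f} Hz, highest: {high_width:.1f} Hz")
```

With these parameters the lowest band spans only a few tens of Hz while the highest spans several hundred, which is why low-frequency spectral detail is preserved and high-frequency detail is smoothed.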
“…In the past few decades, machine learning, and especially deep learning, has achieved remarkable breakthroughs in a wide range of speech tasks, e.g., speech recognition [1,2], speaker verification [3,4,5], language identification [6,7], and emotion classification [8,9]. Each speech task has its own specific techniques for achieving state-of-the-art results [3,6,8,10,11,12], which require the efforts of a large number of experts. It is therefore very difficult to switch between different speech tasks without substantial human effort.…”
Section: Introduction (mentioning, confidence: 99%)