Phase-Aware Signal Processing for Automatic Speech Recognition

Fahringer, Johannes; Schrank, Tobias; Stahl, Johannes; Mowlaee, Pejman; Pernkopf, Franz

doi:10.21437/interspeech.2016-823

Cited by 10 publications

(12 citation statements)

References 34 publications


“…Speech enhancement methods have traditionally only dealt with filtering the spectral magnitudes; however, many approaches have recently been proposed for jointly enhancing the magnitude and phase spectra [1,8,9,10,11,12,13]. The prevalent method for estimating phase spectra from given magnitudes in speech synthesis is the one proposed by Griffin and Lim [14].…”

Section: Related Work (mentioning)

confidence: 99%

The Conversation: Deep Audio-Visual Speech Enhancement

Afouras, Chung, Zisserman (2018). Interspeech 2018.


Our goal is to isolate individual speakers from multi-talker simultaneous speech in videos. Existing works in this area have focussed on trying to separate utterances from known speakers in controlled environments. In this paper, we propose a deep audio-visual speech enhancement network that is able to separate a speaker's voice given lip regions in the corresponding video, by predicting both the magnitude and the phase of the target signal. The method is applicable to speakers unheard and unseen during training, and for unconstrained environments. We demonstrate strong quantitative and qualitative results, isolating extremely challenging real-world examples.


“…In addition, the strategies discussed in [8], [11], and [12] can also be used to handle roots close to the unit circle. Notable among them are those that use filter banks in MF.…”

Section: B. Considerations on the GDF (unclassified)

“…In [10], complex cepstra are used as features for text-to-speech applications. In [5] and [8], features built from the derivatives of the phase spectrum in the time and frequency domains, represented by the instantaneous frequency and the group delay (GD), respectively, are used in speech enhancement applications and in ASR systems.…”

Section: Introduction (unclassified)
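
The group delay named in this citation statement is the negative derivative of the phase spectrum with respect to frequency. The following is a minimal sketch of one common way to compute it without phase unwrapping. It is not taken from the cited or citing papers. The function names `dft` and `group_delay`, the naive DFT, and the `eps` regularizer are all assumptions made for illustration. It relies on the identity tau(k) = Re(X(k) conj(Y(k))) / |X(k)|^2, where X is the DFT of x[n] and Y is the DFT of n·x[n].

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform (illustrative, not optimized)."""
    n_pts = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_pts) for n in range(n_pts))
        for k in range(n_pts)
    ]


def group_delay(frame, eps=1e-12):
    """Group delay in samples for each DFT bin of a real-valued frame.

    X is the DFT of x[n], Y is the DFT of n * x[n].
    eps is a hypothetical guard against division by a near-zero magnitude.
    """
    X = dft(frame)
    Y = dft([n * v for n, v in enumerate(frame)])
    return [
        (xk.real * yk.real + xk.imag * yk.imag) / (abs(xk) ** 2 + eps)
        for xk, yk in zip(X, Y)
    ]


# A unit impulse delayed by 3 samples has the same group delay in every bin.
impulse = [0.0] * 16
impulse[3] = 1.0
delays = group_delay(impulse)
```

The denominator |X(k)|^2 becomes very small when zeros of the signal's z-transform lie close to the unit circle, which makes the raw group delay spiky. Here `eps` only prevents division by zero and does not smooth those spikes.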

Considerações Sobre o Uso do Sinal de Fase em Sistemas de Reconhecimento Automático de Fala

Silva, Seara (2019). Anais De XXXVII Simpósio Brasileiro De Telecomunicações E Processamento De Sinais.

Abstract: This work presents an investigation into the use of the Fourier transform phase spectrum (FTPS) in automatic speech recognition (ASR) systems. Historically, the phase signal has usually been neglected in ASR systems. However, recent research has shown the importance of the FTPS in several speech processing applications. Specifically, with the aim of improving ASR systems, the group delay function is considered in the feature extraction stage (front-end) as well as in the construction of the acoustic model. Additionally, the performance of ASR systems using front-ends based on the group delay function is evaluated for acoustic environments with a low signal-to-noise ratio. The simulation results obtained here allow inferences about the impact of speech-signal phase information (average improvement of 3.32%) on the performance of ASR systems. Keywords: group delay, feature extraction, phase information, automatic speech recognition.

“…Automatic speech recognition (ASR) aims to map an audio signal containing speech into a text transcription containing a sequence of words. Basically, the goal is to match the transcription as closely as possible to the audio message, with no particular understanding of the meaning or scope of what was spoken [2].…”

Section: Introduction (mentioning)

confidence: 99%

On the Use of Multi-lingual Approach for a Cloud-based Transcription System for the ‘Ilonggoish’ Dialect

Alibagon, Elijorde, Castro, et al. (2018). IJGDC.

The study is aimed at the development of a transcription system for the 'Ilonggoish' dialect, which is a widely spoken local language in the Philippines. The software records speech in .wav file format, transcribes the speech into text, and generates a text file containing the transcribed text. The system has built-in speech recognition capable of recognizing pre-recorded speech spoken in different languages, such as English, Filipino, Hiligaynon, and the Ilonggoish dialect. Integrated into the system are a recording tool for the input speech data, data storage in .wav format, and text storage in .txt format. This study presents an approach that extracts features from speech signals of isolated spoken words using the Mel Frequency Cepstral Coefficients (MFCC) algorithm, and uses the Hidden Markov Model (HMM) method to present the recognized spoken words in text format. The system uses the Google Cloud's database of words as the baseline for standard words. It was evaluated by linguists specializing in the Filipino, English, and Hiligaynon languages, and by IT experts from fields such as academia and industry.
