Michael Gref scite author profile

Michael Gref

4 Publications

12 Citation Statements Received

79 Citation Statements Given

Affiliations

Fraunhofer Institute for Intelligent Analysis and Information Systems

Publications

Two-Staged Acoustic Modeling Adaption for Robust Speech Recognition by the Example of German Oral History Interviews

Gref, Schmidt, Behnke et al. 2019

In automatic speech recognition, often little training data is available for specific challenging tasks, yet training state-of-the-art automatic speech recognition systems requires large amounts of annotated speech. To address this issue, we propose a two-staged approach to acoustic modeling that combines noise and reverberation data augmentation with transfer learning to robustly handle challenges such as difficult acoustic recording conditions, spontaneous speech, and speech of elderly people. We evaluate our approach using the example of German oral history interviews, achieving an average relative reduction of the word error rate of 19.3%.
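A relative reduction like the 19.3% figure above is conventionally measured against the baseline error rate, not in absolute percentage points. The abstract does not define it explicitly. Assuming that convention, with $\mathrm{WER}_{\text{base}}$ for the baseline system and $\mathrm{WER}_{\text{adapt}}$ for the adapted system:

```latex
\Delta_{\text{rel}} = \frac{\mathrm{WER}_{\text{base}} - \mathrm{WER}_{\text{adapt}}}{\mathrm{WER}_{\text{base}}} \times 100\,\%
```

Take a hypothetical baseline of 30.0% WER. A 19.3% relative reduction would bring it to about 30.0 × (1 − 0.193) ≈ 24.2% WER, which is an absolute improvement of roughly 5.8 percentage points.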

Speech Analytics in Research Based on Qualitative Interviews

et al. 2018

The paper presents the aims and results of the project KA³ (Kölner Zentrum Analyse und Archivierung von audio-visual-Daten). In this project, advanced speech technologies are developed and provided to enhance the indexing and analysis of speech recordings from the oral history domain and the language sciences. Close cooperation between speech technology scientists and digital humanities researchers is an important aspect of the project, ensuring that the development of the technologies answers the needs of research based on qualitative audio-visual interviews. For practical research reasons, the project focuses on the audio aspect, although visual aspects are of course equally important for the analysis of audio-visual data. The Cologne Centre for Analysis and Archiving of Audio-Visual Data will provide the technologies as a central service.

On the Influence of Modifying Magnitude and Phase Spectrum to Enhance Noisy Speech Signals

Hirsch, Gref 2017

Neural networks have proven useful as a component of speech enhancement systems. This builds on the known ability of neural networks to map regions of a feature space onto other regions, which can be used to map noisy magnitude spectra to clean spectra. In this way, the network can replace adaptive filtering in the spectral domain. We set up such a system and compared its performance against a known adaptive filtering approach in terms of speech quality and recognition rate. It is still not fully answered how far speech quality can be enhanced by modifying the phase spectrum in addition to the magnitude, and how such a phase modification could be realized. Before trying to use a neural network for a possible modification of the phase spectrum, we ran a set of oracle experiments to find out how far the quality can be improved by modifying the magnitude and/or the phase spectrum in voiced segments. It turns out that modifying the magnitude and phase spectrum simultaneously has the potential for a considerably larger improvement in speech quality than modifying the magnitude or the phase alone.
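Below is a minimal sketch of the simplest kind of magnitude/phase oracle. It substitutes clean spectral components into a single noisy frame and resynthesizes that frame. It assumes NumPy, a Hann window, one frame of a synthetic "voiced" signal, and plain SNR as a stand-in quality measure. The helpers `recombine`, `oracle_frames` and `snr_db` are hypothetical names, and the paper's actual modifications and quality measures may differ.

```python
import numpy as np


def recombine(magnitude_source, phase_source):
    """Complex spectrum with |magnitude_source| and the phase of phase_source."""
    return np.abs(magnitude_source) * np.exp(1j * np.angle(phase_source))


def oracle_frames(clean_frame, noisy_frame):
    """Resynthesize one windowed frame under three oracle conditions (hypothetical helper)."""
    n = len(clean_frame)
    window = np.hanning(n)
    clean_spec = np.fft.rfft(clean_frame * window)
    noisy_spec = np.fft.rfft(noisy_frame * window)
    return {
        # clean magnitude, noisy phase: oracle for magnitude-only modification
        "magnitude_only": np.fft.irfft(recombine(clean_spec, noisy_spec), n),
        # noisy magnitude, clean phase: oracle for phase-only modification
        "phase_only": np.fft.irfft(recombine(noisy_spec, clean_spec), n),
        # clean magnitude and clean phase: reconstructs the windowed clean frame
        "magnitude_and_phase": np.fft.irfft(recombine(clean_spec, clean_spec), n),
    }


def snr_db(reference, estimate):
    """Signal-to-noise ratio of estimate with respect to reference, in dB."""
    error = np.sum((reference - estimate) ** 2) + 1e-12  # epsilon avoids division by zero
    return 10.0 * np.log10(np.sum(reference ** 2) / error)


# Synthetic "voiced" frame: 150 Hz harmonic series sampled at 16 kHz (illustrative values only)
rng = np.random.default_rng(0)
n = 512
t = np.arange(n) / 16000.0
clean = sum(np.sin(2 * np.pi * 150 * k * t) / k for k in range(1, 11))
noisy = clean + 0.5 * rng.standard_normal(n)

frames = oracle_frames(clean, noisy)
reference = clean * np.hanning(n)
for name, estimate in frames.items():
    print(f"{name}: {snr_db(reference, estimate):.1f} dB")
```

When both clean components are used, the windowed frame is recovered exactly. The informative comparison is therefore between the two partial conditions and how far each falls short of that exact reconstruction.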

Human and Automatic Speech Recognition Performance on German Oral History Interviews

Gref, Matthiesen, Schmidt et al. 2022

Preprint

Automatic speech recognition systems have achieved remarkable improvements in transcription accuracy in recent years. In some domains, models now reach near-human performance. However, transcription performance on oral history has not yet reached human accuracy. In the present work, we investigate how large this gap between human and machine transcription still is. For this purpose, we analyze and compare transcriptions produced by three humans on a new oral history data set. We estimate a human word error rate (WER) of 8.7 % for recent German oral history interviews with clean acoustic conditions. For comparison with recent machine transcription accuracy, we present experiments on the adaptation of an acoustic model that achieves near-human performance on broadcast speech. We investigate the influence of different adaptation data on robustness and generalization for clean and noisy oral history interviews. We improve our acoustic models by 5 to 8 % relative for this task and achieve 23.9 % WER on noisy and 15.6 % WER on clean oral history interviews.
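WER figures like those above are conventionally computed as (S + D + I) / N. Here S, D and I are the substituted, deleted and inserted words in an optimal word alignment, and N is the number of reference words. Because insertions count, WER can exceed 100 %. Below is a minimal Python sketch of that standard definition. It assumes whitespace tokenization and no text normalization, which real scoring pipelines typically add, so it illustrates the metric rather than reproducing the authors' scoring setup. `word_error_rate` is a hypothetical helper name.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (S + D + I) / N via word-level Levenshtein distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # dist[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,  # deletion
                dist[i][j - 1] + 1,  # insertion
                dist[i - 1][j - 1] + substitution,  # match or substitution
            )
    return dist[len(ref)][len(hyp)] / len(ref)


# Hypothetical example: one deletion ("ein") and one substitution ("test" -> "text")
print(word_error_rate("das ist ein kurzer test", "das ist kurzer text"))  # 0.4
```

In the example, the 2 word errors are divided by the 5 reference words, giving 0.4, that is, 40 % WER.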

