Semi-supervised acoustic model training for speech with code-switching

Yılmaz, Emre; McLaren, Mitchell; Heuvel, H. van den; Leeuwen, David A. van

doi:10.1016/j.specom.2018.10.006

Cited by 21 publications

(25 citation statements)

References 64 publications

“…We refer to this automatically transcribed data as the 'Frisian Broadcast' data. The automatic transcription procedure is detailed in [17].…”

Section: Acoustic Data (mentioning)

confidence: 99%

“…The acoustic data augmentation relies on available monolingual acoustic resources from the high-resourced mixed language (Dutch). Using more monolingual Dutch speech for acoustic model training has found to be effective in improving the general ASR performance, only after increasing the in-domain CS data applying the semi-supervised techniques described in [17,21].…”

Section: Introduction (mentioning)

confidence: 99%

Multi-Graph Decoding for Code-Switching ASR

et al. 2019

Self Cite

In the FAME! Project, a code-switching (CS) automatic speech recognition (ASR) system for Frisian-Dutch speech is developed that can accurately transcribe the local broadcaster's bilingual archives with CS speech. This archive contains recordings with monolingual Frisian and Dutch speech segments as well as Frisian-Dutch CS speech, hence the recognition performance on monolingual segments is also vital for accurate transcriptions. In this work, we propose a multi-graph decoding and rescoring strategy using bilingual and monolingual graphs together with a unified acoustic model for CS ASR. The proposed decoding scheme gives the freedom to design and employ alternative search spaces for each (monolingual or bilingual) recognition task and enables the effective use of monolingual resources of the high-resourced mixed language in low-resourced CS scenarios. In our scenario, Dutch is the high-resourced and Frisian is the low-resourced language. We therefore use additional monolingual Dutch text resources to improve the Dutch language model (LM) and compare the performance of single- and multi-graph CS ASR systems on Dutch segments using larger Dutch LMs. The ASR results show that the proposed approach outperforms baseline single-graph CS ASR systems, providing better performance on the monolingual Dutch segments without any accuracy loss on monolingual Frisian and code-mixed segments.

“…This project developed an ASR system for Frisian–Dutch code-switching speech, as extracted from the archives of a local broadcaster. The goal of the system was to allow automatically retrieving relevant items from a large collection of news broadcasts, in response to user-specified text queries (Yilmaz et al., 2018, p. 12). Similarly, Van Den Heuvel et al. (2012) report applying ASR to disclose, via keyword retrieval, 250 interviews with veterans of Dutch conflicts and military missions.…”

Section: Speech Technology To the Rescue? (mentioning)

confidence: 99%

Clearing the Transcription Hurdle in Dialect Corpus Building: The Corpus of Southern Dutch Dialects as Case Study

Ghyselen, Breitbarth, Farasyn et al. 2020

Front. Artif. Intell.

This paper discusses how the transcription hurdle in dialect corpus building can be cleared. While corpus analysis has strongly gained in popularity in linguistic research, dialect corpora are still relatively scarce. This scarcity can be attributed to several factors, one of which is the challenging nature of transcribing dialects, given a lack of both orthographic norms for many dialects and speech technological tools trained on dialect data. This paper addresses the questions (i) how dialects can be transcribed efficiently and (ii) whether speech technological tools can lighten the transcription work. These questions are tackled using the Southern Dutch dialects (SDDs) as case study, for which the usefulness of automatic speech recognition (ASR), respeaking, and forced alignment is considered. Tests with these tools indicate that dialects still constitute a major speech technological challenge. In the case of the SDDs, the decision was made to use speech technology only for the word-level segmentation of the audio files, as the transcription itself could not be sped up by ASR tools. The discussion does however indicate that the usefulness of ASR and other related tools for a dialect corpus project is strongly determined by the sound quality of the dialect recordings, the availability of statistical dialect-specific models, the degree of linguistic differentiation between the dialects and the standard language, and the goals the transcripts have to serve.

“…Emre Yılmaz et al [22] developed a semi supervised acoustic model for speech recognition. This developed model assigns language label to speech signals.…”

Section: Types Of ASR Systems (mentioning)

confidence: 99%

Automatic Speech Recognition Systems for Regional Languages in India

2019

IJRTE

Speech recognition systems such as Siri, Google Assistant, and Cortana have made remarkable progress in the last few decades. To improve the automation of services in all sectors, including medical, agriculture, voice dialling, directory services, education, automobile, etc., ASR systems must be built for regional languages, as most of the Indian population is not familiar with English. A lot of work has been done for the English language but not for regional languages in India. Developing ASR and ASU systems will change the scenario of the current service sector. There are many challenges in building an ASR system; noise reduction is one of the challenging and still unsolved problems, and it strongly affects the performance of any ASR system. Basically, three models are required for building any ASR system: a language model, an acoustic model, and a pronunciation model. This paper discusses various parameters affecting the building of ASR systems, the development of ASR systems, tools and techniques used for building an ASR system, and research on ASR systems for regional languages. Deep neural networks (DNNs) provide a better way of recognising speech, with high accuracy.
