2021
DOI: 10.1155/2021/5123671
Spoken Language Identification Using Deep Learning

Abstract: Spoken language identification (SLID) is the process of detecting the language of an audio clip spoken by an unknown speaker, regardless of the speaker's gender, manner of speaking, or age. The central task is to find features that distinguish between languages clearly and efficiently. The model takes audio files and converts them into spectrogram images, then applies a convolutional neural network (CNN) to extract the main distinguishing features and produce the classification. The main o…
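As a rough illustration of the pipeline the abstract describes (audio clip → spectrogram image → CNN classifier), the sketch below uses librosa and TensorFlow/Keras. The mel resolution, frame count, layer sizes, and number of languages are assumptions for illustration, not the authors' exact configuration.

```python
# Minimal sketch of a spectrogram + CNN language-ID pipeline.
# Assumed: librosa and TensorFlow installed; shapes/hyperparameters are illustrative.
import numpy as np
import librosa
import tensorflow as tf

N_MELS, N_FRAMES, N_LANGUAGES = 128, 128, 3  # assumed dimensions, not from the paper

def audio_to_spectrogram(path, sr=16000):
    """Load an audio clip and convert it to a fixed-size log-mel spectrogram 'image'."""
    y, sr = librosa.load(path, sr=sr)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=N_MELS)
    log_mel = librosa.power_to_db(mel, ref=np.max)
    # Pad or crop the time axis so every clip yields the same input shape.
    if log_mel.shape[1] < N_FRAMES:
        log_mel = np.pad(log_mel, ((0, 0), (0, N_FRAMES - log_mel.shape[1])))
    return log_mel[:, :N_FRAMES, np.newaxis]  # shape (N_MELS, N_FRAMES, 1)

# A small CNN classifier over the spectrogram images.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(N_MELS, N_FRAMES, 1)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(N_LANGUAGES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```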

Cited by 46 publications (22 citation statements) · References 26 publications
“…More recent works on automatic language ID use deep-learning based methods to train neural networks as acoustic models. One approach is to treat the problem of language identification as a computer vision classification problem, and thus to train a CNN on spectrograms labeled with their corresponding languages [8]. This approach, though it meets with some success in limited scenarios, must generate unnecessary intermediate images and disregards decades of progress on acoustic feature extraction and acoustic model generation.…”
Section: Related Work
confidence: 99%
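For contrast with the spectrogram-image route criticized in the statement above, a conventional acoustic-feature front end can be computed directly from the waveform. The following sketch extracts per-utterance MFCC statistics with librosa; the feature count and time-averaging are illustrative assumptions, not a method taken from the cited works.

```python
# Sketch of a direct acoustic-feature front end (MFCCs) with no intermediate images.
import numpy as np
import librosa

def mfcc_features(path, sr=16000, n_mfcc=20):
    """Return a fixed-length vector of per-utterance mean MFCCs."""
    y, sr = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # shape (n_mfcc, frames)
    return mfcc.mean(axis=1)  # average over time -> (n_mfcc,)
```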
“…Nowadays, communication between machines and humans takes place not only through the graphical user interface (GUI) but also through natural human speech or by eye gaze and gestures. Human-machine communication by natural speech requires two mandatory technologies and one optional one: a speech recognition (speech-to-text) [1] system that converts the user's speech to text, a speech synthesis (text-to-speech) [2] system that generates a meaningful human speech audio signal, and a spoken language identification [3] system that identifies the currently spoken language. These three technologies require well-annotated speech corpora [4], whose creation is a time-consuming process that requires knowledge of language grammar and lexicology.…”
Section: Introduction
confidence: 99%
“…A speech corpus is a database of speech audio files and their corresponding text transcriptions [1]. In natural language processing (NLP) [2] tasks such as speech recognition [3], speech synthesis [4], or spoken language identification [5], speech corpora are used to create acoustic models [6]. Speech corpora are also useful for linguistics research (phonetics, conversation analysis, dialectology, and other fields).…”
Section: Introduction
confidence: 99%