2021
DOI: 10.1155/2021/5123671
Spoken Language Identification Using Deep Learning

Abstract: Spoken language identification (SLID) is the process of detecting the language of an audio clip spoken by an unknown speaker, regardless of the speaker's gender, manner of speaking, or age. The central task is to find features that distinguish between languages clearly and efficiently. The model takes audio files and converts them into spectrogram images, then applies a convolutional neural network (CNN) to extract the main distinguishing features and produce the classification. The main o…
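As a rough illustration of the pipeline the abstract describes (audio clip → spectrogram image → CNN classifier), the sketch below uses librosa and TensorFlow/Keras. The mel resolution, frame count, layer sizes, and number of languages are assumptions for illustration, not the authors' exact configuration.

```python
# Minimal sketch of a spectrogram + CNN language-ID pipeline.
# Assumed: librosa and TensorFlow installed; shapes/hyperparameters are illustrative.
import numpy as np
import librosa
import tensorflow as tf

N_MELS, N_FRAMES, N_LANGUAGES = 128, 128, 3  # assumed dimensions, not from the paper

def audio_to_spectrogram(path, sr=16000):
    """Load an audio clip and convert it to a fixed-size log-mel spectrogram 'image'."""
    y, sr = librosa.load(path, sr=sr)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=N_MELS)
    log_mel = librosa.power_to_db(mel, ref=np.max)
    # Pad or crop the time axis so every clip yields the same input shape.
    if log_mel.shape[1] < N_FRAMES:
        log_mel = np.pad(log_mel, ((0, 0), (0, N_FRAMES - log_mel.shape[1])))
    return log_mel[:, :N_FRAMES, np.newaxis]  # shape (N_MELS, N_FRAMES, 1)

# A small CNN classifier over the spectrogram images.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(N_MELS, N_FRAMES, 1)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(N_LANGUAGES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```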

Cited by 46 publications (22 citation statements) · References 26 publications
“…More recent works on automatic language ID use deep-learning based methods to train neural networks as acoustic models. One approach is to treat the problem of language identification as a computer vision classification problem, and thus to train a CNN on spectrograms labeled with their corresponding languages [8]. This approach, though it meets with some success in limited scenarios, must generate unnecessary intermediate images and disregards decades of progress on acoustic feature extraction and acoustic model generation.…”
Section: Related Work
confidence: 99%
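For contrast with the spectrogram-image route criticized in the statement above, a conventional acoustic-feature front end can be computed directly from the waveform. The following sketch extracts per-utterance MFCC statistics with librosa; the feature count and time-averaging are illustrative assumptions, not a method taken from the cited works.

```python
# Sketch of a direct acoustic-feature front end (MFCCs) with no intermediate images.
import numpy as np
import librosa

def mfcc_features(path, sr=16000, n_mfcc=20):
    """Return a fixed-length vector of per-utterance mean MFCCs."""
    y, sr = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # shape (n_mfcc, frames)
    return mfcc.mean(axis=1)  # average over time -> (n_mfcc,)
```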
“…Nowadays, communication between machines and humans takes place not only through the graphical user interface (GUI) but also through natural human speech or by eye gaze and gestures. Human-machine communication by natural speech requires two mandatory technologies and one optional one: a speech recognition (speech-to-text) [1] system that converts the user's speech to text, a speech synthesis (text-to-speech) [2] system that generates a meaningful human speech audio signal, and a spoken language identification [3] system that identifies the currently spoken language. These three technologies require well-annotated speech corpora [4], whose creation is a time-consuming process that requires knowledge of language grammar and lexicology.…”
Section: Introduction
confidence: 99%
“…A speech corpus is a database of speech audio files and their corresponding text transcriptions [1]. In natural language processing (NLP) [2] tasks such as speech recognition [3], speech synthesis [4], or spoken language identification [5], speech corpora are used to create acoustic models [6]. Speech corpora are also useful for linguistics research (phonetics, conversation analysis, dialectology, and other fields).…”
Section: Introduction
confidence: 99%