2019
DOI: 10.14569/ijacsa.2019.0100163

Developing an Adaptive Language Model for Bahasa Indonesia

Abstract: A language model is one of the important components of a speech recognition system. It is commonly built with a statistical method called n-gram. However, a standard n-gram model does not work well for general domains, where many sentences are semantically ambiguous. This paper focuses on developing an adaptive n-gram language model for Bahasa Indonesia. First, a text corpus of ten million distinct sentences is crawled from hundreds of news, magazine, personal blog, and writing forum websites. The text corpus is…
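To make the n-gram idea in the abstract concrete, here is a minimal sketch of a standard bigram model with add-one smoothing. It is not the paper's adaptive model, whose details are truncated above. The toy corpus, the function names, and the choice of add-one smoothing are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical toy corpus; the paper's corpus has ten million sentences.
TOY_CORPUS = [
    "saya pergi ke meja",
    "dia memakai kemeja biru",
    "buku itu di meja",
]

def train_bigram(sentences):
    """Count bigrams and their left contexts, with sentence boundary tokens."""
    context_counts, bigram_counts, vocab = Counter(), Counter(), {"</s>"}
    for sentence in sentences:
        words = sentence.lower().split()
        vocab.update(words)
        tokens = ["<s>"] + words + ["</s>"]
        context_counts.update(tokens[:-1])          # every token that has a successor
        bigram_counts.update(zip(tokens, tokens[1:]))
    return context_counts, bigram_counts, vocab

def bigram_prob(prev, word, model):
    """P(word | prev) with add-one (Laplace) smoothing, an assumed choice."""
    context_counts, bigram_counts, vocab = model
    return (bigram_counts[(prev, word)] + 1) / (context_counts[prev] + len(vocab))

model = train_bigram(TOY_CORPUS)
p_meja = bigram_prob("ke", "meja", model)      # seen after "ke" in the toy corpus
p_kemeja = bigram_prob("ke", "kemeja", model)  # never seen after "ke"
```

In this toy setting the model prefers "meja" over "kemeja" after "ke" simply because that bigram occurs in the training text. A fixed model like this cannot shift its estimates by topic, which is the limitation an adaptive model is meant to address.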

Cited by 8 publications (3 citation statements)
References 11 publications
“…Again, this word is assigned to another topic, and its probabilistic score is calculated; after this iterative process, the list of words of each topic with a probability of belonging to a particular topic is obtained. For example [29] argues that these words tend to coexist in the same context, and words with high frequency have more significant positions in each topic. In the LDA model, a set of documents share the same topics, but the proportions are different for each document.…”
Section: Data Tagging and Training (mentioning)
confidence: 99%
“…In Indonesian, the same intonation can give different meanings depending on the topic domain of the word or term. For example, the Indonesian greeting "kemeja" with the same intonation can be written as "ke meja" (go to the table) or "kemeja" (a dress) [28].…”
Section: Stemming (mentioning)
confidence: 99%
“…It is important not just in some researches but also in many linguistics-based applications. It is generally used in speech recognition [1] [2], speech synthesis [3] [4] [5], emotion classification [6] [7], speaker's dialect identification [8], speaking rate estimation [9], speaking proficiency scoring [10], word count estimation [11], phonemicization [12] [13], collecting a minimum sentence set in developing speech corpus, as described in [14] [15] [16], etc.…”
Section: Introduction (mentioning)
confidence: 99%