2008 5th International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology 2008
DOI: 10.1109/ecticon.2008.4600388

A comparative study on Thai word segmentation approaches

Abstract: In this paper, we analyze and compare various approaches to Thai word segmentation. Word segmentation approaches can be classified into two distinct types: dictionary-based (DCB) and machine-learning-based (MLB). The DCB approach relies on a set of terms for parsing and segmenting input texts, whereas the MLB approach relies on a model trained from a corpus using machine learning techniques. We compare two algorithms from the DCB approach, longest matching and maximal matching, and four algor…
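
As a rough illustration of the dictionary-based idea named in the abstract, the sketch below shows a plain greedy, left-to-right longest-matching segmenter. It is not taken from the paper: the function name `longest_matching`, the single-character fallback for unknown text, and the four-word toy dictionary are all assumptions made for this example.

```python
def longest_matching(text, dictionary):
    """Greedy left-to-right segmentation: at each position, take the
    longest dictionary entry that matches (illustrative sketch only)."""
    # Longest entry length bounds how far ahead we need to look.
    max_len = max((len(word) for word in dictionary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        match = None
        # Try candidates from longest to shortest.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in dictionary:
                match = candidate
                break
        if match is None:
            # Assumption: emit an unknown character as its own token.
            match = text[i]
        tokens.append(match)
        i += len(match)
    return tokens


# Hypothetical toy dictionary, chosen to show a segmentation ambiguity.
TOY_DICTIONARY = {"ตา", "กลม", "ตาก", "ลม"}
print(longest_matching("ตากลม", TOY_DICTIONARY))
```

With this toy dictionary the greedy rule commits to the longer first word "ตาก" and then "ลม", even though "ตา" + "กลม" is also a valid split. Ambiguity of this kind is what distinguishes longest matching from approaches that weigh whole-sentence alternatives.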

Cited by 80 publications (34 citation statements)
References 3 publications
“…In indexed Thai texts using an inverted index [5], word segmentation [6] - [10] is one of the most widely used information extraction techniques in Natural Language Processing (NLP). The word segmentation technique is used to perform the index terms tokenization.…”
Section: Thai Index Terms Extraction
type: mentioning
confidence: 99%
“…After which, these segmented index terms will be stored into the inverted index structure. Most techniques are based on word segmentation which usually relies on any dictionary or requires the linguistic knowledge of the language [3], [6]. However, there is some other techniques which do not rely on language analysis.…”
Section: Introduction
type: mentioning
confidence: 99%
“…This research interested in THAI language. The best algorithm for THAI segmentation is Longest Word Matching Algorithm (7) as shown in Figure 2. Fig.…”
Section: 12
type: mentioning
confidence: 99%
“…In the automated process, unknown word boundaries are identified using frequencies of strings. In [20], a comparison of dictionary-based approach and ML-based approach for word segmentation was presented where unknown word detection is implicitly handled. Since either of the dictionary-based and ML-based approaches has its advantages, most previous works [2], [5], [15] combined them to handle unknown words.…”
Section: Previous Work
type: mentioning
confidence: 99%