2012 Seventh International Conference on Knowledge, Information and Creativity Support Systems
DOI: 10.1109/kicss.2012.33

A Rule-Based Method for Thai Elementary Discourse Unit Segmentation (TED-Seg)

Abstract: Discovering discourse units in Thai, a language without explicit word and sentence boundaries, is not a straightforward task because of its high part-of-speech (POS) ambiguity and its serial verb constituents. This paper introduces definitions of Thai elementary discourse units (T-EDUs), grammar rules for T-EDU segmentation, and a longest-matching-based chart parser. The T-EDU definitions are used to construct a set of context-free grammar (CFG) rules. As a result, 446 CFG rules are constructed from 1,340 T-EDUs, extracted f…
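
To make the idea of CFG-based, longest-matching segmentation concrete, here is a minimal sketch under several stated assumptions. The input is assumed to be already tokenized and POS-tagged. The grammar (`GRAMMAR`, nonterminals `EDU`, `NP`, `VP`, tags `N`, `V`, `ADJ`) is a hypothetical toy and is not the paper's 446-rule set. The fallback for tokens that start no parsable unit is also an assumption. The memoized `derives` table over (symbol, start, end) spans plays the role of the chart. At each position the segmenter commits to the longest span derivable from `EDU`.

```python
from functools import lru_cache

# Hypothetical toy grammar (NOT the paper's rules): nonterminal -> list of right-hand sides.
# Any symbol that is not a key is treated as a POS-tag terminal.
GRAMMAR = {
    "EDU": [("NP", "VP")],
    "NP": [("N",), ("N", "ADJ")],
    "VP": [("V",), ("V", "NP"), ("V", "V", "NP")],  # last rule: a serial-verb pattern
}


def segment(tags, grammar, start="EDU"):
    """Split a POS-tag sequence into (start, end) spans using longest matching.

    Assumes the grammar has no empty rules and no unary cycles,
    so every symbol covers at least one token and recursion terminates.
    """
    tags = tuple(tags)
    n = len(tags)

    @lru_cache(maxsize=None)
    def derives(sym, i, j):
        # Chart entry: can `sym` derive exactly tags[i:j]?
        if sym not in grammar:
            return j == i + 1 and tags[i] == sym
        return any(covers(rhs, 0, i, j) for rhs in grammar[sym])

    @lru_cache(maxsize=None)
    def covers(rhs, k, i, j):
        # Can the suffix rhs[k:] derive exactly tags[i:j]?
        if k == len(rhs):
            return i == j
        remaining = len(rhs) - k
        # Leave at least one token for each remaining symbol.
        for m in range(i + 1, j - remaining + 2):
            if derives(rhs[k], i, m) and covers(rhs, k + 1, m, j):
                return True
        return False

    segments, i = [], 0
    while i < n:
        # Longest match: try the longest candidate span starting at i first.
        end = next((j for j in range(n, i, -1) if derives(start, i, j)), None)
        if end is None:
            # Assumed fallback: attach the unparsable token to the previous segment.
            if segments:
                s, _ = segments[-1]
                segments[-1] = (s, i + 1)
            else:
                segments.append((i, i + 1))
            i += 1
        else:
            segments.append((i, end))
            i = end
    return segments


if __name__ == "__main__":
    print(segment(["N", "V", "N", "ADJ", "N", "V"], GRAMMAR))  # [(0, 4), (4, 6)]
```

In the usage example, `(0, 3)` is also a valid `EDU` span, but longest matching prefers `(0, 4)`, which absorbs the adjective into the object noun phrase. That greedy choice is what keeps a serial-verb or modifier chain inside one unit rather than splitting it early.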

Cited by 5 publications (2 citation statements)
References 6 publications

“…Several research has addressed automatic segmentation in several languages, such as: French [1], English [14], Portuguese [8], Spanish [3,9] and Tahi [sic] [6]. All converge to the idea of using an explicit list of marks in order to segment texts.…”
Section: State-of-the-art (mentioning)

“…Here, "UNK" is a special type, assigned to tokens which cannot be classified into any of 24 existing types. In the past, a number of research works applied a similar tagset, such as those in [13], [21], [22]. Figure 4 illustrates an example of the tagging process in three stages.…”
Section: Tagging (mentioning)