Because Thai text lacks explicit word, phrase, and sentence boundaries, summarizing multiple Thai documents poses several challenges: unit segmentation, unit selection, duplication elimination, and evaluation dataset construction. In this article, we introduce Thai Elementary Discourse Units (TEDUs) and their derivatives, called Combined TEDUs (CTEDUs), and then present our three-stage method for Thai multi-document summarization: unit segmentation, unit-graph formulation, and unit selection with summary generation. To examine the performance of the proposed method, we conduct experiments on 50 sets of Thai news articles with manually constructed reference summaries. Measured by ROUGE-1, ROUGE-2, and ROUGE-SU4, the experimental results show that: (1) TEDU-based summarization outperforms paragraph-based summarization; (2) our graph-based TEDU weighting with importance-based selection achieves the best performance; and (3) considering unit duplication and recalculating weights improve summary quality.
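As a rough illustration of the graph-based unit weighting and importance-based selection with duplication elimination described in the abstract, the sketch below scores units with a PageRank-style algorithm over a cosine-similarity graph and then greedily picks high-weight units, skipping near-duplicates. The similarity measure, damping factor, and thresholds are illustrative assumptions, not the paper's exact formulation.

```python
from collections import Counter
import math

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    num = sum(a[t] * b[t] for t in set(a) & set(b))
    den = math.sqrt(sum(v * v for v in a.values())) * \
          math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def rank_units(units, damping=0.85, iters=50):
    """PageRank-style weighting over a unit-similarity graph."""
    bags = [Counter(u.split()) for u in units]
    n = len(units)
    sim = [[cosine(bags[i], bags[j]) if i != j else 0.0
            for j in range(n)] for i in range(n)]
    out = [sum(row) for row in sim]          # out-degree of each unit
    weights = [1.0 / n] * n
    for _ in range(iters):
        weights = [
            (1 - damping) / n
            + damping * sum(sim[j][i] * weights[j] / out[j]
                            for j in range(n) if out[j])
            for i in range(n)
        ]
    return weights

def select_units(units, weights, max_units=3, dup_threshold=0.5):
    """Importance-based selection with a duplication check:
    skip a candidate too similar to an already-selected unit."""
    bags = [Counter(u.split()) for u in units]
    chosen = []
    for i in sorted(range(len(units)), key=lambda i: -weights[i]):
        if len(chosen) >= max_units:
            break
        if all(cosine(bags[i], bags[j]) < dup_threshold for j in chosen):
            chosen.append(i)
    return [units[i] for i in sorted(chosen)]  # restore document order
```

Weight recalculation after each pick (rather than a one-shot ranking) could be added by re-running `rank_units` on the remaining units inside the selection loop.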
Discovering discourse units in Thai, a language without word and sentence boundaries, is not straightforward because of high part-of-speech (POS) ambiguity and serial verb constructions. This paper introduces definitions of Thai elementary discourse units (T-EDUs), grammar rules for T-EDU segmentation, and a longest-matching-based chart parser. The T-EDU definitions are used to construct a set of context-free grammar (CFG) rules: 446 CFG rules are derived from 1,340 T-EDUs extracted from the NE- and POS-tagged corpus Thai-NEST. The T-EDUs were evaluated by two linguists, yielding a kappa score of 0.68. Separately, a two-level evaluation is applied: one in an arranged setting where the text is pre-chunked, the other in a normal setting where the original running text is used for testing. By specifying one grammar rule per T-EDU instance, perfect recall (100%) is achieved in a closed test where the training and testing corpora are the same, but precision is only about 36.16% and 31.69% for the chunked and running texts, respectively. In an open test with 3-fold cross-validation, recall is around 67% while precision is only 25-28%. To improve precision, two alternative strategies are applied: left-to-right longest matching (L2R-LM) and maximal longest matching (M-LM). In the closed test on running text, L2R-LM and M-LM raise precision to 93.97% and 94.03%, respectively, while recall drops slightly to 94.18% and 92.91%. In the open test on running text, the f-score improves to 57.70% for L2R-LM and 54.14% for M-LM.
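To make the matching strategy concrete, here is a minimal Python sketch of left-to-right longest matching (L2R-LM) over a POS-tag sequence, with flat tag-sequence patterns standing in for the CFG rules. The rule set and tags are invented for illustration and are not taken from the Thai-NEST grammar.

```python
def segment_l2r_longest(tags, rules):
    """Left-to-right longest matching (L2R-LM): at each position,
    commit to the longest rule (tag-sequence pattern) that matches,
    emitting one segment per match; advance one tag if none match."""
    segments, i = [], 0
    while i < len(tags):
        best = 0
        for pattern in rules:
            k = len(pattern)
            if k > best and tuple(tags[i:i + k]) == pattern:
                best = k
        if best:
            segments.append(tags[i:i + best])
            i += best
        else:
            segments.append(tags[i:i + 1])  # fallback: single-tag segment
            i += 1
    return segments
```

M-LM differs in that, roughly speaking, it compares alternative segmentations of the whole sequence and keeps the one that maximizes matched length globally, rather than committing greedily from left to right.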