Proceedings of the COLING/ACL on Main Conference Poster Sessions - 2006
DOI: 10.3115/1273073.1273118

A collaborative framework for collecting Thai unknown words from the web

Abstract: We propose a collaborative framework for collecting Thai unknown words found on Web pages over the Internet. Our main goal is to design and construct a Web-based system which allows a group of interested users to participate in constructing a Thai unknown-word open dictionary. The proposed framework provides supporting algorithms and tools for automatically identifying and extracting unknown words from Web pages of given URLs. The system yields the result of unknown-word candidates which are presented to the use…

Cited by 11 publications (6 citation statements)
References 8 publications
“…The longest matching and maximum matching algorithms are plagued with problems related to dictionary size and unknown words. Researchers have attempted to cope with the unknown - for example, Haruechaiyasak and his colleagues have proposed the unknown word collecting framework [10]. Theirs is an integrative framework that extracts unrecognizable words, which are then reviewed and corrected by humans before being added to a dictionary.…”
Section: Literature Review (mentioning)
confidence: 99%
“…The first is the unknown word problem. Unknown words are words which are not included in the dictionary for processing the texts [2]. The second problem is due to the ambiguity while parsing the texts.…”
Section: Introduction (mentioning)
confidence: 99%
“…As a more recent work, Haruechaiyasak et al. [19] proposed a semi-automated framework that utilized statistical and corpus-based concepts for detecting unknown words and then introduced a collaborative framework among a group of corpus builders to refine the obtained results. In the automated process, unknown word boundaries are identified using frequencies of strings.…”
Section: Previous Work (mentioning)
confidence: 99%
“…In the past, most previous works on Thai unknown word recognition [2], [15], [19] treated unknown word candidates independently. However, in the real situation, a set of candidates generated from an unregistered portion, should be considered dependently and treated as a group.…”
Section: Unknown Word Identification (mentioning)
confidence: 99%