Proceedings of the 18th Conference on Computational Linguistics - 2000
DOI: 10.3115/990820.990824

Extended models and tools for high-performance part-of-speech tagger

Abstract: Statistical part-of-speech (POS) taggers achieve high accuracy and robustness when based on large-scale manually tagged corpora. However, enhancements of the learning models are necessary to achieve better performance. We are developing a learning tool for a Japanese morphological analyzer called ChaSen. Currently we use a fine-grained POS tag set with about 500 tags. To apply a normal trigram model on the tag set, we need an unrealistic size of corpora. Even for a bi-gram model, we cannot prepar…
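
To make the data-sparseness point in the abstract concrete, the short sketch below counts how many distinct tag n-grams a model would have to estimate for a tag set of this size. It is only an illustration: the figure of 500 is taken from the abstract's "about 500 tags", and the function name `ngram_parameter_count` is hypothetical, not part of ChaSen or its learning tool.

```python
def ngram_parameter_count(num_tags: int, order: int) -> int:
    """Return the number of distinct tag n-grams of the given order.

    Each n-gram is one transition probability that an unsmoothed
    n-gram tag model would need to estimate from tagged data.
    """
    return num_tags ** order


# Assumption: exactly 500 tags, rounding the abstract's "about 500 tags".
NUM_TAGS = 500

bigram_params = ngram_parameter_count(NUM_TAGS, 2)
trigram_params = ngram_parameter_count(NUM_TAGS, 3)

print(f"bigram tag pairs:    {bigram_params:,}")   # 250,000
print(f"trigram tag triples: {trigram_params:,}")  # 125,000,000
```

Under these assumptions a trigram model has 500 times as many tag sequences to observe as a bigram model, which is consistent with the abstract's remark that a plain trigram model would need an unrealistic amount of tagged text.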

Cited by 72 publications (39 citation statements)
References 5 publications
“…This rank was determined by a one-sided Smirnov-Grubbs test (5% significance). (2-3) Comments were decomposed into morphemes using the ChaSen (9), nouns, verbs, and adjectives were extracted. Then these were multiplied by the evaluation value for each rank of each word.…”
Section: Recursive Evaluation Of Comments and Words (mentioning)
confidence: 99%
“…Since all conversations were done in Japanese, we analyzed the transcripts to see how actively participants spoke in each task, which is similar to counting words in English sentences. Here, the transcripts were morphologically analyzed and split into tokens by the Chasen tokenizer (Asahara & Matsumoto, 2000). The numbers of extracted tokens were counted for each participant or entity for each task.…”
Section: Methods (mentioning)
confidence: 99%
“…For this purpose we extracted relation-oriented sentences for creating dictionaries as verb dictionary, noun dictionary and n-gram dictionaries using WWW corpus (1,907,086 sentences) retrieved with Larbin robot. The verbs and nouns dictionaries consist of 79,460 verbs and 134,189 nouns retrieved with help of ChaSen [8]. For creating scripts automatically, our system had to search for the relationships between verbs and nouns and also between verb pairs.…”
Section: Our System (mentioning)
confidence: 99%