Proceedings of the COLING/ACL on Main Conference Poster Sessions - 2006
DOI: 10.3115/1273073.1273196

Subword-based tagging for confidence-dependent Chinese word segmentation

Abstract: We proposed subword-based tagging for Chinese word segmentation to improve on the existing character-based tagging. The subword-based tagging was implemented using the maximum entropy (MaxEnt) and conditional random field (CRF) methods. We found that the proposed subword-based tagging outperformed character-based tagging in all comparative experiments. In addition, we proposed a confidence measure approach to combine the results of a dictionary-based and a subword-tagging-based segmentation. This appro…
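A minimal Python sketch of what subword-based tagging means for training data, under stated assumptions: the subword set is given, each word is split into units by greedy longest match against it, and units are labeled with a two-tag B/I scheme (B for the first unit of a word, I otherwise). The function names, the `max_len` limit, and the tag scheme are hypothetical illustrations, not the paper's exact formulation. The point is only that a word built from known subwords needs fewer labeling decisions than one tag per character.

```python
from typing import List, Set, Tuple


def split_into_units(word: str, subwords: Set[str], max_len: int = 4) -> List[str]:
    """Split one word into units by greedy longest match (hypothetical helper).

    Characters not covered by any subword become single-character units.
    """
    units, i = [], 0
    while i < len(word):
        # Try the longest candidate first; a single character always matches.
        for length in range(min(max_len, len(word) - i), 0, -1):
            piece = word[i:i + length]
            if length == 1 or piece in subwords:
                units.append(piece)
                i += length
                break
    return units


def words_to_tagged_units(words: List[str], subwords: Set[str]) -> List[Tuple[str, str]]:
    """Label each unit B (first unit of a word) or I (any later unit)."""
    tagged = []
    for word in words:
        for k, unit in enumerate(split_into_units(word, subwords)):
            tagged.append((unit, "B" if k == 0 else "I"))
    return tagged


def tagged_units_to_words(tagged: List[Tuple[str, str]]) -> List[str]:
    """Invert the tagging: a B unit starts a new word, an I unit extends the last one."""
    words: List[str] = []
    for unit, tag in tagged:
        if tag == "B" or not words:
            words.append(unit)
        else:
            words[-1] += unit
    return words


subwords = {"全国", "人民", "代表"}
tagged = words_to_tagged_units(["全国", "人民代表大会"], subwords)
print(tagged)  # [('全国', 'B'), ('人民', 'B'), ('代表', 'I'), ('大', 'I'), ('会', 'I')]
print(tagged_units_to_words(tagged))  # ['全国', '人民代表大会']
```

Here the eight characters of 全国 / 人民代表大会 become five tagged units. Characters that no subword covers fall back to ordinary character-level tags, so the scheme degrades gracefully to character-based tagging.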

Cited by 12 publications (21 citation statements); references 10 publications.

“…• CRF: the approach which uses CRF model, which has shown good performance in Chinese word segmentation tasks [10], [11]. We use linear-chain CRFs † and the standard BIO labels.…”
Section: Results (mentioning; confidence: 99%)

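The linear-chain CRF mentioned in the statement above assigns one label per position, and the best label sequence is found by Viterbi decoding. The sketch below shows only that decoding step, assuming per-position emission scores and tag-transition scores are already available. In a trained CRF these would come from learned feature weights; the numbers in the usage lines are made-up placeholders, and the two-tag B/I set is an assumption rather than the BIO inventory of the cited work. Training and feature extraction are not shown.

```python
from typing import Dict, List, Sequence, Tuple


def viterbi(tags: Sequence[str],
            emission: List[Dict[str, float]],
            transition: Dict[Tuple[str, str], float]) -> List[str]:
    """Return the highest-scoring tag sequence under a linear-chain model.

    emission[t][y]          : score of tag y at position t (needs >= 1 position)
    transition[(y_prev, y)] : score of moving from y_prev to y (missing = -inf)
    """
    neg_inf = float("-inf")
    # best[t][y]: score of the best path that ends in tag y at position t
    best = [{y: emission[0][y] for y in tags}]
    back: List[Dict[str, str]] = [{}]
    for t in range(1, len(emission)):
        best.append({})
        back.append({})
        for y in tags:
            # Pick the predecessor tag that maximizes path score + transition score.
            prev, score = max(
                ((yp, best[t - 1][yp] + transition.get((yp, y), neg_inf)) for yp in tags),
                key=lambda pair: pair[1],
            )
            best[t][y] = score + emission[t][y]
            back[t][y] = prev
    # Start from the best final tag and follow the back-pointers.
    path = [max(tags, key=lambda y: best[-1][y])]
    for t in range(len(emission) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


tags = ("B", "I")
emission = [{"B": 2.0, "I": 0.0}, {"B": 0.5, "I": 1.0}, {"B": 1.5, "I": 0.2}]
transition = {("B", "B"): 0.0, ("B", "I"): 0.5, ("I", "B"): 0.3, ("I", "I"): -0.5}
print(viterbi(tags, emission, transition))  # ['B', 'I', 'B']
```

The decoded B I B sequence reads as a two-unit word followed by a one-unit word. Dynamic programming keeps decoding linear in sentence length (times the square of the tag count) instead of enumerating every possible tag sequence.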
“…Recent studies in CWS focus on tagging approaches with either characters [16,30] or words [4,33,34] as tagging units. Very little research [29] has been devoted to resolving CWS problems based on morphemes, a lower-level linguistic structure than words.…”
Section: Introduction (mentioning; confidence: 99%)

“…To alleviate the high OOV-ratio issue of character-based sequence labeling, Zhang et al (2006) and Zhao and Kit (2007) propose subword-based sequence labeling for word segmentation by extracting high-frequency subword and treating them as the basic labeling units. Li (2011) and Li and Zhou (2012) propose to jointly parse the internal structures of words and syntactic structure of a sentence.…”
Section: Related Work (mentioning; confidence: 99%)
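The extraction step described in the statement above can be approximated by counting multi-character words in a segmented training corpus and keeping the most frequent ones as subword units. This is a hedged sketch: the function name, the `top_k` cutoff, and the `min_len` filter are hypothetical choices, and the cited papers may select subwords differently (for example with a frequency threshold instead of a fixed count).

```python
from collections import Counter
from typing import Iterable, List, Set


def extract_subwords(segmented_corpus: Iterable[List[str]],
                     top_k: int,
                     min_len: int = 2) -> Set[str]:
    """Return the top_k most frequent words of at least min_len characters.

    segmented_corpus: sentences already split into words (gold segmentation).
    """
    counts = Counter(
        word
        for sentence in segmented_corpus
        for word in sentence
        if len(word) >= min_len  # single characters are always units anyway
    )
    # most_common() orders equal counts by first occurrence, so ties are deterministic.
    return {word for word, _ in counts.most_common(top_k)}


corpus = [["我们", "喜欢", "足球"], ["我们", "喜欢", "北京"], ["他", "喜欢", "我们"]]
print(sorted(extract_subwords(corpus, top_k=2)))
```

Only multi-character items are counted because every single character is already a possible labeling unit. Raising `top_k` moves the unit inventory from character-level toward word-level, which is the trade-off the subword approach tunes.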