Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the ACL (ACL '06), 2006
DOI: 10.3115/1220175.1220260
Contextual dependencies in unsupervised word segmentation

Abstract: Developing better methods for segmenting continuous text into words is important for improving the processing of Asian languages, and may shed light on how humans learn to segment speech. We propose two new Bayesian word segmentation methods that assume unigram and bigram models of word dependencies respectively. The bigram model greatly outperforms the unigram model (and previous probabilistic models), demonstrating the importance of such dependencies for word segmentation. We also show that previous probabil…

Cited by 127 publications (134 citation statements)
References 14 publications
“…We evaluate three models of this type: local minima in transitional probability (TP); minima in TP with smoothed counts; local minima in pointwise mutual information. We then evaluate three other models which focused on finding a lexicon to fit the input corpus: a clustering model by Swingley (2005) which also uses pointwise mutual information; PARSER (Perruchet & Vinter, 1998), a memory-decay model of segmentation; and a Bayesian model in the style of Brent (1999) by Goldwater, Griffiths, and Johnson (2006).…”
Section: Introduction (mentioning)
confidence: 99%
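The transitional-probability (TP) idea in this excerpt is concrete enough to sketch: estimate TP(y|x) = count(xy)/count(x) over the corpus, then posit a word boundary wherever TP dips to a local minimum. The following is a minimal illustrative sketch; the function name and the character-level toy input are our own assumptions, not any cited author's implementation:

```python
from collections import Counter

def segment_by_tp_minima(units):
    """Place a boundary wherever the forward transitional probability
    TP(y|x) = count(x, y) / count(x) hits a local minimum."""
    unigrams = Counter(units)
    bigrams = Counter(zip(units, units[1:]))
    # TP across each adjacent gap in the sequence
    tps = [bigrams[(x, y)] / unigrams[x] for x, y in zip(units, units[1:])]
    words, current = [], [units[0]]
    for i in range(1, len(units)):
        left = tps[i - 2] if i >= 2 else float("inf")
        right = tps[i] if i < len(tps) else float("inf")
        if tps[i - 1] < left and tps[i - 1] < right:  # local minimum at gap i-1
            words.append("".join(current))
            current = []
        current.append(units[i])
    words.append("".join(current))
    return words

# toy usage over characters (the cited models operate over syllables or phonemes)
print(segment_by_tp_minima(list("thedogseesthedog")))
```

The smoothed-count and pointwise-mutual-information variants mentioned in the excerpt differ only in how the per-gap score is computed.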
“…In (Goldwater et al., 2006) they report issues with mixing in the sampler that were overcome using annealing. In (Mochihashi et al., 2009) this issue was overcome by using a blocked sampler together with a dynamic programming approach.…”
Section: Bayesian Inference (mentioning)
confidence: 99%
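Annealing in this setting means raising the sampler's conditional probabilities to a power 1/T and lowering the temperature T toward 1 over the run, which flattens the distribution early on so the Gibbs sampler can escape poor local modes. A generic sketch; the schedule and the probabilities below are illustrative assumptions, not the cited paper's exact settings:

```python
import random

def annealed_boundary_sample(p_split, p_merge, temperature):
    """Resample one binary boundary variable from probabilities raised
    to 1/T; T >> 1 flattens the choice, T = 1 recovers the true conditional."""
    a = p_split ** (1.0 / temperature)
    b = p_merge ** (1.0 / temperature)
    return random.random() < a / (a + b)  # True = place the boundary

# cool the sampler from a high temperature down to 1
for T in (10.0, 5.0, 2.0, 1.0):
    draws = [annealed_boundary_sample(0.8, 0.2, T) for _ in range(10_000)]
    print(f"T={T}: boundary rate {sum(draws) / len(draws):.2f}")
```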
“…In [19] they report issues with mixing in the sampler that were overcome using annealing. In [18] this issue was overcome by using a blocked sampler together with a dynamic programming approach.…”
Section: Gibbs Sampling (mentioning)
confidence: 99%
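The blocked alternative resamples an entire utterance's segmentation at once: a forward pass sums the probability of every prefix under the current word model, and a backward pass samples boundaries in proportion to those sums. Below is a sketch of this forward-filtering/backward-sampling idea under a unigram word model; `word_prob` is an assumed callable standing in for the model's current predictive probability, not a library API:

```python
import random

def resample_segmentation(chars, word_prob, max_len=8):
    """Blocked resampling of one utterance's segmentation by dynamic
    programming. alpha[i] accumulates the total probability of
    generating chars[:i] as a sequence of words."""
    n = len(chars)
    alpha = [0.0] * (n + 1)
    alpha[0] = 1.0
    for i in range(1, n + 1):                      # forward pass
        for j in range(max(0, i - max_len), i):
            alpha[i] += alpha[j] * word_prob("".join(chars[j:i]))
    words, i = [], n                               # backward sampling pass
    while i > 0:
        cands = [(j, alpha[j] * word_prob("".join(chars[j:i])))
                 for j in range(max(0, i - max_len), i)]
        r = random.uniform(0, sum(p for _, p in cands))
        for j, p in cands:
            r -= p
            if r <= 0:
                break
        words.append("".join(chars[j:i]))
        i = j
    return words[::-1]
```

In a full sampler this move alternates with updating the word model's counts; the dynamic program is what makes one blocked move a practical replacement for many strongly coupled single-boundary moves.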
“…The Dirichlet process model we use in our approach is a simple model that resembles the cache models used in language modeling [19]. Intuitively, the model has two basic components: a model for generating an outcome that has already been generated at least once before, and a second model that assigns a probability to an outcome that has not yet been produced.…”
Section: Unigram Dirichlet Process Model (mentioning)
confidence: 99%
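The two components described here correspond to the standard Dirichlet-process predictive distribution: a word generated n_w times before is reused with probability proportional to n_w (the "cache"), while a novel word is drawn from a base distribution P0 with weight given by a concentration parameter alpha0, so that P(w) = (n_w + alpha0 * P0(w)) / (n + alpha0). A minimal sketch; alpha0 and the toy base distribution are illustrative assumptions:

```python
from collections import Counter

class DPUnigram:
    """Dirichlet-process 'cache' model: reuse seen words in proportion to
    their counts, back off to a base distribution p0 for novel words."""
    def __init__(self, alpha0, p0):
        self.alpha0 = alpha0
        self.p0 = p0            # callable: base probability of a word
        self.counts = Counter()
        self.total = 0

    def prob(self, word):
        # P(w) = (n_w + alpha0 * P0(w)) / (n + alpha0)
        return (self.counts[word] + self.alpha0 * self.p0(word)) / \
               (self.total + self.alpha0)

    def observe(self, word):
        self.counts[word] += 1
        self.total += 1

# toy base distribution: uniform over 26 letters, geometric word length
def p0(word, p_stop=0.5):
    return (p_stop * ((1 - p_stop) / 26) ** (len(word) - 1)) / 26 if word else 0.0

model = DPUnigram(alpha0=20.0, p0=p0)
model.observe("dog"); model.observe("dog")
print(model.prob("dog"), model.prob("cat"))  # cached word vs. novel word
```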