Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Confere 2015
DOI: 10.3115/v1/p15-1171

Inducing Word and Part-of-Speech with Pitman-Yor Hidden Semi-Markov Models

Abstract: We propose a nonparametric Bayesian model for joint unsupervised word segmentation and part-of-speech tagging from raw strings. Extending a previous model for word segmentation, our model is called a Pitman-Yor Hidden Semi-Markov Model (PYHSMM) and considered as a method to build a class n-gram language model directly from strings, while integrating character and word level information. Experimental results on standard datasets on Japanese, Chinese and Thai revealed it outperforms previous results to yield the…

Cited by 25 publications (21 citation statements)
References 14 publications
“…The natural language processing tasks include (1) Part-of-Speech Tagging (POS-Tag): Part-of-Speech (POS) tagging is an important and highly competitive task in natural language processing. We use the standard benchmark dataset in prior work [5,40], which is derived from raw features in total. The evaluation metric is balanced F-score.…”
Section: Tasks (mentioning)
confidence: 99%
“…The model proposed in this paper has a close connection to unsupervised word segmentation and part-of-speech (POS) induction [6]. A key difference is that, while they use characters as the unit for the input sequence, we utilize word sequences.…”
Section: Unsupervised Word Segmentation and Part-of-speech Induction (mentioning)
confidence: 99%
“…Uchiumi et al [6] can be seen as an extension to Mochihashi et al [35], who focused on unsupervised word segmentation. They proposed a nonparametric Bayesian n-gram language model based on Pitman-Yor processes.…”
Section: Unsupervised Word Segmentation and Part-of-speech Induction (mentioning)
confidence: 99%