Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2017
DOI: 10.18653/v1/p17-1078
Neural Word Segmentation with Rich Pretraining

Abstract: Neural word segmentation research has benefited from large-scale raw texts by leveraging them for pretraining character and word embeddings. On the other hand, statistical segmentation research has exploited richer sources of external information, such as punctuation, automatic segmentation and POS. We investigate the effectiveness of a range of external training sources for neural word segmentation by building a modular segmentation model, pretraining the most important submodule using rich external sources. …

Cited by 104 publications (74 citation statements)
References 28 publications
“…For word-based models, segmentation is necessary. We take two segmentors with different performances, including the Jieba segmentor and the model of Yang et al (2017), which we name Jieba and YZ, respectively. To verify their accuracy, we manually segment the first 100 sentences from the test set.…”
Section: Methods
confidence: 99%
“…As a result, OntoNotes is leveraged for studying oracle situations where gold segmentation is given. We use the neural word segmentor of Yang et al (2017a) to automatically segment the development and test sets for word-based NER. In particular, for the OntoNotes and MSRA datasets, we train the segmentor using gold segmentation on their respective training sets.…”
Section: Experimental Settings
confidence: 99%
“…We benefit from this as we perform a search in the space of complete outputs and there is a combinatorial explosion in the output space for a linear increase in the input space (Doppa et al., 2014). The pretraining of the edge vectors with external knowledge in the form of morphological constraints is effective in reducing the task-specific training size (Yang et al., 2017; Andor et al., 2016).…”
Section: Results
confidence: 99%