Proceedings of the 23rd International Conference on World Wide Web 2014
DOI: 10.1145/2567948.2577377
Perceptron-based tagging of query boundaries for Chinese query segmentation

Abstract: Query boundaries carry useful information for query segmentation, especially when analyzing queries in a language with no space, e.g., Chinese. This paper presents our research on Chinese query segmentation via averaged perceptron to model query boundaries through an L-R tagging scheme on a large amount of unlabeled queries. Experimental results indicate that query boundaries are very informative and they significantly improve supervised Chinese query segmentation when labeled training data is very limited.
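The abstract's core idea, an averaged perceptron that tags potential segment boundaries between adjacent characters, can be illustrated with a minimal sketch. This is a simplified binary boundary tagger, not the paper's exact L-R tagging scheme, and the feature templates (character bi-gram plus left/right unigrams) are illustrative assumptions:

```python
# Minimal sketch of averaged-perceptron boundary tagging for a Chinese
# query. Each gap between adjacent characters is classified as a
# segment boundary (1) or not (0). The feature set and the binary tag
# set are simplifications of the paper's L-R scheme.
from collections import defaultdict


def gap_features(query, i):
    """Features for the gap between query[i] and query[i+1]."""
    return [
        "bi=" + query[i] + query[i + 1],   # character bi-gram spanning the gap
        "left=" + query[i],                # character left of the gap
        "right=" + query[i + 1],           # character right of the gap
    ]


class AveragedPerceptron:
    def __init__(self):
        self.w = defaultdict(float)    # current weights
        self.acc = defaultdict(float)  # accumulated weights for averaging
        self.n = 0                     # number of accumulation steps

    def predict(self, feats):
        return 1 if sum(self.w.get(f, 0.0) for f in feats) > 0 else 0

    def train(self, examples, epochs=5):
        for _ in range(epochs):
            for feats, gold in examples:
                if self.predict(feats) != gold:
                    delta = 1 if gold == 1 else -1
                    for f in feats:
                        self.w[f] += delta
                # accumulate current weights after every example
                self.n += 1
                for f in self.w:
                    self.acc[f] += self.w[f]
        # replace weights with their running average
        self.w = {f: self.acc[f] / self.n for f in self.acc}


# Toy example: "高腰连衣裙" segmented as "高腰 | 连衣裙"
# (a boundary only at the gap between 腰 and 连).
query = "高腰连衣裙"
gold = [0, 1, 0, 0]
examples = [(gap_features(query, i), g) for i, g in enumerate(gold)]
model = AveragedPerceptron()
model.train(examples)
preds = [model.predict(gap_features(query, i)) for i in range(4)]
```

Averaging the weights over all update steps, rather than keeping only the final weight vector, is the standard trick that makes the perceptron far less sensitive to the order of training examples.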

Cited by 2 publications (4 citation statements)
References 3 publications (3 reference statements)
“…Because "高腰" is a common word in the dress category, there are many contexts containing segment "高腰" in documents D. "腰连" is not a common word in Chinese, so few contexts can be found to support "腰连". Further, we can make a conclusion that "腰" is case (3). Note that there is no need to judge whether a context is found by the left or right bi-gram.…”
Section: Context Searching
confidence: 99%
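The context check quoted above can be sketched as a simple frequency comparison: count how often the left bi-gram ("高腰") and the right bi-gram ("腰连") occur in a document collection, and attach the ambiguous character to the better-supported side. The function name and toy document set below are illustrative assumptions, not the citing paper's implementation:

```python
# Hypothetical sketch of the bi-gram context-counting heuristic:
# attach an ambiguous character to whichever adjacent bi-gram has
# more supporting contexts in the document collection.
def attach_side(docs, left_bigram, right_bigram):
    left_support = sum(doc.count(left_bigram) for doc in docs)
    right_support = sum(doc.count(right_bigram) for doc in docs)
    return "left" if left_support >= right_support else "right"


# Toy documents from a dress category: "高腰" (high-waist) is common,
# while "腰连" is not a Chinese word and finds little support.
docs = ["高腰连衣裙很受欢迎", "这条高腰裙很好看", "高腰设计显腿长"]
side = attach_side(docs, "高腰", "腰连")
```

As the quote notes, it does not matter whether a supporting context was retrieved via the left or the right bi-gram; only the aggregate counts are compared.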
“…UNS(-Queries) and UNS(-Documents) are learnt without queries or external documents, respectively. As for supervised approaches, we choose three existing models, Word2Vec-LR [11] that is a simple deep learning model based on word embedding, traditional feature-based Perceptron model [3] and CRF model [24]. BiLSTM-CRF(Q) that only relies on hidden vector of characters in a query is also one of our baselines.…”
Section: Baselines
confidence: 99%