Proceedings of the 23rd International Conference on World Wide Web 2014
DOI: 10.1145/2567948.2577377
Perceptron-based tagging of query boundaries for Chinese query segmentation

Abstract: Query boundaries carry useful information for query segmentation, especially when analyzing queries in a language with no space, e.g., Chinese. This paper presents our research on Chinese query segmentation via averaged perceptron to model query boundaries through an L-R tagging scheme on a large amount of unlabeled queries. Experimental results indicate that query boundaries are very informative and they significantly improve supervised Chinese query segmentation when labeled training data is very limited.
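The abstract's core idea, an averaged perceptron that tags potential segment boundaries between adjacent characters, can be illustrated with a minimal sketch. This is a simplified binary boundary tagger, not the paper's exact L-R tagging scheme, and the feature templates (character bi-gram plus left/right unigrams) are illustrative assumptions:

```python
# Minimal sketch of averaged-perceptron boundary tagging for a Chinese
# query. Each gap between adjacent characters is classified as a
# segment boundary (1) or not (0). The feature set and the binary tag
# set are simplifications of the paper's L-R scheme.
from collections import defaultdict


def gap_features(query, i):
    """Features for the gap between query[i] and query[i+1]."""
    return [
        "bi=" + query[i] + query[i + 1],   # character bi-gram spanning the gap
        "left=" + query[i],                # character left of the gap
        "right=" + query[i + 1],           # character right of the gap
    ]


class AveragedPerceptron:
    def __init__(self):
        self.w = defaultdict(float)    # current weights
        self.acc = defaultdict(float)  # accumulated weights for averaging
        self.n = 0                     # number of accumulation steps

    def predict(self, feats):
        return 1 if sum(self.w.get(f, 0.0) for f in feats) > 0 else 0

    def train(self, examples, epochs=5):
        for _ in range(epochs):
            for feats, gold in examples:
                if self.predict(feats) != gold:
                    delta = 1 if gold == 1 else -1
                    for f in feats:
                        self.w[f] += delta
                # accumulate current weights after every example
                self.n += 1
                for f in self.w:
                    self.acc[f] += self.w[f]
        # replace weights with their running average
        self.w = {f: self.acc[f] / self.n for f in self.acc}


# Toy example: "高腰连衣裙" segmented as "高腰 | 连衣裙"
# (a boundary only at the gap between 腰 and 连).
query = "高腰连衣裙"
gold = [0, 1, 0, 0]
examples = [(gap_features(query, i), g) for i, g in enumerate(gold)]
model = AveragedPerceptron()
model.train(examples)
preds = [model.predict(gap_features(query, i)) for i in range(4)]
```

Averaging the weights over all update steps, rather than keeping only the final weight vector, is the standard trick that makes the perceptron far less sensitive to the order of training examples.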

Cited by 2 publications (4 citation statements)
References 3 publications (3 reference statements)
“…Because "高腰" is a common word in the dress category, there are many contexts containing segment "高腰" in documents D. "腰连" is not a common word in Chinese, so few contexts can be found to support "腰连". Further, we can make a conclusion that "腰" is case (3). Note that there is no need to judge whether a context is found by the left or right bi-gram.…”
Section: Context Searching
confidence: 99%
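The context check quoted above can be sketched as a simple frequency comparison: count how often the left bi-gram ("高腰") and the right bi-gram ("腰连") occur in a document collection, and attach the ambiguous character to the better-supported side. The function name and toy document set below are illustrative assumptions, not the citing paper's implementation:

```python
# Hypothetical sketch of the bi-gram context-counting heuristic:
# attach an ambiguous character to whichever adjacent bi-gram has
# more supporting contexts in the document collection.
def attach_side(docs, left_bigram, right_bigram):
    left_support = sum(doc.count(left_bigram) for doc in docs)
    right_support = sum(doc.count(right_bigram) for doc in docs)
    return "left" if left_support >= right_support else "right"


# Toy documents from a dress category: "高腰" (high-waist) is common,
# while "腰连" is not a Chinese word and finds little support.
docs = ["高腰连衣裙很受欢迎", "这条高腰裙很好看", "高腰设计显腿长"]
side = attach_side(docs, "高腰", "腰连")
```

As the quote notes, it does not matter whether a supporting context was retrieved via the left or the right bi-gram; only the aggregate counts are compared.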
“…UNS(-Queries) and UNS(-Documents) are learnt without queries or external documents, respectively. As for supervised approaches, we choose three existing models, Word2Vec-LR [11] that is a simple deep learning model based on word embedding, traditional feature-based Perceptron model [3] and CRF model [24]. BiLSTM-CRF(Q) that only relies on hidden vector of characters in a query is also one of our baselines.…”
Section: Baselines
confidence: 99%