2015
DOI: 10.1007/978-3-319-25207-0_50
Overview of the NLPCC 2015 Shared Task: Chinese Word Segmentation and POS Tagging for Micro-blog Texts

Abstract: In this paper, we give an overview of the shared task at the 4th CCF Conference on Natural Language Processing & Chinese Computing (NLPCC 2015): Chinese word segmentation and part-of-speech (POS) tagging for micro-blog texts. Unlike the widely used newswire datasets, the dataset of this shared task consists of relatively informal micro-texts. The shared task has two sub-tasks: (1) individual Chinese word segmentation and (2) joint Chinese word segmentation and POS tagging. Each subtask has three…
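Shared tasks of this kind are conventionally scored with word-level precision, recall, and F1 against gold-standard segmentations. The snippet below is a minimal sketch of that standard metric, not the official NLPCC 2015 scorer; the function names and the span-matching formulation are illustrative assumptions.

```python
# Minimal sketch of word-level P/R/F1 for Chinese word segmentation output.
# Assumption: the official NLPCC 2015 evaluation script may differ in details;
# this is only the common span-matching formulation.

def to_spans(words):
    """Convert a word sequence into (start, end) character spans."""
    spans, pos = set(), 0
    for w in words:
        spans.add((pos, pos + len(w)))
        pos += len(w)
    return spans

def prf(gold_words, pred_words):
    """Word-level precision, recall, and F1 between gold and predicted segmentations."""
    gold, pred = to_spans(gold_words), to_spans(pred_words)
    correct = len(gold & pred)
    p = correct / len(pred) if pred else 0.0
    r = correct / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

# Example: gold "我们/喜欢/微博" vs. predicted "我们/喜/欢/微博"
print(prf(["我们", "喜欢", "微博"], ["我们", "喜", "欢", "微博"]))
```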

Cited by 14 publications (4 citation statements)
References 5 publications (3 reference statements)
“…We use the NLPCC 2015 dataset (Qiu et al., 2015) to evaluate our model on micro-blog texts. The NLPCC 2015 data are provided by the shared task at the 4th CCF Conference on Natural Language Processing & Chinese Computing (NLPCC 2015): Chinese Word Segmentation and POS Tagging for Micro-blog Texts.…”
Section: Dataset
confidence: 99%
“…It is important to study social sentiment analysis methods for Weibo, and the Weibo text corpus is an important data set for analyzing people's views on the latest events. Unlike long, standard texts, the Weibo corpus consists of relatively informal texts that favor colloquial expressions and short length [9]. Yao et al. [3] used the corpus to organize the Chinese Weibo sentiment analysis evaluation at the 2nd CCF Conference on Natural Language Processing & Chinese Computing (NLP&CC 2013), which strongly promoted research on Weibo sentiment analysis.…”
Section: Related Work
confidence: 99%
“…After that, the anti-word set is used to create the AP features for the CRF model by calculating the AP value of the current observed token according to Eq. (8). The AP value is also discretized before being fed to the CRF model, in accordance with the following scheme.…”
Section: Ce(token)·ce(chara)
confidence: 99%
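The excerpt above describes computing a real-valued AP score for the current token and then discretizing it into a feature for a CRF model. Its Eq. (8) and bin boundaries are not reproduced in the excerpt, so the sketch below only illustrates the general pattern of mapping a continuous score to a binned feature string; the bin edges and feature names are hypothetical.

```python
# Illustrative sketch of turning a real-valued score into a discretized
# feature string for a CRF toolkit (CRF++/CRFsuite-style observation features).
# Assumptions: the AP formula from the cited Eq. (8) is not shown in the
# excerpt, so `ap_value` is taken as given; the bin edges below are hypothetical.

def discretize_ap(ap_value, edges=(0.1, 0.3, 0.5, 0.7, 0.9)):
    """Map a continuous AP value to a coarse bin label such as 'AP=3'."""
    bin_id = sum(ap_value >= e for e in edges)
    return f"AP={bin_id}"

def token_features(token, ap_value):
    """Combine a surface feature with the discretized AP feature for one token."""
    return [f"W={token}", discretize_ap(ap_value)]

print(token_features("微博", 0.62))   # -> ['W=微博', 'AP=3']
```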
“…The training and test corpora are released by NLPCC 2015 for the shared task of microblog-oriented CWS [8], as shown in Table 2. In addition, we collect 300,000 unlabeled tweets (including 20 billion words) as the background corpus to extract features for the semi-supervised initial segmenter.…”
Section: Datasets
confidence: 99%
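The last excerpt mentions extracting features from a large unlabeled tweet corpus to support a semi-supervised initial segmenter, but it does not say which statistics are used. As one commonly used possibility (an assumption, not the cited authors' actual feature set), the sketch below computes character-bigram pointwise mutual information from unlabeled text; high PMI between adjacent characters is often taken as evidence that they belong to the same word.

```python
# Hedged sketch of deriving one kind of feature from an unlabeled background
# corpus. The excerpt does not specify the features used, so character-bigram
# pointwise mutual information (PMI) is an assumed, commonly used choice.
import math
from collections import Counter

def build_pmi(texts):
    """Return a pmi(a, b) function estimated from unlabeled character strings."""
    uni, bi = Counter(), Counter()
    for line in texts:
        uni.update(line)                    # character unigram counts
        bi.update(zip(line, line[1:]))      # adjacent character bigram counts
    n_uni, n_bi = sum(uni.values()), sum(bi.values())

    def pmi(a, b):
        if bi[(a, b)] == 0:
            return float("-inf")
        p_ab = bi[(a, b)] / n_bi
        p_a, p_b = uni[a] / n_uni, uni[b] / n_uni
        return math.log(p_ab / (p_a * p_b))

    return pmi

pmi = build_pmi(["我们喜欢微博", "微博很流行"])
print(pmi("微", "博"))  # high PMI suggests '微博' tends to cohere as a word
```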