An overview of the SemEval-2 Japanese word sense disambiguation (WSD) task is presented. The task has two new characteristics: (1) it uses the first balanced Japanese sense-tagged corpus, and (2) it takes into account not only instances whose sense is in the given sense set but also instances whose sense cannot be found in that set. It is a lexical sample task, and word senses are defined according to a Japanese dictionary, the Iwanami Kokugo Jiten. This dictionary and a training corpus were distributed to participants. There were 50 target words: 22 nouns, 23 verbs, and 5 adjectives. Fifty instances of each target word were provided, for a total of 2,500 evaluation instances. Nine systems from four organizations participated in the task.
We compared two methods for annotating a corpus with non-expert annotators for the named entity (NE) recognition task: (1) revising the output of an existing NE recognizer and (2) annotating NEs entirely by hand. We investigated annotation time, degree of agreement, and performance against the gold standard. Because two annotators worked on each file under each method, we evaluated performance in two ways: averaged over the two annotators, and deeming an annotation correct when either annotator is correct. The experiments revealed that semi-automatic annotation was faster and showed better agreement and higher performance on average. However, they also indicated that fully manual annotation should sometimes be used for texts whose genres are far from the recognizer's training data. In addition, experiments using the semi-automatically and fully manually annotated corpora as training data for machine learning indicated that, for some texts, F-measures could be better with manual annotation than with semi-automatic annotation.
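The two evaluation schemes described above can be sketched as follows. This is a minimal illustration, not the authors' scoring script: the entity representation `(start, end, type)`, the function names, and in particular the treatment of false positives in the "either correct" scheme (a spurious entity counts as an error only when both annotators produced it) are assumptions made for the example.

```python
from typing import Set, Tuple

# Hypothetical representation: an entity is (start offset, end offset, NE type).
Entity = Tuple[int, int, str]


def f1(tp: int, fp: int, fn: int) -> float:
    """F-measure (harmonic mean of precision and recall) from raw counts."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def score(pred: Set[Entity], gold: Set[Entity]) -> float:
    """F-measure of one annotator's entities against the gold standard."""
    tp = len(pred & gold)
    return f1(tp, len(pred - gold), len(gold - pred))


def averaged_f1(a: Set[Entity], b: Set[Entity], gold: Set[Entity]) -> float:
    """Scheme 1: average the two annotators' individual F-measures."""
    return (score(a, gold) + score(b, gold)) / 2


def either_correct_f1(a: Set[Entity], b: Set[Entity], gold: Set[Entity]) -> float:
    """Scheme 2 (assumed reading): an item is correct if either annotator is.

    A gold entity is found if at least one annotator marked it; a spurious
    entity is an error only if both annotators marked it.
    """
    tp = len(gold & (a | b))
    fp = len((a & b) - gold)
    fn = len(gold - (a | b))
    return f1(tp, fp, fn)
```

Under these assumptions the "either correct" score is never lower than the better of the two individual scores, so the gap between the two schemes gives a rough sense of how complementary the annotators' errors are.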