We aim to develop a text mining framework capable of identifying and extracting causal dependencies among changing variables (or events) from scientific publications in the cross-disciplinary field of oceanographic climate science. The extracted information can be used to infer new knowledge or to discover new hypotheses through reasoning, which forms the basis of a knowledge discovery support system. Automatic extraction of causal knowledge from text is a challenging task. Approaches to causal relation identification proposed in the literature generally target a specific domain, such as online news or biomedicine, because the domain strongly influences how causality is expressed in its texts. Existing causality extraction models may therefore not be directly portable to new domains. In this paper, we describe the nature of causation observed in the climate science domain, review state-of-the-art approaches to causal knowledge extraction from text, and select the methods and resources most likely to be applicable to this domain.
This paper presents our relation extraction system for subtask C of SemEval-2017 Task 10: ScienceIE. Assuming that the keyphrases are already annotated in the input data, our work explores a wide range of linguistic features, applies various feature selection techniques, optimizes the hyperparameters and class weights, and experiments with different problem formulations (a single classification model vs. individual classifiers for each keyphrase type; a single-step classifier vs. a pipeline classifier for hyponym relations). The performance of five popular classification algorithms is evaluated for each problem formulation, together with feature selection. The best setting achieved an F1 score of 71.0% for synonym relations and 30.0% for hyponym relations on the test data.
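The per-relation F1 scores reported above combine precision and recall for each relation class separately. The following is a minimal sketch of that calculation, not the official ScienceIE scorer: the function name `per_class_f1`, the label strings, and the toy gold/predicted lists are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def per_class_f1(gold, predicted, labels):
    """Return {label: (precision, recall, f1)} for each label in `labels`.

    `gold` and `predicted` are equal-length lists of class labels, one per
    candidate keyphrase pair (an assumed, simplified evaluation setup).
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1          # correct prediction for class g
        else:
            fp[p] += 1          # p was predicted but is wrong
            fn[g] += 1          # g was the true class but was missed

    scores = {}
    for label in labels:
        predicted_count = tp[label] + fp[label]
        gold_count = tp[label] + fn[label]
        precision = tp[label] / predicted_count if predicted_count else 0.0
        recall = tp[label] / gold_count if gold_count else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1)
    return scores


# Hypothetical toy data: one label per candidate keyphrase pair.
gold = ["synonym", "hyponym", "none", "synonym", "none", "hyponym"]
pred = ["synonym", "none", "none", "synonym", "synonym", "hyponym"]

scores = per_class_f1(gold, pred, labels=["synonym", "hyponym"])
for label, (p, r, f1) in scores.items():
    print(f"{label}: precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
```

Scoring each relation type on its own, rather than pooling them, is what makes a gap such as the one between the synonym and hyponym results visible.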
Cybersecurity risks such as malware threaten the personal safety of users, but identifying malware-related text is a major challenge. This paper proposes a supervised learning approach to identifying malware sentences in a given document (subTask1 of SemEval 2018, Task 8) and to classifying malware tokens in those sentences (subTask2). The approach ranked second of twelve participants on both subtasks, with F-scores of 57% for subTask1 and 28% for subTask2.