As an essential component of human cognition, cause–effect relations appear frequently in text, and curating cause–effect relations from text helps in building causal networks for predictive tasks. Existing causality extraction techniques include knowledge-based, statistical machine learning (ML)-based, and deep learning-based approaches. Each method has its advantages and weaknesses. For example, knowledge-based methods are understandable but require extensive manual domain knowledge and have poor cross-domain applicability. Statistical machine learning methods are more automated because of natural language processing (NLP) toolkits. However, feature engineering is labor-intensive, and toolkits may lead to error propagation. In recent years, deep learning techniques have attracted substantial attention from NLP researchers because of their powerful representation learning ability and the rapid growth of computational resources. Their limitations include high computational costs and a lack of adequate annotated training data. In this paper, we conduct a comprehensive survey of causality extraction. We first introduce the primary forms of causality addressed in causality extraction: explicit intra-sentential causality, implicit causality, and inter-sentential causality. Next, we list benchmark datasets and evaluation methods for causal relation extraction. Then, we present a structured overview of the three techniques with their representative systems. Lastly, we highlight open challenges and potential research directions.
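To make the knowledge-based approach concrete, the sketch below matches a handful of hypothetical lexico-syntactic patterns (e.g. "X causes Y") to extract explicit intra-sentential cause–effect pairs; real knowledge-based systems rely on far larger, manually curated pattern inventories, and this pattern set is purely illustrative.

```python
import re

# A single illustrative pattern family for explicit intra-sentential causality.
# Real knowledge-based systems curate large sets of such cue phrases;
# the three verbs here are only examples.
CAUSAL_PATTERN = re.compile(
    r"(?P<cause>[\w\s]+?)\s+(?:causes|leads to|results in)\s+(?P<effect>[\w\s]+)"
)

def extract_causal_pairs(sentence: str):
    """Return (cause, effect) pairs matched by the pattern, or [] if none."""
    pairs = []
    for m in CAUSAL_PATTERN.finditer(sentence):
        pairs.append((m.group("cause").strip(), m.group("effect").strip()))
    return pairs

pairs = extract_causal_pairs("Smoking causes lung cancer.")
# -> [("Smoking", "lung cancer")]
```

This also illustrates the stated weakness of the approach: the pattern fires only on the exact cue phrases it encodes, so implicit or cross-domain causal expressions are missed entirely.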
Data-driven knowledge acquisition is one of the key research fields in data mining. Dealing with large amounts of data has received a lot of attention in the field recently, and a number of methodologies have been proposed to extract insights from data in an automated or semi-automated manner. However, these methodologies generally target a specific aspect of the data mining process, such as data acquisition, data preprocessing, or data classification, whereas a comprehensive knowledge acquisition method is crucial to support the end-to-end knowledge engineering process. In this paper, we introduce a knowledge acquisition system that covers all major phases of the cross-industry standard process for data mining. Acknowledging the importance of an end-to-end knowledge engineering process, we designed and developed an easy-to-use data-driven knowledge acquisition tool (DDKAT). The major features of the DDKAT are: (1) a novel unified feature scoring approach for data selection; (2) a user-friendly data processing interface to improve the quality of the raw data; (3) an appropriate decision tree algorithm selection approach to build a classification model; and (4) the generation of production rules from various decision tree classification models in an automated manner. Furthermore, two diabetes studies were performed to assess the value of the DDKAT in terms of user experience. A total of 19 experts were involved in the first study, and 102 students in the artificial intelligence domain were involved in the second study. The results showed that the overall user experience of the DDKAT was positive in terms of its attractiveness, as well as its pragmatic and hedonic quality factors.
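As a minimal sketch of feature (4) above, the code below converts a decision tree into IF-THEN production rules by walking each root-to-leaf path. The nested-dict tree representation and the feature names are hypothetical, chosen only to illustrate the idea; they are not DDKAT's actual data structures.

```python
# Illustrative sketch: each root-to-leaf path in a decision tree
# becomes one production rule. The dict-based tree format is assumed
# for this example, not taken from DDKAT.
def tree_to_rules(node, conditions=()):
    """Depth-first traversal producing one IF-THEN rule per leaf."""
    if "class" in node:  # leaf node
        antecedent = " AND ".join(conditions) or "TRUE"
        return [f"IF {antecedent} THEN class = {node['class']}"]
    rules = []
    feat, thr = node["feature"], node["threshold"]
    rules += tree_to_rules(node["left"], conditions + (f"{feat} <= {thr}",))
    rules += tree_to_rules(node["right"], conditions + (f"{feat} > {thr}",))
    return rules

# Toy tree with hypothetical diabetes-style features (glucose, BMI).
toy_tree = {
    "feature": "glucose", "threshold": 120,
    "left": {"class": "negative"},
    "right": {
        "feature": "bmi", "threshold": 30,
        "left": {"class": "negative"},
        "right": {"class": "positive"},
    },
}

rules = tree_to_rules(toy_tree)
# Three leaves -> three production rules, e.g.
# "IF glucose > 120 AND bmi > 30 THEN class = positive"
```

The same traversal works for any binary threshold tree, which is why rule generation can be applied uniformly across "various decision tree classification models" as the abstract describes.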
INDEX TERMS Knowledge engineering, data mining, feature ranking, algorithm selection, decision tree, production rule, user experience.
I. INTRODUCTION
Knowledge systems have come a long way, from manual knowledge curation to automatic data-driven knowledge generation. The major drivers of this transition were the size and complexity of data. Since large datasets cannot be efficiently analyzed manually, the automation process is essential [2]. Initially in this process of knowledge automation, knowledge engineers followed ad-hoc procedures [3]. Later on, more systematic methodologies were devised, which can be referred to as data-driven knowledge acquisition systems. Knowledge extraction from structured sources such as databases is an active area of research in the information
Intent classification, which identifies the speaker's intention, and slot filling, which labels each token with a semantic type, are critical tasks in natural language understanding. Traditionally, the two tasks have been addressed independently. More recently, joint models that address the two tasks together have achieved state-of-the-art performance on each task and have shown that a strong relationship exists between the two. In this survey we bring the coverage of methods up to 2021, including the many applications of deep learning in the field. Beyond a technological survey, we examine the issues addressed in the joint task and the approaches designed to address them. We cover datasets, evaluation metrics, and experiment design, and supply a summary of reported performance on the standard datasets.
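To make the two task definitions concrete, the sketch below shows an ATIS-style joint annotation: a single intent label for the utterance and one BIO slot tag per token, plus a small helper that recovers slot spans from the tags. The utterance, intent, and slot labels here are invented for illustration and are not drawn from any particular dataset.

```python
# Illustrative joint annotation: one intent per utterance,
# one BIO slot tag per token. All labels are hypothetical examples.
utterance = ["book", "a", "flight", "to", "boston", "tomorrow"]
intent = "BookFlight"
slots = ["O", "O", "O", "O", "B-destination", "B-date"]

def bio_spans(tokens, tags):
    """Collect (slot_type, text) spans from a BIO tag sequence."""
    spans, current = [], None
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):          # begin a new slot span
            if current:
                spans.append(current)
            current = (tag[2:], [tok])
        elif tag.startswith("I-") and current:
            current[1].append(tok)        # continue the current span
        else:                             # "O" tag closes any open span
            if current:
                spans.append(current)
            current = None
    if current:
        spans.append(current)
    return [(slot_type, " ".join(words)) for slot_type, words in spans]

# bio_spans(utterance, slots)
# -> [("destination", "boston"), ("date", "tomorrow")]
```

A joint model predicts the intent label and the slot tag sequence together, which is how the two tasks can share information about the utterance.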