Abstract. This chapter examines the nature of semantic relations and their main applications in information science. The nature and types of semantic relations are discussed from the perspectives of linguistics and psychology. An overview of the semantic relations used in knowledge structures such as thesauri and ontologies is provided, as well as the main techniques used in the automatic extraction of semantic relations from text. The chapter then reviews the use of semantic relations in information extraction, information retrieval, question-answering and automatic text summarization applications.
This paper reports the first part of a project that aims to develop a knowledge extraction and knowledge discovery system that extracts causal knowledge from textual databases. In this initial study, we develop a method to identify and extract cause-effect information that is explicitly expressed in medical abstracts in the Medline database. A set of graphical patterns was constructed that indicates the presence of a causal relation in a sentence, and which part of the sentence represents the cause and which part represents the effect. The patterns are matched against the syntactic parse trees of sentences, and the parts of the parse tree that match the slots in the patterns are extracted as the cause or the effect.
This article introduces a new general-purpose sentiment lexicon called WKWSCI Sentiment Lexicon and compares it with five existing lexicons: Hu & Liu Opinion Lexicon, Multi-perspective Question Answering (MPQA) Subjectivity Lexicon, General Inquirer, National Research Council Canada (NRC) Word-Sentiment Association Lexicon and Semantic Orientation Calculator (SO-CAL) lexicon. The effectiveness of the sentiment lexicons for sentiment categorisation at the document level and sentence level was evaluated using an Amazon product review data set and a news headlines data set. WKWSCI, MPQA, Hu & Liu and SO-CAL lexicons are equally good for product review sentiment categorisation, obtaining accuracy rates of 75%–77% when appropriate weights are used for different categories of sentiment words. However, when a training corpus is not available, Hu & Liu obtained the best accuracy with a simple-minded approach of counting positive and negative words for both document-level and sentence-level sentiment categorisation. The WKWSCI lexicon obtained the best accuracy of 69% on the news headlines sentiment categorisation task, and the sentiment strength values obtained a Pearson correlation of 0.57 with human-assigned sentiment values. It is recommended that the Hu & Liu lexicon be used for product review texts and the WKWSCI lexicon for non-review texts.
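The word-counting baseline mentioned above (counting positive and negative words when no training corpus is available) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the two word sets are a toy lexicon invented for the example and are not taken from any of the lexicons compared, and the function name `count_based_sentiment` is hypothetical.

```python
import re

# Toy lexicon for illustration only (assumption); not drawn from
# WKWSCI, Hu & Liu, MPQA, or any other lexicon named in the abstract.
POSITIVE = {"good", "great", "excellent", "love", "reliable"}
NEGATIVE = {"bad", "poor", "terrible", "hate", "broken"}


def count_based_sentiment(text):
    """Label text by comparing counts of positive and negative lexicon words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    pos = sum(1 for t in tokens if t in POSITIVE)
    neg = sum(1 for t in tokens if t in NEGATIVE)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"  # tie, or no lexicon words found


print(count_based_sentiment("Great phone, excellent battery, but poor camera."))
```

The same function applies to a whole document or a single sentence; the weighted variant described in the abstract would replace the unit counts with weights per category of sentiment word, learned from a training corpus.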
This study investigated how effectively cause-effect information can be extracted from newspaper text using a simple computational method (i.e. without knowledge-based inferencing and without full parsing of sentences). An automatic method was developed for identifying and extracting cause-effect information in Wall Street Journal text using linguistic clues and pattern-matching. The set of linguistic patterns used for identifying causal relations was based on a thorough review of the literature and on an analysis of sample sentences from the Wall Street Journal. The cause-effect information extracted using the method was compared with that identified by two human judges. The program successfully extracted about 68% of the causal relations identified by both judges (the intersection of the two sets of causal relations identified by the judges). Of the instances that the computer program identified as causal relations, about 25% were identified by both judges, and 64% were identified by at least one of the judges. Problems encountered are discussed.
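Pattern-matching on linguistic clues, without full parsing, might look something like the sketch below. The three patterns and the cue phrases in them are assumptions chosen for illustration; the abstract does not list the actual patterns used in the study, and `extract_causal` is a hypothetical name.

```python
import re

# Illustrative causal patterns (assumption); the study's own pattern set
# is not given in the abstract. Each pattern names a cause and an effect slot.
CAUSAL_PATTERNS = [
    re.compile(r"^(?P<effect>.+?)\s+because of\s+(?P<cause>.+?)\.?$", re.I),
    re.compile(r"^(?P<effect>.+?)\s+because\s+(?P<cause>.+?)\.?$", re.I),
    re.compile(
        r"^(?P<cause>.+?)\s+(?:caused|led to|resulted in)\s+(?P<effect>.+?)\.?$",
        re.I,
    ),
]


def extract_causal(sentence):
    """Return (cause, effect) from the first matching pattern, else None."""
    s = sentence.strip()
    for pattern in CAUSAL_PATTERNS:
        match = pattern.match(s)
        if match:
            return match.group("cause"), match.group("effect")
    return None


print(extract_causal("The stock fell because of weak earnings."))
print(extract_causal("Higher oil prices led to lower airline profits."))
```

Surface patterns of this kind over-generate, which is consistent with the precision figures reported in the abstract: a cue phrase can appear in a sentence without signalling a causal relation.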
Purpose: The purpose of this study is to analyze the macro‐level discourse structure of literature reviews found in information science journal papers, and to identify different styles of literature review writing. Although there have been several studies of human abstracting, there are hardly any studies of how authors construct literature reviews. Design/methodology/approach: This study is carried out in the context of a project to develop a summarization system to generate literature reviews automatically. A coding scheme was developed to annotate the high‐level organization of literature reviews, focusing on the types of information. Two sets of annotations were used to check inter‐coder reliability. Findings: It was found that literature reviews are written in two distinctive styles, with different discourse structures. Descriptive literature reviews summarize individual papers/studies and provide more information on each study, such as research methods, results and interpretation. Integrative literature reviews provide fewer details of individual papers/studies, but focus on ideas and results extracted from these papers. They provide critical summaries of topics, and have a more complex structure of topics and sub‐topics. The reviewer's voice is also more dominant. Originality/value: The coding scheme is useful for annotating the macro‐level discourse structure of literature reviews, and can be used for studying literature reviews in other fields. The basic characteristics of two styles of literature review writing are identified. The results have provided a foundation for further studies of literature reviews – to identify discourse relations and rhetorical functions employed in literature reviews, and their linguistic expressions.
In this article, a method for automatic sentiment analysis of movie reviews is proposed, implemented and evaluated. In contrast to most studies that focus on determining only sentiment orientation (positive versus negative), the proposed method performs fine-grained analysis to determine both the sentiment orientation and sentiment strength of the reviewer towards various aspects of a movie. Sentences in review documents contain independent clauses that express different sentiments toward different aspects of a movie. The method adopts a linguistic approach of computing the sentiment of a clause from the prior sentiment scores assigned to individual words, taking into consideration the grammatical dependency structure of the clause. The prior sentiment scores of about 32,000 individual words are derived from SentiWordNet with the help of a subjectivity lexicon. Negation is delicately handled. The output sentiment scores can be used to identify the most positive and negative clauses or sentences with respect to particular movie aspects.
This paper reports a study in automatic sentiment classification, i.e., automatically classifying documents as expressing positive or negative sentiments. The study investigates the effectiveness of using a machine-learning algorithm, support vector machine (SVM), on various text features to classify on-line product reviews into recommended (positive sentiment) and not recommended (negative sentiment). In the first part of this study, several approaches were investigated: unigrams (individual words), selected words (such as verbs, adjectives, and adverbs), and words labeled with part-of-speech tags. Using SVM, the unigram approach obtained an accuracy rate of around 76%. Error analysis suggests various approaches for improving classification accuracy: handling of negation phrases, inferencing from superficial words, and handling the problem of comments on parts of the product. The second part of the study investigated the use of negation phrase n-grams to improve classification accuracy. This approach increased the accuracy rate to 79.33%. Compared with traditional subject classification, which mainly uses unigrams, syntactic and semantic processing of text appears more important for sentiment classification. We expect that deeper linguistic processing will help increase accuracy for sentiment classification.
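One plausible reading of "negation phrase n-grams" is to fuse a negation word with the word that follows it, so that a classifier sees `not_recommend` as a single feature instead of two unrelated unigrams. The sketch below shows only that feature-extraction step, under that assumption; the negator list, the underscore-joining scheme, and the name `negation_features` are hypothetical and are not taken from the paper.

```python
import re

# Small illustrative negator list (assumption); contractions ending in
# "n't" are also treated as negators below.
NEGATORS = {"not", "no", "never", "cannot"}


def negation_features(text):
    """Return unigram features, fusing each negator with the next token."""
    tokens = re.findall(r"[a-z']+", text.lower())
    features = []
    i = 0
    while i < len(tokens):
        token = tokens[i]
        is_negator = token in NEGATORS or token.endswith("n't")
        if is_negator and i + 1 < len(tokens):
            # Emit one negation-phrase feature and skip the fused token.
            features.append(token + "_" + tokens[i + 1])
            i += 2
        else:
            features.append(token)
            i += 1
    return features


print(negation_features("I would not recommend this camera"))
```

The resulting feature lists could then be vectorized and passed to an SVM in place of plain unigrams.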