This paper proposes a Hidden Markov Model (HMM) and an HMM-based chunk tagger, from which a named entity (NE) recognition (NER) system is built to recognize and classify names, times and numerical quantities. Through the HMM, our system is able to apply and integrate four types of internal and external evidences: 1) simple deterministic internal feature of the words, such as capitalization and digitalization; 2) internal semantic feature of important triggers; 3) internal gazetteer feature; 4) external macro context feature. In this way, the NER problem can be resolved effectively. Evaluation of our system on MUC-6 and MUC-7 English NE tasks achieves F-measures of 96.6% and 94.1% respectively. It shows that the performance is significantly better than reported by any other machine-learning system. Moreover, the performance is even consistently better than those based on handcrafted rules.
The world's coastal areas are increasingly at risk of coastal flooding due to sea-level rise (SLR). We present a novel global dataset of extreme sea levels, the Coastal Dataset for the Evaluation of Climate Impact (CoDEC), which can be used to accurately map the impact of climate change on coastal regions around the world. The third generation Global Tide and Surge Model (GTSM), with a coastal resolution of 2.5 km (1.25 km in Europe), was used to simulate extreme sea levels for the ERA5 climate reanalysis from 1979 to 2017, as well as for future climate scenarios from 2040 to 2100. The validation against observed sea levels demonstrated a good performance, and the annual maxima had a mean bias (MB) of-0.04 m, which is 50% lower than the MB of the previous GTSR dataset. By the end of the century (2071-2100), it is projected that the 1 in 10-year water levels will have increased 0.34 m on average for RCP4.5, while some locations may experience increases of up to 0.5 m. The change in return levels is largely driven by SLR, although at some locations changes in storms surges and interaction with tides amplify the impact of SLR with changes up to 0.2 m. By presenting an application of the CoDEC dataset to the city of Copenhagen, we demonstrate how climate impact indicators derived from simulation can contribute to an understanding of climate impact on a local scale. Moreover, the CoDEC output locations are designed to be used as boundary conditions for regional models, and we envisage that they will be used for dynamic downscaling.
This paper proposes a novel composite kernel for relation extraction. The composite kernel consists of two individual kernels: an entity kernel that allows for entity-related features and a convolution parse tree kernel that models syntactic information of relation examples. The motivation of our method is to fully utilize the nice properties of kernel methods to explore diverse knowledge for relation extraction. Our study illustrates that the composite kernel can effectively capture both flat and structured features without the need for extensive feature engineering, and can also easily scale to include more features. Evaluation on the ACE corpus shows that our method outperforms the previous best-reported methods and significantly outperforms previous two dependency tree kernels for relation extraction.
Extracting semantic relationships between entities is challenging. This paper investigates the incorporation of diverse lexical, syntactic and semantic knowledge in feature-based relation extraction using SVM. Our study illustrates that the base phrase chunking information is very effective for relation extraction and contributes to most of the performance improvement from syntactic aspect while additional information from full parsing gives limited further enhancement. This suggests that most of useful information in full parse trees for relation extraction is shallow and can be captured by chunking. We also demonstrate how semantic information such as WordNet and Name List, can be used in feature-based relation extraction to further improve the performance. Evaluation on the ACE corpus shows that effective incorporation of diverse features enables our system outperform previously best-reported systems on the 24 ACE relation subtypes and significantly outperforms tree kernel-based systems by over 20 in F-measure on the 5 ACE relation types.
In this paper, we propose a multi-criteriabased active learning approach and effectively apply it to named entity recognition. Active learning targets to minimize the human annotation efforts by selecting examples for labeling. To maximize the contribution of the selected e xamples, we consider the multiple criteria: informativeness, representativeness and diversity and propose measures to quantify them. More comprehensively, we incorporate all the criteria using two selection strategies, both of which result in less labeling cost than single-criterion-based method. The results of the named entity recognition in both MUC-6 and GENIA show that the labeling cost can be reduced by at least 80% without degrading the performance.
Sarcasm is a sophisticated speech act which commonly manifests on social communities such as Twitter and Reddit. The prevalence of sarcasm on the social web is highly disruptive to opinion mining systems due to not only its tendency of polarity flipping but also usage of figurative language. Sarcasm commonly manifests with a contrastive theme either between positive-negative sentiments or between literal-figurative scenarios. In this paper, we revisit the notion of modeling contrast in order to reason with sarcasm. More specifically, we propose an attention-based neural model that looks inbetween instead of across, enabling it to explicitly model contrast and incongruity. We conduct extensive experiments on six benchmark datasets from Twitter, Reddit and the Internet Argument Corpus. Our proposed model not only achieves stateof-the-art performance on all datasets but also enjoys improved interpretability.
A demo system is available at http://textmining.i2r.a-star.edu.sg/NLS/demo.htm. Technology license is available upon the bilateral agreement.
Short Messaging Service (SMS) texts behave quite differently from normal written texts and have some very special phenomena. To translate SMS texts, traditional approaches model such irregularities directly in Machine Translation (MT). However, such approaches suffer from
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.