Jian Su scite author profile

Named entity recognition using an HMM-based chunk tagger

This paper proposes a Hidden Markov Model (HMM) and an HMM-based chunk tagger, from which a named entity (NE) recognition (NER) system is built to recognize and classify names, times and numerical quantities. Through the HMM, our system is able to apply and integrate four types of internal and external evidence: 1) simple deterministic internal features of the words, such as capitalization and digitalization; 2) internal semantic features of important triggers; 3) internal gazetteer features; 4) external macro context features. In this way, the NER problem can be resolved effectively. Evaluation of our system on the MUC-6 and MUC-7 English NE tasks achieves F-measures of 96.6% and 94.1% respectively. This shows that the performance is significantly better than that reported by any other machine-learning system. Moreover, the performance is even consistently better than that of systems based on handcrafted rules.

A High-Resolution Global Dataset of Extreme Sea Levels, Tides, and Storm Surges, Including Future Projections

Muis et al. 2020

The world's coastal areas are increasingly at risk of coastal flooding due to sea-level rise (SLR). We present a novel global dataset of extreme sea levels, the Coastal Dataset for the Evaluation of Climate Impact (CoDEC), which can be used to accurately map the impact of climate change on coastal regions around the world. The third generation Global Tide and Surge Model (GTSM), with a coastal resolution of 2.5 km (1.25 km in Europe), was used to simulate extreme sea levels for the ERA5 climate reanalysis from 1979 to 2017, as well as for future climate scenarios from 2040 to 2100. The validation against observed sea levels demonstrated a good performance, and the annual maxima had a mean bias (MB) of -0.04 m, which is 50% lower than the MB of the previous GTSR dataset. By the end of the century (2071-2100), it is projected that the 1 in 10-year water levels will have increased 0.34 m on average for RCP4.5, while some locations may experience increases of up to 0.5 m. The change in return levels is largely driven by SLR, although at some locations changes in storm surges and interaction with tides amplify the impact of SLR with changes up to 0.2 m. By presenting an application of the CoDEC dataset to the city of Copenhagen, we demonstrate how climate impact indicators derived from simulation can contribute to an understanding of climate impact on a local scale. Moreover, the CoDEC output locations are designed to be used as boundary conditions for regional models, and we envisage that they will be used for dynamic downscaling.

A composite kernel to extract relations between entities with both flat and structured features

Zhang et al. 2006

This paper proposes a novel composite kernel for relation extraction. The composite kernel consists of two individual kernels: an entity kernel that allows for entity-related features and a convolution parse tree kernel that models syntactic information of relation examples. The motivation of our method is to fully utilize the nice properties of kernel methods to explore diverse knowledge for relation extraction. Our study illustrates that the composite kernel can effectively capture both flat and structured features without the need for extensive feature engineering, and can also easily scale to include more features. Evaluation on the ACE corpus shows that our method outperforms the previous best-reported methods and significantly outperforms the two previous dependency tree kernels for relation extraction.
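The idea of combining a flat-feature kernel with a structured kernel can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the weighted-sum combination, the `alpha` parameter, the example format, and in particular `tree_kernel`, which is a simplified bag-of-productions stand-in rather than a true convolution parse tree kernel.

```python
from collections import Counter
from math import sqrt

def entity_kernel(e1, e2):
    # Flat features: number of entity features the two examples share.
    return float(len(e1 & e2))

def productions(tree):
    # Collect (parent label, child labels) for every internal node.
    # Trees are nested tuples: (label, child, ...); leaves are plain strings.
    if isinstance(tree, str):
        return []
    label, *children = tree
    child_labels = tuple(c if isinstance(c, str) else c[0] for c in children)
    found = [(label, child_labels)]
    for child in children:
        found.extend(productions(child))
    return found

def tree_kernel(t1, t2):
    # Simplified stand-in (assumption): dot product of production counts.
    c1, c2 = Counter(productions(t1)), Counter(productions(t2))
    return float(sum(c1[p] * c2[p] for p in c1))

def normalized(kernel, a, b):
    # Scale a kernel so that identical inputs score 1.0.
    denom = sqrt(kernel(a, a) * kernel(b, b))
    return kernel(a, b) / denom if denom else 0.0

def composite_kernel(x1, x2, alpha=0.5):
    # Hypothetical combination: weighted sum of the two normalized kernels.
    return (alpha * normalized(entity_kernel, x1["entity"], x2["entity"])
            + (1 - alpha) * normalized(tree_kernel, x1["tree"], x2["tree"]))

a = {"entity": {"PER", "ORG"},
     "tree": ("S", ("NP", "Smith"), ("VP", ("V", "joined"), ("NP", "Acme")))}
b = {"entity": {"LOC"}, "tree": ("S", ("NP", "it"))}
print(composite_kernel(a, a), composite_kernel(a, b))
```

A weighted sum of valid kernels is itself a valid kernel, which is why this kind of combination can be passed to a kernel-based learner without further feature engineering.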

Exploring various knowledge in relation extraction

et al. 2005

Extracting semantic relationships between entities is challenging. This paper investigates the incorporation of diverse lexical, syntactic and semantic knowledge in feature-based relation extraction using SVM. Our study illustrates that the base phrase chunking information is very effective for relation extraction and contributes most of the performance improvement from the syntactic aspect, while additional information from full parsing gives limited further enhancement. This suggests that most of the useful information in full parse trees for relation extraction is shallow and can be captured by chunking. We also demonstrate how semantic information such as WordNet and Name List can be used in feature-based relation extraction to further improve the performance. Evaluation on the ACE corpus shows that effective incorporation of diverse features enables our system to outperform previously best-reported systems on the 24 ACE relation subtypes and to significantly outperform tree kernel-based systems by over 20 in F-measure on the 5 ACE relation types.

Multi-criteria-based active learning for named entity recognition

et al. 2004

In this paper, we propose a multi-criteria-based active learning approach and effectively apply it to named entity recognition. Active learning aims to minimize the human annotation effort by selecting examples for labeling. To maximize the contribution of the selected examples, we consider multiple criteria (informativeness, representativeness and diversity) and propose measures to quantify them. More comprehensively, we incorporate all the criteria using two selection strategies, both of which result in less labeling cost than single-criterion-based methods. The results of named entity recognition on both MUC-6 and GENIA show that the labeling cost can be reduced by at least 80% without degrading the performance.
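One way the three criteria could be combined into a batch selection step is sketched below. This is an illustrative assumption, not the paper's actual measures or strategies: informativeness is taken as a precomputed score per candidate, representativeness as mean cosine similarity to the other candidates, and diversity as a similarity cap against already selected examples. The names `select_batch`, `lam` and `max_sim` are hypothetical.

```python
from math import sqrt

def cosine(u, v):
    # Cosine similarity between two feature vectors (0.0 if either is all zeros).
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def select_batch(vectors, informativeness, batch_size, lam=0.6, max_sim=0.9):
    n = len(vectors)
    # Representativeness: mean similarity of a candidate to all other candidates.
    rep = [
        sum(cosine(vectors[i], vectors[j]) for j in range(n) if j != i) / max(n - 1, 1)
        for i in range(n)
    ]
    # Combined score: weighted mix of informativeness and representativeness.
    score = [lam * informativeness[i] + (1 - lam) * rep[i] for i in range(n)]
    chosen = []
    for i in sorted(range(n), key=lambda k: score[k], reverse=True):
        # Diversity: skip a candidate that is too similar to one already chosen.
        if all(cosine(vectors[i], vectors[j]) <= max_sim for j in chosen):
            chosen.append(i)
        if len(chosen) == batch_size:
            break
    return chosen

vectors = [[1.0, 0.0], [1.0, 0.01], [0.0, 1.0]]
informativeness = [0.9, 0.8, 0.5]
print(select_batch(vectors, informativeness, batch_size=2))
```

In this toy input the second candidate is nearly identical to the first, so the diversity check passes over it in favor of the third, even though the third scores lower on the other two criteria.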

Reasoning with Sarcasm by Reading In-Between

Tay, Luu, Hui et al. 2018

Sarcasm is a sophisticated speech act which commonly manifests on social communities such as Twitter and Reddit. The prevalence of sarcasm on the social web is highly disruptive to opinion mining systems due to not only its tendency of polarity flipping but also usage of figurative language. Sarcasm commonly manifests with a contrastive theme either between positive-negative sentiments or between literal-figurative scenarios. In this paper, we revisit the notion of modeling contrast in order to reason with sarcasm. More specifically, we propose an attention-based neural model that looks in-between instead of across, enabling it to explicitly model contrast and incongruity. We conduct extensive experiments on six benchmark datasets from Twitter, Reddit and the Internet Argument Corpus. Our proposed model not only achieves state-of-the-art performance on all datasets but also enjoys improved interpretability.

Recognizing names in biomedical texts: a machine learning approach

Zhou, Zhang, Su et al. 2004

A phrase-based statistical model for SMS text normalization

et al. 2006
