Mian Du scite author profile

Mian Du

5 Publications

24 Citation Statements Received

80 Citation Statements Given


Affiliations

University of Helsinki

Publications


Grouping business news stories based on salience of named entities

Escoter, Pivovarova, et al. 2017


In news aggregation systems focused on broad news domains, certain stories may appear in multiple articles. Depending on the relative importance of the story, the number of versions can reach dozens or hundreds within a day. The text in these versions may be nearly identical or quite different. Linking multiple versions of a story into a single group brings several important benefits to the end user: it reduces the cognitive load on the reader and signals the relative importance of the story. We present a grouping algorithm and explore several vector-based representations of input documents, from a baseline using keywords to a method using salience, a measure of the importance of named entities in the text. We demonstrate that features beyond keywords yield substantial improvements, verified on a manually annotated corpus of business news stories.
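The abstract does not spell out the grouping algorithm, so the following is only a minimal sketch of the general idea under stated assumptions: each document is a sparse vector mapping named entities to a salience weight, and a document joins an existing group when its cosine similarity to that group's first member reaches a threshold. The function names, the greedy single-pass strategy, the threshold value, and the sample weights are all hypothetical illustrations, not the authors' method.

```python
import math

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def group_stories(docs, threshold=0.6):
    """Greedy single-pass grouping (an assumed strategy, not the paper's).

    docs: list of dicts mapping entity name -> salience weight.
    Returns a list of groups, each a list of document indices.
    """
    groups = []
    for i, vec in enumerate(docs):
        best, best_sim = None, threshold
        for group in groups:
            # Compare against the first document of each group.
            sim = cosine(vec, docs[group[0]])
            if sim >= best_sim:
                best, best_sim = group, sim
        if best is None:
            groups.append([i])   # no group is similar enough: start a new one
        else:
            best.append(i)
    return groups

# Hypothetical salience-weighted entity vectors for three articles.
docs = [
    {"Nokia": 0.9, "Microsoft": 0.7},
    {"Nokia": 0.8, "Microsoft": 0.6, "Helsinki": 0.1},
    {"Tesla": 0.9, "Panasonic": 0.5},
]
groups = group_stories(docs)
```

Here the first two articles share their most salient entities and fall into one group, while the third starts its own.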


Supervised Classification Using Balanced Training

Pierce, Pivovarova, et al. 2014


We examine supervised learning for multi-class, multi-label text classification. We are interested in exploring classification in a real-world setting, where the distribution of labels may change dynamically over time. First, we compare the performance of an array of binary classifiers trained on the label distribution found in the original corpus against classifiers trained on balanced data, where we try to make the label distribution as nearly uniform as possible. We discuss the performance tradeoffs between balanced and unbalanced training, and highlight the advantages of balancing the training set. Second, we compare the performance of two classifiers, Naive Bayes and SVM, with several feature-selection methods, using balanced training. We combine a Named-Entity-based rote classifier with the statistical classifiers to obtain better performance than either method alone.
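One common way to build a balanced training set for a single binary (one-vs-rest) classifier is to downsample the majority class. The sketch below assumes that approach; the abstract does not say how the authors balanced their data, and `balance_binary`, its parameters, and the sample labels are hypothetical.

```python
import random

def balance_binary(examples, label, seed=0):
    """Downsample so positives and negatives for `label` are equally frequent.

    examples: list of (text, set_of_labels) pairs (multi-label data).
    Returns a new list with the same number of positive and negative examples.
    """
    rng = random.Random(seed)  # fixed seed for a reproducible sample
    positives = [ex for ex in examples if label in ex[1]]
    negatives = [ex for ex in examples if label not in ex[1]]
    n = min(len(positives), len(negatives))
    return rng.sample(positives, n) + rng.sample(negatives, n)

# Hypothetical corpus: 2 documents carry the label "merger", 5 do not.
corpus = [
    ("doc a", {"merger"}),
    ("doc b", {"merger", "energy"}),
    ("doc c", {"energy"}),
    ("doc d", {"retail"}),
    ("doc e", {"retail"}),
    ("doc f", {"energy"}),
    ("doc g", set()),
]
balanced = balance_binary(corpus, "merger")
```

The balanced set keeps both positive documents and a random two of the five negatives, so the classifier for "merger" sees a uniform label distribution.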


Building Support Tools for Russian-Language Information Extraction

Etter, Копотев, et al. 2011


Improving Supervised Classification Using Information Extraction

Pierce, Pivovarova, et al. 2015


We explore supervised learning for multi-class, multi-label text classification, focusing on real-world settings, where the distribution of labels changes dynamically over time. We use the PULS Information Extraction system to collect information about the distribution of class labels over named entities found in text. We then combine a knowledge-based rote classifier with statistical classifiers to obtain better performance than either classification method alone. The resulting classifier yields a significant improvement in macro-averaged F-measure compared to the state of the art, while maintaining comparable micro-average.
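The distinction between macro- and micro-averaged F-measure matters when label frequencies are skewed: macro-averaging weights every label equally, while micro-averaging pools the counts and is dominated by frequent labels. The following sketch uses the standard F1 definitions; the per-label counts are made-up numbers for illustration, not results from the paper.

```python
def f1(tp, fp, fn):
    """F1 from true positives, false positives, and false negatives."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def macro_micro_f1(counts):
    """counts: dict mapping label -> (tp, fp, fn).

    Macro: mean of per-label F1 scores.
    Micro: F1 computed on counts summed over all labels.
    """
    macro = sum(f1(*c) for c in counts.values()) / len(counts)
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    return macro, f1(tp, fp, fn)

# Hypothetical counts: one frequent label handled well, one rare label handled poorly.
counts = {
    "frequent": (90, 10, 10),
    "rare": (1, 1, 3),
}
macro, micro = macro_micro_f1(counts)
```

With these counts the micro-average stays high because the frequent label dominates the pooled totals, while the macro-average drops sharply. This is why an improvement on rare labels shows up mainly in the macro-averaged score.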


Techniques for Multilingual Security-Related Event Extraction from Online News

Atkinson, Piskorski, et al. 2013


