Local Structure around the Amino Group of Glycine Carbamate in Concentrated Aqueous Solutions

In news articles, lead bias is a common phenomenon that usually dominates the learning signals for neural extractive summarizers, severely limiting their performance on data with different or even no bias. In this paper, we introduce a novel technique to demote lead bias and make the summarizer focus more on the content semantics. Experiments on two news corpora with different degrees of lead bias show that our method can effectively demote the model's learned lead bias and improve its generality on out-of-distribution data, with little to no performance loss on in-distribution data.


Incorporating Metadata into Content-Based User Embeddings

Xing

Paul

2017


Low-dimensional vector representations of social media users can benefit applications like recommendation systems and user attribute inference. Recent work has shown that user embeddings can be improved by combining different types of information, such as text and network data. We propose a data augmentation method that allows novel feature types to be used within off-the-shelf embedding models. Experimenting with the task of friend recommendation on a dataset of 5,019 Twitter users, we show that our approach can lead to substantial performance gains with the simple addition of network and geographic features.


Predicting Above-Sentence Discourse Structure Using Distant Supervision from Topic Segmentation

Huber

Xing

Carenini

2022

AAAI


RST-style discourse parsing plays a vital role in many NLP tasks, revealing the underlying semantic/pragmatic structure of potentially complex and diverse documents. Despite its importance, one of the most prevalent limitations in modern-day discourse parsing is the lack of large-scale datasets. To overcome the data sparsity issue, distantly supervised approaches from tasks like sentiment analysis and summarization have recently been proposed. Here, we extend this line of research by exploiting distant supervision from topic segmentation, which can arguably provide a strong and oftentimes complementary signal for high-level discourse structures. Experiments on two human-annotated discourse treebanks confirm that our proposal generates accurate tree structures at the sentence and paragraph levels, consistently outperforming previous distantly supervised models on the sentence-to-document task and occasionally reaching even higher scores at the sentence-to-paragraph level.


Evaluating Topic Quality with Posterior Variability

Xing

Paul

Carenini

2019


Probabilistic topic models such as latent Dirichlet allocation (LDA) are popularly used with Bayesian inference methods such as Gibbs sampling to learn posterior distributions over topic model parameters. We derive a novel measure of LDA topic quality using the variability of the posterior distributions. Compared to several existing baselines for automatic topic evaluation, the proposed metric achieves state-of-the-art correlations with human judgments of topic quality in experiments on three corpora. We additionally demonstrate that topic quality estimation can be further improved using a supervised estimator that combines multiple metrics.


Demoting the Lead Bias in News Summarization via Alternating Adversarial Learning

Xing

Xiao

Carenini

2021

Preprint



Diagnosing and Improving Topic Models by Analyzing Posterior Variability

Xing

Paul

2018

AAAI


Bayesian inference methods for probabilistic topic models can quantify uncertainty in the parameters, which has primarily been used to increase the robustness of parameter estimates. In this work, we explore other rich information that can be obtained by analyzing the posterior distributions in topic models. Experimenting with latent Dirichlet allocation on two datasets, we propose ideas incorporating information about the posterior distributions at the topic level and at the word level. At the topic level, we propose a metric called topic stability that measures the variability of the topic parameters under the posterior. We show that this metric is correlated with human judgments of topic quality as well as with the consistency of topics appearing across multiple models. At the word level, we experiment with different methods for adjusting individual word probabilities within topics based on their uncertainty. Humans prefer words ranked by our adjusted estimates nearly twice as often when compared to the traditional approach. Finally, we describe how the ideas presented in this work could potentially be applied to other predictive or exploratory models in future work.
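As a rough illustration of a topic-level variability measure of the kind described above, the sketch below scores one topic by the average cosine similarity between each posterior sample of its word distribution and the mean of those samples. The abstract does not give the formula, so this definition, the function names `cosine` and `topic_stability`, and the toy data are all assumptions made for illustration, not the authors' implementation.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def topic_stability(samples):
    """Hypothetical stability score for one topic.

    samples: list of word-probability vectors for the same topic,
    one vector per posterior draw (e.g. per retained Gibbs iteration).
    Returns the mean cosine similarity between each draw and the
    element-wise mean of all draws; values near 1 mean low variability.
    """
    n = len(samples)
    mean = [sum(col) / n for col in zip(*samples)]
    return sum(cosine(s, mean) for s in samples) / n

# Toy data (assumed): three posterior draws over a three-word vocabulary.
stable = [[0.70, 0.20, 0.10], [0.68, 0.22, 0.10], [0.72, 0.18, 0.10]]
unstable = [[0.70, 0.20, 0.10], [0.10, 0.20, 0.70], [0.20, 0.70, 0.10]]

print(topic_stability(stable) > topic_stability(unstable))
```

Under this assumed definition, a topic whose draws barely move scores close to 1, while a topic whose high-probability words change from draw to draw scores lower.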



Linzi Xing

Exploring Timelines of Confirmed Suicide Incidents Through Social Media

Multilingual Twitter Corpus and Baselines for Evaluating Demographic Bias in Hate Speech Recognition
