Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) 2018
DOI: 10.18653/v1/p18-2110

Examining Temporality in Document Classification

Abstract: Many corpora span broad periods of time. Language processing models trained during one time period may not work well in future time periods, and the best model may depend on specific times of year (e.g., people might describe hotels differently in reviews during the winter versus the summer). This study investigates how document classifiers trained on documents from certain time intervals perform on documents from other time intervals, considering both seasonal intervals (intervals that repeat across years, e.…
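The abstract describes a cross-interval evaluation: a classifier is trained on documents from one time interval and tested on documents from another. A minimal sketch of that setup is below, assuming a simple bag-of-words classifier and a document schema with text, label, and year fields; the field names and interval boundaries are illustrative, not the paper's actual data format.

```python
# Minimal sketch of cross-interval evaluation: train on one time interval,
# test on another. The 'text'/'label'/'year' schema and the chosen years are
# illustrative assumptions, not the paper's actual setup.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

def cross_interval_accuracy(docs, train_years, test_years):
    """docs: iterable of dicts with 'text', 'label', and 'year' keys (assumed schema)."""
    train = [d for d in docs if d["year"] in train_years]
    test = [d for d in docs if d["year"] in test_years]

    vectorizer = TfidfVectorizer(min_df=2)
    X_train = vectorizer.fit_transform([d["text"] for d in train])
    X_test = vectorizer.transform([d["text"] for d in test])

    clf = LogisticRegression(max_iter=1000)
    clf.fit(X_train, [d["label"] for d in train])
    preds = clf.predict(X_test)
    return accuracy_score([d["label"] for d in test], preds)

# Example: train on reviews from 2012-2013, evaluate on reviews from 2015-2016.
# acc = cross_interval_accuracy(docs, train_years={2012, 2013}, test_years={2015, 2016})
```

Seasonal intervals (the winter-versus-summer case mentioned in the abstract) can be evaluated the same way by filtering documents on month rather than year.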

Cited by 30 publications (41 citation statements)
References 11 publications
“…We retrieved available data sources from previous publications (Zhang et al., 2014; He and McAuley, 2016; Huang and Paul, 2018). Specifically, we use four different sources in English-Amazon (music reviews), Yelp (restaurant and hotel reviews), Twitter, and economic newspaper articles (Figure Eight Inc., 2015)-and one source in Chinese, Dianping (Meituan-Dianping, 2019).…”
Section: Data
confidence: 99%
“…Following Huang and Paul (2018), we group the corpora into several bins of temporal intervals; specifically, non-repeating time intervals spanning one or more years (Table 1). We encode each temporal domain into the discrete time labels 1, 2, ..., T.…”
Section: Data
confidence: 99%
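The binning described in the citation above can be sketched as a simple mapping from a document's timestamp to a discrete time-domain label in 1, ..., T. The bin boundaries below are illustrative assumptions, not the intervals actually used in the cited work.

```python
# Sketch of mapping document years to discrete time-domain labels 1..T.
# Bin boundaries are illustrative assumptions, not the cited paper's intervals.
import bisect

def assign_time_label(year, boundaries):
    """Return a label in 1..T, where T = len(boundaries) + 1.

    boundaries: sorted upper bounds of the first T-1 bins, e.g. [2010, 2013, 2016]
    yields labels 1 (<= 2010), 2 (2011-2013), 3 (2014-2016), 4 (>= 2017).
    """
    return bisect.bisect_left(boundaries, year) + 1

print([assign_time_label(y, [2010, 2013, 2016]) for y in (2008, 2012, 2020)])  # [1, 2, 4]
```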
“…Consistency regularization methods (e.g., self-ensembling) outperform adversarial methods on visual semi-supervised and domain adaptation tasks (Athiwaratkun et al., 2019), but have rarely been applied to textual data (Ko et al., 2019). Finally, Huang and Paul (2018) establish the feasibility of using domain adaptation to label documents from discrete time periods. Our work departs from previous work by proposing an adaptive, time-aware approach to consistency regularization provisioned with causal convolutional networks.…”
Section: Related Work
confidence: 99%