Long Length Document Classification by Local Convolutional Feature Aggregation

Liu, Liu; Liu, Kaile; Cong, Zhenghai; Zhao, Jiali; Ji, Yefei; He, Jun

doi:10.3390/a11080109

Cited by 22 publications

(14 citation statements)

References 14 publications


“…Character-level CNNs are explored in [16], but they are prohibitive for very long documents. In [17], a dataset collected from arXiv papers is used for classification. For classification, they sample random blocks of words and use them together instead of the full document, which may work well because arXiv papers are usually coherent, well written, and focused on a well-defined topic.…”

Section: Related Work (mentioning)

confidence: 99%

Hierarchical Transformers for Long Document Classification

Pappagari, Żelasko, Villalba, et al. 2019

2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)

152


BERT, which stands for Bidirectional Encoder Representations from Transformers, is a recently introduced language representation model based upon the transfer learning paradigm. We extend its fine-tuning procedure to address one of its major limitations: applicability to inputs longer than a few hundred words, such as transcripts of human call conversations. Our method is conceptually simple. We segment the input into smaller chunks and feed each of them into the base model. Then, we propagate each output through a single recurrent layer, or another transformer, followed by a softmax activation. We obtain the final classification decision after the last segment has been consumed. We show that both BERT extensions are quick to fine-tune and converge after as little as one epoch of training on a small, domain-specific data set. We successfully apply them in three different tasks involving customer call satisfaction prediction and topic classification, and obtain a significant improvement over the baseline models in two of them.
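The two ways of cutting a long document into model-sized pieces that appear in this entry, random word blocks (attributed to [17] in the quoted statement) and consecutive segments fed to BERT (in the abstract), can be sketched in a few lines of Python. This is a minimal illustration under assumed parameters: the function names `sample_word_blocks` and `segment_tokens`, the block and chunk lengths, and the stride are hypothetical and are not taken from either paper, and tokenization is assumed to have happened already.

```python
import random
from typing import List, Sequence


def sample_word_blocks(words: Sequence[str], block_len: int, num_blocks: int,
                       seed: int = 0) -> List[List[str]]:
    """Hypothetical helper: draw `num_blocks` contiguous word blocks at random
    start positions, returned in document order (blocks may overlap)."""
    if block_len <= 0 or num_blocks <= 0:
        raise ValueError("block_len and num_blocks must be positive")
    if len(words) <= block_len:
        return [list(words)]  # short document: keep it whole
    rng = random.Random(seed)
    starts = sorted(rng.randint(0, len(words) - block_len) for _ in range(num_blocks))
    return [list(words[s:s + block_len]) for s in starts]


def segment_tokens(tokens: Sequence[str], chunk_len: int, stride: int) -> List[List[str]]:
    """Hypothetical helper: split tokens into consecutive chunks of at most
    `chunk_len`; neighbouring chunks overlap by `chunk_len - stride` tokens."""
    if chunk_len <= 0 or not 0 < stride <= chunk_len:
        raise ValueError("need chunk_len > 0 and 0 < stride <= chunk_len")
    chunks: List[List[str]] = []
    start = 0
    while start < len(tokens):
        chunks.append(list(tokens[start:start + chunk_len]))
        if start + chunk_len >= len(tokens):
            break  # this chunk already reaches the end of the document
        start += stride
    return chunks


if __name__ == "__main__":
    doc = [f"w{i}" for i in range(1000)]
    blocks = sample_word_blocks(doc, block_len=50, num_blocks=4, seed=1)
    print([b[0] for b in blocks])  # first word of each sampled block
    print(len(segment_tokens(doc, chunk_len=200, stride=150)))  # → 7
```

Random blocks keep the cost fixed regardless of document length but may skip relevant passages, whereas consecutive segments cover every token at a cost that grows with length; in the hierarchical setup each segment would then be encoded separately and the per-segment outputs combined by a recurrent layer or another transformer.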



“…A social media user can be modeled as a collection of their posts, so we look at neural models for large-scale text classification. Liu et al. (2018) split a document into chunks and use a combination of CNNs and RNNs for document classification. While this approach proves successful for scientific paper categorization, it is unintuitive to use on social media text because there is no clear way of splitting a user's data into equally sized chunks of text.…”

Section: Related Work (mentioning)

confidence: 99%

Adapting Deep Learning Methods for Mental Health Prediction on Social Media

Sekulić, Strube 2019

Proceedings of the 5th Workshop on Noisy User-Generated Text (W-NUT 2019)


Mental health poses a significant challenge for an individual's well-being. Text analysis of rich resources, like social media, can contribute to a deeper understanding of illnesses and provide means for their early detection. We tackle the challenge of detecting social media users' mental status through deep learning-based models, moving away from traditional approaches to the task. In a binary classification task on predicting whether a user suffers from one of nine different disorders, a hierarchical attention network outperforms previously set benchmarks for four of the disorders. Furthermore, we explore the limitations of our model and analyze phrases relevant for classification by inspecting the model's word-level attention weights.


“…In our study, we aim to exploit the recent breakthroughs in deep learning, in particular attention learning, for long document classification. In our recent work [31], three local convolutional feature aggregation methods were proposed to deal with the long document classification task by subsampling parts of the original document. One of the aggregation methods is based on recurrent hard attention; however, it relies only on the local convolutional features without considering the important context information of a long document, and it also uses the recurrent structure as the encoder.…”

Section: Deep Learning Based Approaches for Text Classification (mentioning)

confidence: 99%

“…[Implementation Details, 1) arXiv Data Set] arXiv is a website that collects preprints of papers in physics, mathematics, computer science, and biology. In our previous work, we collected a 4-class arXiv data set with a total of 12,195 papers from cs.IT, cs.NE, math.AC, and math.GR [31]. In this paper, we expanded the data set from 4 to 11 classes, with a total of 33,388 papers.…”

Section: Performance Evaluation (mentioning)

confidence: 99%

Long Document Classification From Local Word Glimpses via Recurrent Attention Learning

Wang, Liu, et al. 2019

IEEE Access

Self Cite


Document classification requires extracting high-level features from low-level word vectors. Typically, feature extraction by deep neural networks makes use of all words in a document, which does not scale well to long documents. In this paper, we propose to tackle the long document classification task by incorporating the recurrent attention learning framework, which can produce discriminative features from significantly fewer words. Specifically, the core of the work is to train a recurrent neural network (RNN)-based controller, which can focus its attention on the discriminative parts. Then, the glimpsed feature is extracted from the focused group of words by a typical short-text convolutional neural network (CNN). The controller locates its attention according to the context information, which consists of the coarse representation of the original document and the memorized glimpsed features. After glimpsing a few groups, the document can be classified by aggregating these glimpsed features and the coarse representation. On our collected 11-class, 10 000-word arXiv paper data set, the proposed method outperforms two subsampled deep CNN baseline models by a large margin while observing far fewer words.
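The feature-extraction and aggregation steps described in this entry can be illustrated with a small pure-Python sketch: one convolutional filter applied to a block of word vectors (valid 1-D convolution, ReLU, max over positions) yields a local feature; local features from several blocks or glimpses are pooled element-wise by max or mean; and class probabilities come from a linear softmax layer over the concatenation of a coarse document vector and the mean-pooled glimpse features. The pooling choices, the linear classifier, and every function name here (`local_conv_feature`, `pool_features`, `classify`) are assumptions for illustration. This does not reproduce the three aggregation methods of [31] or the RNN-based attention controller described in the abstract.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def local_conv_feature(block: Sequence[Vector], kernel: Sequence[Vector]) -> float:
    """One filter over a block of word vectors: valid 1-D convolution,
    ReLU, then max over positions (max-over-time pooling)."""
    width = len(kernel)
    if width == 0 or len(block) < width:
        raise ValueError("block must contain at least len(kernel) word vectors")
    responses = []
    for t in range(len(block) - width + 1):
        # Dot product of the kernel with the window of `width` word vectors at t.
        s = sum(w * x for k in range(width) for w, x in zip(kernel[k], block[t + k]))
        responses.append(max(0.0, s))  # ReLU
    return max(responses)


def pool_features(features: Sequence[Vector], mode: str = "max") -> List[float]:
    """Element-wise max or mean over per-block (or per-glimpse) feature vectors."""
    if not features or any(len(f) != len(features[0]) for f in features):
        raise ValueError("need at least one feature vector, all of equal length")
    columns = list(zip(*features))
    if mode == "max":
        return [max(c) for c in columns]
    if mode == "mean":
        return [sum(c) / len(c) for c in columns]
    raise ValueError(f"unknown mode: {mode}")


def classify(coarse: Vector, glimpses: Sequence[Vector],
             weights: Sequence[Vector], bias: Vector) -> List[float]:
    """Softmax over a linear layer applied to [coarse ; mean(glimpses)]."""
    x = list(coarse) + pool_features(glimpses, mode="mean")
    logits = []
    for row, b in zip(weights, bias):
        if len(row) != len(x):
            raise ValueError("weight row length must match feature length")
        logits.append(sum(w * v for w, v in zip(row, x)) + b)
    m = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    block = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three 2-d word vectors
    kernels = [[[1.0, 0.0], [0.0, 1.0]], [[-1.0, 0.0], [0.0, -1.0]]]  # two width-2 filters
    print([local_conv_feature(block, k) for k in kernels])  # [2.0, 0.0]
```

Max pooling keeps the strongest response of each filter across blocks, which suits topic cues that appear in only a few places, while mean pooling smooths over all observed parts; in a trained model the kernels, weights, and bias would be learned rather than set by hand as in this toy example.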

