2018
DOI: 10.3390/a11080109

Long Length Document Classification by Local Convolutional Feature Aggregation

Abstract: The exponential increase in online reviews and recommendations makes document classification and sentiment analysis a hot topic in academic and industrial research. Traditional deep-learning-based document classification methods require the use of full textual information to extract features. In this paper, in order to tackle long documents, we propose three methods that use local convolutional feature aggregation to implement document classification. The first proposed method randomly draws blocks of…
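The block-sampling idea mentioned in the abstract can be illustrated with a minimal sketch. The abstract is cut off at this point, so every detail below is an assumption made for illustration and not the authors' implementation: the function name `sample_random_blocks`, whitespace tokenization, the block length, the number of blocks, and the choice to allow overlapping blocks are all hypothetical.

```python
import random


def sample_random_blocks(tokens, block_size=4, num_blocks=3, seed=None):
    """Draw contiguous blocks of words at random positions in a document.

    Hypothetical helper: block_size, num_blocks, and sampling with
    possible overlap are illustrative assumptions, not values or
    choices taken from the paper.
    """
    rng = random.Random(seed)
    # A document shorter than one block is returned whole as a single block.
    if len(tokens) <= block_size:
        return [list(tokens)]
    max_start = len(tokens) - block_size
    # Start positions are drawn independently, so blocks may overlap.
    starts = [rng.randint(0, max_start) for _ in range(num_blocks)]
    return [tokens[s:s + block_size] for s in starts]


# Usage: split a toy document on whitespace and sample three blocks.
document = "alpha beta gamma delta epsilon zeta eta theta iota kappa lambda mu"
blocks = sample_random_blocks(document.split(), block_size=4, num_blocks=3, seed=0)
for block in blocks:
    print(" ".join(block))
```

In a full pipeline, each sampled block would be passed to a feature extractor and the per-block features aggregated before classification; that part is omitted here because the truncated abstract does not specify it.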

Citations: cited by 22 publications (14 citation statements)
References: 14 publications
“…Character-level CNN are explored in [16] but it is prohibitive for very long documents. In [17], dataset collected from arXiv papers is used for classification. For classification, they sample random blocks of words and use them together for classification instead of using full document which may work well as arXiv papers are usually coherent and well written on a well defined topic.…”
Section: Related Work; citation type: mentioning
confidence: 99%
“…A social media user can be modeled as collection of their posts, so we look at neural models for large-scale text classification. Liu et al (2018) split a document into chunks and use a combination of CNNs and RNNs for document classification. While this approach proves to be successful for scientific paper categorization, it is unintuitive to use in social media text due to an unclear way of splitting user's data into equally sized chunks of text.…”
Section: Related Work; citation type: mentioning
confidence: 99%
“…In our study, we expect to exploit the recent breakthroughs in deep learning, in particular attention learning, for long document classification. In our recent work [31], three local convolutional feature aggregation methods were proposed to deal with the long document classification task by subsampling parts of the original document. One of the aggregation methods is based on recurrent hard attention though, it only relies on the local convolutional features without considering the important context information of a long document and it also uses the recurrent structure as the encoder.…”
Section: Deep Learning Based Approaches For Text Classification; citation type: mentioning
confidence: 99%
“…IMPLEMENTATION DETAILS 1) arXiv DATA SET arXiv is a web site that collects preprints of papers in physics, mathematics, computer science and biology. In our previous work, we collected 4 classes of arXiv data set with total of 12195 papers including cs.IT, cs.Ne, math.AC, and math.GR [31]. In this paper, we expanded the 4 classes of data set to 11 classes with total of 33388 papers.…”
Section: Performance Evaluation; citation type: mentioning
confidence: 99%