Proceedings of the 2017 ACM on Conference on Information and Knowledge Management
DOI: 10.1145/3132847.3133104
Language Modeling by Clustering with Word Embeddings for Text Readability Assessment

Abstract: We present a clustering-based language model using word embeddings for text readability prediction. Presumably, a Euclidean semantic space hypothesis holds true for word embeddings whose training is done by observing word co-occurrences. We argue that clustering with word embeddings in the metric space should yield feature representations in a higher semantic space appropriate for text regression. Also, by representing features in terms of histograms, our approach can naturally address documents of varying lengths.
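
The core idea in the abstract lends itself to a short illustration. Below is a minimal sketch, assuming a pretrained skip-gram embedding matrix aligned with a `word_to_index` vocabulary (both placeholder names, not taken from the paper): words are clustered with k-means, and each document becomes a fixed-length histogram of its words' cluster memberships, which is what makes documents of varying lengths directly comparable.

```python
# Minimal sketch of the clustering-based features described in the abstract:
# cluster pretrained word embeddings with k-means, then represent each document
# as a normalized histogram of its words' cluster memberships.
# Function names and parameters are illustrative, not taken from the paper.
import numpy as np
from sklearn.cluster import KMeans

def fit_word_clusters(embedding_matrix, n_clusters=100, seed=0):
    """Cluster the rows of an (n_words, dim) word-embedding matrix."""
    km = KMeans(n_clusters=n_clusters, random_state=seed, n_init=10)
    km.fit(embedding_matrix)
    return km  # km.labels_[i] is the cluster of the i-th vocabulary word

def document_histogram(doc_tokens, word_to_index, km):
    """Represent a tokenized document as a normalized cluster histogram."""
    counts = np.zeros(km.n_clusters)
    for token in doc_tokens:
        idx = word_to_index.get(token)
        if idx is not None:               # skip out-of-vocabulary words
            counts[km.labels_[idx]] += 1
    total = counts.sum()
    return counts / total if total > 0 else counts
```

Because every document maps to the same fixed number of cluster bins, short and long texts end up in a common feature space suitable for the downstream regression the abstract describes.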

Cited by 32 publications (15 citation statements)
References 14 publications
“…Here, the colors represent the author of the text. We observe that clustering doc2vec embeddings has been used extensively in language analysis (see, e.g., [8]). (ii) victorian 5 .…”
Section: Datasets
confidence: 93%
“…The classification model used is the regularized neural network with one hidden layer. • SG-KM-SVM is a word embedding-based readability assessment method proposed by Cha et al (2017). In SG-KM-SVM, the representation of a document is generated by applying average pooling on the word embedding and cluster membership of all words in the document.…”
Section: Comparisons to the State-of-the-art Methods
confidence: 99%
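
The description of SG-KM-SVM above suggests the following rough sketch: each word contributes its embedding concatenated with a one-hot encoding of its cluster membership, the document vector is the average of these per-word vectors, and an SVM is trained on top. The helper names, the concatenation scheme, and the SVM settings are assumptions made for illustration, not details taken from Cha et al. (2017) or the citing paper.

```python
# Sketch of an SG-KM-SVM-style document representation: average-pool each
# word's embedding concatenated with a one-hot encoding of its k-means cluster,
# then feed the pooled vector to an SVM classifier.
import numpy as np
from sklearn.svm import SVC

def sg_km_features(doc_tokens, word_to_index, embedding_matrix,
                   cluster_labels, n_clusters):
    """Average pooling over [embedding ; one-hot cluster] for all words."""
    vecs = []
    for token in doc_tokens:
        idx = word_to_index.get(token)
        if idx is None:
            continue
        one_hot = np.zeros(n_clusters)
        one_hot[cluster_labels[idx]] = 1.0
        vecs.append(np.concatenate([embedding_matrix[idx], one_hot]))
    if not vecs:
        return np.zeros(embedding_matrix.shape[1] + n_clusters)
    return np.mean(vecs, axis=0)  # average pooling over the document

# Hypothetical usage, assuming docs, vocab, E, labels, and k exist:
# X = np.stack([sg_km_features(d, vocab, E, labels, k) for d in docs])
# clf = SVC(kernel="rbf").fit(X, readability_levels)
```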
“…Then, inspired by some works in Natural Language Processing (NLP), we first use the K-Means clustering algorithm to classify the words into groups according to pretrained word embeddings (Cha, Gwon, and Kung 2017). For example, verbs and nouns will be classified into two different groups.…”
Section: Methods
confidence: 99%
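
As a quick way to see the kind of grouping the quoted statement refers to (e.g., verbs and nouns landing in different clusters), one could list the vocabulary words closest to each k-means centroid. This assumes the fitted KMeans model and an `index_to_word` list as in the sketches above; it is an inspection aid, not part of either paper's method.

```python
# Inspect k-means clusters of pretrained word embeddings by listing the words
# nearest to each centroid. Assumes a fitted KMeans model and a vocabulary
# ordering consistent with the embedding matrix.
import numpy as np

def words_per_cluster(km, embedding_matrix, index_to_word, top_n=10):
    """For each cluster, return the words closest to its centroid."""
    summary = {}
    for c, centroid in enumerate(km.cluster_centers_):
        dists = np.linalg.norm(embedding_matrix - centroid, axis=1)
        nearest = np.argsort(dists)[:top_n]
        summary[c] = [index_to_word[i] for i in nearest]
    return summary
```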