2016
DOI: 10.1016/j.patrec.2016.06.012

Representation learning for very short texts using weighted word embedding aggregation

Abstract: • We create text representations by weighting word embeddings using idf information. • A novel median-based loss is designed to mitigate the negative effect of outliers. • A dataset of semantically related textual pairs from Wikipedia and Twitter…
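The idf-weighted aggregation named in the highlights can be illustrated with a minimal sketch. The dictionary-based embedding lookup, the 300-dimensional vectors, and the toy idf table below are assumptions for illustration only, not the authors' released code.

```python
# Sketch of idf-weighted word-embedding aggregation (illustrative only:
# the vectors and idf values here are toy placeholders, not real data).
import numpy as np

def weighted_text_vector(tokens, vectors, idf, dim=300):
    """Idf-weighted mean of the embeddings of in-vocabulary tokens."""
    acc, total = np.zeros(dim), 0.0
    for tok in tokens:
        if tok in vectors and tok in idf:   # skip out-of-vocabulary tokens
            acc += idf[tok] * vectors[tok]
            total += idf[tok]
    return acc / total if total > 0 else acc

# Toy example with random vectors standing in for pretrained embeddings.
vectors = {"cat": np.random.rand(300), "sat": np.random.rand(300)}
idf = {"cat": 2.3, "sat": 1.1}
text_vec = weighted_text_vector(["the", "cat", "sat"], vectors, idf)
```

Rarer (higher-idf) words pull the aggregate vector toward themselves, which is the intuition behind weighting short texts this way.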

Cited by 150 publications (93 citation statements)
References 19 publications (20 reference statements)
“…We use the average of the word embeddings of content words in the tweet. Averages of word embeddings have been used for different NLP tasks (De Boom et al, 2016; Yoon et al, 2018; Orasan, 2018; Komatsu et al, 2015; Ettinger et al, 2018). As in past work, words that were not learned in the embeddings are dropped during the computation of the tweet vector.…”
Section: Word-based Representations
confidence: 99%
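A minimal sketch of the plain averaging scheme this citing work describes, where out-of-vocabulary words are dropped before averaging; the dictionary-based lookup and toy data are assumptions for illustration.

```python
# Unweighted embedding average for a tweet: drop words missing from the
# embedding vocabulary, then take the mean of what remains.
import numpy as np

def average_tweet_vector(tokens, vectors, dim=300):
    known = [vectors[t] for t in tokens if t in vectors]  # OOV words dropped
    return np.mean(known, axis=0) if known else np.zeros(dim)

vectors = {"storm": np.random.rand(300), "coming": np.random.rand(300)}
tweet_vec = average_tweet_vector(["storm", "is", "coming"], vectors)
```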
“…There are some advantages to this continuous space: the dimensionality is greatly reduced, and words that are close in meaning are close in this new continuous space. Several applications of neural-network-based word embeddings have been introduced, including word2vec [26], the Dictionary of Affect in Language (DAL) [44], SentiWordNet [43], GloVe [26] and Wiktionary [45].…”
Section: Word-embedding Descriptor
confidence: 99%
“…In Twitter, word embeddings are generally used for classification tasks that focus on sentiment classification, such as [60], [61], and also for other classification tasks like [62], [63]. Among the research that uses word embeddings, [64] has the approach most similar to our work, as it uses a hybrid approach combining tf-idf and word embeddings. [64] is evaluated with Wikipedia and Twitter data.…”
Section: Related Work
confidence: 99%
“…Among the research that uses word embeddings, [64] has the approach most similar to our work, as it uses a hybrid approach combining tf-idf and word embeddings. [64] is evaluated with Wikipedia and Twitter data. It performs well on Wikipedia; however, the error rate on Twitter is very high because each tweet contains too few words for tf-idf.…”
Section: Related Work
confidence: 99%