Highlights
• We create text representations by weighting word embeddings using idf information.
• A novel median-based loss is designed to mitigate the negative effect of outliers.
• A dataset of semantically related textual pairs from Wikipedia and Twitter is made.
• Our method outperforms all word embedding baselines in a semantic similarity task.
• Our method works out-of-the-box and thus requires no retraining in different contexts.

ABSTRACT
Short text messages such as tweets are very noisy and sparse in their use of vocabulary. Traditional textual representations, such as tf-idf, have difficulty grasping the semantic meaning of such texts, which is important in applications such as event detection, opinion mining, and news recommendation. We constructed a method based on semantic word embeddings and frequency information to arrive at low-dimensional representations for short texts designed to capture semantic similarity. For this purpose we designed a weight-based model and a learning procedure based on a novel median-based loss function. This paper discusses the details of our model and the optimization methods, together with experimental results on both Wikipedia and Twitter data. We find that our method outperforms the baseline approaches in the experiments, and that it generalizes well to different word embeddings without retraining. Our method is therefore capable of retaining most of the semantic information in the text, and is applicable out-of-the-box.
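The core idea of weighting word embeddings by idf information can be illustrated with a minimal sketch. This is not the paper's exact model (the authors additionally learn weights with a median-based loss); the helper names and the toy corpus below are hypothetical, and the representation is simply the idf-weighted mean of the word vectors:

```python
import math
from collections import Counter

def idf_weights(corpus):
    """Compute inverse document frequency over a list of tokenized
    documents (hypothetical helper, not the paper's exact scheme)."""
    n_docs = len(corpus)
    df = Counter(word for doc in corpus for word in set(doc))
    return {w: math.log(n_docs / df[w]) for w in df}

def text_embedding(tokens, embeddings, idf, dim):
    """Represent a short text as the idf-weighted mean of its word vectors.
    Words missing from the embedding vocabulary are skipped."""
    vec = [0.0] * dim
    total = 0.0
    for w in tokens:
        if w in embeddings:
            weight = idf.get(w, 0.0)
            vec = [v + weight * e for v, e in zip(vec, embeddings[w])]
            total += weight
    if total > 0:
        vec = [v / total for v in vec]
    return vec

# Toy usage with made-up 2-dimensional embeddings:
corpus = [["storm", "hits", "city"], ["storm", "warning"], ["city", "festival"]]
idf = idf_weights(corpus)
emb = {"storm": [1.0, 0.0], "city": [0.0, 1.0]}
print(text_embedding(["storm", "city"], emb, idf, 2))
```

Because rare (high-idf) words dominate the weighted sum, uninformative high-frequency terms contribute little to the final representation, which is the intuition behind combining embeddings with frequency information.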
Leveraging data on social media, such as Twitter and Facebook, requires information retrieval algorithms that can relate very short text fragments to each other. Traditional text similarity methods such as tf-idf cosine similarity, based on word overlap, mostly fail to produce good results in this case, since word overlap is little or non-existent. Recently, distributed word representations, or word embeddings, have been shown to successfully allow words to match on the semantic level. In order to pair short text fragments, viewed as a concatenation of separate words, an adequate distributed sentence representation is needed; in the existing literature this is often obtained by naively combining the individual word representations. We therefore investigated several text representations as a combination of word embeddings in the context of semantic pair matching. This paper investigates the effectiveness of several such naive techniques, as well as traditional tf-idf similarity, for fragments of different lengths. Our main contribution is a first step towards a hybrid method that combines the strength of dense distributed representations, as opposed to sparse term matching, with the strength of tf-idf based methods to automatically reduce the impact of less informative terms. Our new approach outperforms the existing techniques in a toy experimental set-up, leading to the conclusion that the combination of word embeddings and tf-idf information might lead to a better model for semantic content within very short text fragments.
Comment: 6 pages, 5 figures, 3 tables, ReLSD workshop at ICDM 1
Abstract: Place recommender systems are increasingly being used to find places of a given type that are close to a user-specified location. As it is important for these systems to use an up-to-date database with a wide coverage, there is a need for techniques that are capable of expanding place databases in an automated way. On the other hand, social media are a rich source of geographically distributed information. In this paper, we therefore propose an approach to discover new instances of a given place type by exploiting correlations between terms and locations in geotagged social media. For a variety of place types, our approach is able to find places which are not yet included in popular place databases such as Foursquare or Google Places.
Databases of places have become increasingly popular for identifying places of a given type that are close to a user-specified location. As it is important for these systems to use an up-to-date database with broad coverage, there is a need for techniques that are capable of expanding place databases in an automated way. In this paper, the authors discuss how geographically annotated information obtained from social media can be used to discover new places. In particular, they first determine potential places of interest by clustering the locations where Flickr photos have been taken. The tags of the Flickr photos and the terms of the Twitter messages posted in the vicinity of the obtained candidate places of interest are then used to rank the candidates based on the likelihood that they belong to a given type. For several place types, this methodology finds places that are not yet contained in the databases used by Foursquare, Google, LinkedGeoData and Geonames. Furthermore, the experimental results show that the proposed method can successfully identify errors in existing place databases such as Foursquare.
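The two-stage pipeline described above (cluster geotagged photo locations, then rank the resulting candidates by nearby textual evidence) can be sketched as follows. The paper does not prescribe these exact algorithms; the greedy coordinate clustering, the `radius` threshold, and the tag-fraction score below are simplifying assumptions for illustration only:

```python
def cluster_points(points, radius=0.001):
    """Greedy single-pass clustering of (lat, lon) points: assign each
    point to the first cluster whose centroid lies within `radius` on
    both axes, else start a new cluster. A crude stand-in for the
    clustering step applied to Flickr photo locations."""
    clusters = []  # each cluster: {"centroid": (lat, lon), "points": [...]}
    for lat, lon in points:
        for c in clusters:
            clat, clon = c["centroid"]
            if abs(lat - clat) <= radius and abs(lon - clon) <= radius:
                c["points"].append((lat, lon))
                n = len(c["points"])
                # Incremental mean update of the centroid.
                c["centroid"] = (clat + (lat - clat) / n,
                                 clon + (lon - clon) / n)
                break
        else:
            clusters.append({"centroid": (lat, lon), "points": [(lat, lon)]})
    return clusters

def score_candidate(nearby_tags, type_tags):
    """Toy likelihood proxy: the fraction of tags observed near a
    candidate place that are indicative of the target place type."""
    if not nearby_tags:
        return 0.0
    hits = sum(1 for t in nearby_tags if t in type_tags)
    return hits / len(nearby_tags)
```

Candidates would then be sorted by their score, and high-scoring clusters absent from databases such as Foursquare or Geonames become suggested new places.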
Abstract: In this paper, we investigate how the category of a Twitter user can be used to better predict and optimize the popularity of tweets. The contributions of this paper are threefold. First, we compare the influence of content features on the popularity of tweets for different user categories. Second, we present a regression model to predict the popularity of tweets given the content features as input. To construct this model, we interpolate a generic regression model, which is trained on all data, and a category-specific model, which is only trained on tweets from users of the same category as the user of the given tweet. In this way we can combine the advantage of the robustness of a generic model with the ability of category-specific models to pick up on category-specific influence of content features. The third contribution is the investigation of the feasibility of boosting the popularity of a tweet by setting up an experiment in which we proactively adapt content features in order to optimize the popularity of tweets. Based on this research, we conclude that the introduction of user categories leads to a more precise analysis and better predictions. In the hands-on experiment, we observed a gain in popularity by proactively adapting content features.
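The interpolation of a generic and a category-specific regressor amounts to a convex combination of their predictions. A minimal sketch, assuming both models are plain linear predictors (the paper's actual regressors and the choice of the mixing weight are not specified here; `lam` and the feature values are illustrative):

```python
def interpolated_prediction(features, generic_weights, category_weights, lam=0.5):
    """Blend a generic and a category-specific linear predictor.
    `lam` in [0, 1] controls the trust placed in the generic model:
    lam=1 uses only the generic model, lam=0 only the category one."""
    def linear(weights, x):
        # Dot product of weight vector and feature vector.
        return sum(w * xi for w, xi in zip(weights, x))
    return (lam * linear(generic_weights, features)
            + (1 - lam) * linear(category_weights, features))

# Toy usage: two features, equal trust in both models.
print(interpolated_prediction([1.0, 2.0], [1.0, 1.0], [2.0, 0.0], lam=0.5))
```

A robustness/specificity trade-off follows directly: when a category has few training tweets, a larger `lam` leans on the generic model; when the category is data-rich, a smaller `lam` lets the category-specific model dominate.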
We introduce a method for discovering the semantic type of events extracted from Flickr, focusing in particular on how this type is influenced by the spatio-temporal grounding of the event, the profile of its attendees, and the semantic type of the venue and other entities which are associated with the event.