Social media data represent an important resource for the behavioral analysis of the ageing population. This paper addresses the problem of age prediction from Twitter data, treating prediction as a classification task. To this end, we devise a model based on a Convolutional Neural Network (CNN) that relies on language-related features and social-media-specific metadata. More specifically, we introduce two features that have not previously been considered in the literature: the content of the URLs and of the hashtags appearing in tweets. We also employ distributed representations of words and phrases present in tweets, hashtags and URLs, pre-trained on appropriate corpora, in order to exploit their semantic information for age prediction. We show that our CNN-based classifier, compared with an SVM baseline, yields an improvement of 12.3% and 6.6% in micro-averaged F1 score on the Dutch and English datasets, respectively.
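The evaluation metric named above, micro-averaged F1, pools true positives, false positives and false negatives over all classes before computing precision and recall. The sketch below is illustrative only, not the authors' evaluation code. It assumes single-label, multi-class predictions (one age group per user). Under that assumption every wrong prediction counts once as a false positive and once as a false negative, so micro-F1 coincides with accuracy. The function name `micro_f1` and the age-group labels in the usage example are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def micro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Micro-averaged F1 for single-label multi-class predictions (hypothetical helper)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Per-class counts; they are summed (pooled) before computing P and R.
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # class p was predicted but is wrong
            fn[t] += 1  # class t was the true label but was missed

    total_tp = sum(tp.values())
    total_fp = sum(fp.values())
    total_fn = sum(fn.values())
    if total_tp == 0:
        return 0.0

    precision = total_tp / (total_tp + total_fp)
    recall = total_tp / (total_tp + total_fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical age-group labels, for illustration only.
    gold = ["18-24", "25-34", "35+", "25-34"]
    pred = ["18-24", "35+", "35+", "25-34"]
    print(micro_f1(gold, pred))  # 3 of 4 correct -> 0.75
```

Pooling counts before averaging means that frequent age groups weigh more than rare ones. A macro-averaged F1, which averages per-class scores, would weigh every age group equally instead.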
Abstract. Given the huge amount of static and dynamic content created for eLearning tasks, the major challenge in extending its use is to improve the effectiveness of retrieval and accessibility through Learning Management Systems. The aim of the European project Language Technology for eLearning is to tackle this problem by providing Language Technology-based functionalities and by integrating semantic knowledge to facilitate the management, distribution and retrieval of learning material.