Online discussions have become increasingly widespread with Web 2.0: regardless of distance, and whether or not you know the other person, you can discuss and exchange ideas with people all over the world through forums, blogs, and newsgroups. News websites have made extensive use of forums to encourage readers to become active participants in the information media. This paper aims to automatically extract celebrities from such discussions. We propose three meta-criteria and evaluate them on a dataset of 35,175 posts written by 14,443 users. The results show that one of the proposed meta-criteria succeeds in extracting celebrities and leaves room for further improvement.
I. INTRODUCTION
Web 2.0 has introduced an incredibly simple way of interacting: regardless of distance, and whether or not you know the other person, you can hold discussions with people from all over the world through Web 2.0 applications (blogs, e-mails, dedicated media, etc.). In particular, forum debates on news websites are a very representative case of interaction between people using Web 2.0.
In this kind of interaction, as in real life, people play a social role [1]. Goffman [2] defines the social role as "the enactment of rights and duties attached to a given status". In other words, people show regularities in their behaviour, and this behaviour can be analysed in order to identify their social role. Golder and Donath [3] study Usenet newsgroups and define the celebrity social role to represent people who are recognised in and by their community.
In this paper, we make three major contributions: a theoretical formalisation of the 'celebrity' social role, inspired by and based on previous anthropological studies; an experimental application of this theoretical framework to data extracted from a news website, using three different meta-criteria and a baseline criterion; and a discussion of social role extraction and its relation to the kind of data we deal with.
The paper is organised as follows. First, we discuss related work on social role recognition in online media. Then, we present the theoretical framework, which is based on the study of Golder and Donath [3] and extended with three meta-criteria to take into account the special features of forum debates on news websites. Next, we present the experimental framework and the dataset we used, and we discuss the evaluation of the different criteria applied. We conclude by commenting on the results and discussing the outcome.