Debora Donato scite author profile

The quality of user-generated content varies drastically from excellent to abuse and spam. As the availability of such content increases, the task of identifying high-quality content in sites based on user contributions-social media sitesbecomes increasingly important. Social media in general exhibit a rich variety of information sources: in addition to the content itself, there is a wide array of non-content information available, such as links between items and explicit quality ratings from members of the community. In this paper we investigate methods for exploiting such community feedback to automatically identify high quality content. As a test case, we focus on Yahoo! Answers, a large community question/answering portal that is particularly rich in the amount and types of content and social interactions available in it. We introduce a general classification framework for combining the evidence from different sources of information, that can be tuned automatically for a given social media type and quality definition. In particular, for the community question/answering domain, we show that our system is able to separate high-quality items from the rest with an accuracy close to that of humans.

show abstract

Preferential attachment in the growth of social networks: The internet encyclopedia Wikipedia

Capocci

et al. 2006

View full text Add to dashboard Cite

We present an analysis of the statistical properties and growth of the free on-line encyclopedia Wikipedia. By describing topics by vertices and hyperlinks between them as edges, we can represent this encyclopedia as a directed graph. The topological properties of this graph are in close analogy with those of the World Wide Web, despite the very different growth mechanism. In particular, we measure a scale-invariant distribution of the in and out degree and we are able to reproduce these features by means of a simple statistical model. As a major consequence, Wikipedia growth can be described by local rules such as the preferential attachment mechanism, though users, who are responsible of its evolution, can act globally on the network.

show abstract

A reference collection for web spam

Castillo

Donato

Becchetti³

et al. 2006

SIGIR Forum

163

130

View full text Add to dashboard Cite

show abstract

Large scale properties of the Webgraph

Donato

Laura

Leonardi

et al. 2004

The European Physical Journal B - Condensed Matter

115

View full text Add to dashboard Cite

Query suggestions using query-flow graphs

et al. 2009

View full text Add to dashboard Cite

The query-flow graph [Boldi et al., CIKM 2008] is an aggregated representation of the latent querying behavior contained in a query log. Intuitively, in the query-flow graph a directed edge from query qi to query qj means that the two queries are likely to be part of the same search mission. Any path over the query-flow graph may be seen as a possible search task, whose likelihood is given by the strength of the edges along the path. An edge (qi, qj) is also labelled with some information: e.g., the probability that user moves from qi to qj, or the type of the transition, for instance, the fact that qj is a specialization of qi.In this paper we propose, and experimentally study, query recommendations based on short random walks on the queryflow graph. Our experiments show that these methods can match in precision, and often improve, recommendations based on query-click graphs, without using users' clicks. Our experiments also show that it is important to consider transition-type labels on edges for having good quality recommendations.Finally, one feature that we had in mind while devising our methods was that of providing diverse sets of recommendations: the experimentation that we conducted provides encouraging results in this sense.

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Debora Donato

Finding high-quality content in social media

Preferential attachment in the growth of social networks: The internet encyclopedia Wikipedia

A reference collection for web spam

Large scale properties of the Webgraph

Query suggestions using query-flow graphs

Contact Info

Product

Resources

About