Studying how clusters of documents evolve over time gives insight into the growth (or shrinkage) of “knowledge communities” of authors who build on each other's work. We cluster documents based on the documents they cite in common using the Streemer clustering method, which finds cohesive foreground clusters (the knowledge communities) embedded in a diffuse background. We build predictive models with features based on the citation structure, the vocabulary of the papers, and the affiliations and prestige of the authors, and we use these models to study the drivers of community growth and the predictors of how widely a paper will be cited. We find that scientific knowledge communities tend to grow more rapidly if their publications build on diverse information and use narrow vocabulary, and that papers on the periphery of a community have the highest impact, while those not in any community have the lowest impact.
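As a rough illustration of what "documents they cite in common" can mean in practice, the sketch below scores pairs of papers by the overlap of their reference lists. It is not the Streemer method, whose details are not given here; the Jaccard overlap, the function name `coupling_similarity`, and the toy paper and reference identifiers are all assumptions made for illustration.

```python
from itertools import combinations


def coupling_similarity(refs_a, refs_b):
    """Jaccard overlap of two reference lists (an assumed similarity measure)."""
    a, b = set(refs_a), set(refs_b)
    union = a | b
    # Two papers with no references at all are treated as unrelated.
    return len(a & b) / len(union) if union else 0.0


# Hypothetical toy data: paper id -> ids of the documents it cites.
citations = {
    "p1": ["r1", "r2", "r3"],
    "p2": ["r2", "r3", "r4"],
    "p3": ["r9"],
}

# Similarity for every unordered pair of papers.
pairwise = {
    (x, y): coupling_similarity(citations[x], citations[y])
    for x, y in combinations(sorted(citations), 2)
}

for pair, score in pairwise.items():
    print(pair, round(score, 2))
```

A clustering method would then group papers whose pairwise scores are high, leaving weakly connected papers such as `p3` in the background.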
Microblogging, as introduced by Twitter, is becoming a source for tracking real-time news. Although identifying the highest-quality or most useful posts (tweets) on Twitter for breaking news is still an open problem, major web search engines seem convinced of the value of such posts and have already started allocating part of their search results pages to them. In this paper, we study a different aspect of the problem for a search engine: instead of the value of the posts, we study the value of the (shortened) URLs referenced in these posts. Our results indicate that, unlike frequently bookmarked URLs, which are generally of high quality, frequently tweeted URLs tend to fall into two opposite categories: they are either high in quality or they are spam. Identifying the quality category of a URL is not trivial, but a combination of characteristics can reveal some trends.
The online communities available on the Web have been shown to be highly interactive and capable of collectively solving difficult tasks. Nevertheless, deciding how a task should be dispatched through the network remains a challenge because of the high diversity of the communities and the dynamically changing expertise and social availability of their members. We introduce CrowdSTAR, a framework designed to route tasks across and within online crowds. CrowdSTAR indexes the topic-specific expertise and social features of the crowd contributors and then uses a routing algorithm that suggests the best sources to ask based on the trade-off between knowledge and availability. We experimented with the proposed framework in question-answering scenarios, using two popular social networks as crowd candidates: Twitter and Quora.
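One simple way to picture a knowledge vs. availability trade-off is a weighted score over candidate sources. The sketch below is only an assumption-laden illustration, not CrowdSTAR's routing algorithm: the `Candidate` fields, the linear weighting, and the `alpha` parameter are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    expertise: float     # hypothetical topic-expertise score in [0, 1]
    availability: float  # hypothetical availability score in [0, 1]


def rank_candidates(candidates, alpha=0.5):
    """Rank candidates by alpha * expertise + (1 - alpha) * availability, best first."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")

    def score(c):
        return alpha * c.expertise + (1.0 - alpha) * c.availability

    return sorted(candidates, key=score, reverse=True)


# Hypothetical crowd members for a single topic.
crowd = [
    Candidate("expert_rarely_online", expertise=0.9, availability=0.2),
    Candidate("generalist_often_online", expertise=0.5, availability=0.9),
]

balanced = rank_candidates(crowd, alpha=0.5)
knowledge_first = rank_candidates(crowd, alpha=0.9)
print([c.name for c in balanced])
print([c.name for c in knowledge_first])
```

With equal weights the frequently available generalist ranks first, while weighting expertise heavily moves the expert to the top, which is the kind of trade-off a router has to resolve.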