Wikipedia's low barriers to participation have the unintended effect of attracting a large number of articles whose topics do not meet Wikipedia's inclusion standards. Many of these articles are quickly deleted, often causing their creators to stop contributing to the site. We collect and make available several datasets of deleted articles, heretofore inaccessible, and use them to build a model that predicts with high precision whether an article will be deleted. We report a precision of 98.6% and a recall of 97.5% in the best case, and high precision with lower but still useful recall in the most difficult case. We propose deploying a system based on this model on Wikipedia as a set of decision-support tools: they would help article creators evaluate and improve their articles before posting, and help new-article patrollers make more informed decisions about which articles to delete and which to improve.
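Precision and recall, the two figures reported above, measure different kinds of error. The sketch below computes both for a binary deletion classifier, treating "deleted" as the positive class. The function name and the toy labels are illustrative only and are not the authors' evaluation code or data. The example shows the "high precision, lower recall" pattern described for the most difficult case.

```python
from typing import List, Tuple


def precision_recall(y_true: List[bool], y_pred: List[bool]) -> Tuple[float, float]:
    """Compute precision and recall for a binary classifier.

    True means the article was (or is predicted to be) deleted.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)      # correctly flagged
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)  # flagged but kept
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)  # deleted but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical labels: every flagged article is truly deleted (high precision),
# but half of the deleted articles are missed (lower recall).
actual = [True, True, True, True, False, False]
predicted = [True, True, False, False, False, False]
print(precision_recall(actual, predicted))  # (1.0, 0.5)
```

For a decision-support tool, high precision matters most: a patroller acting on a flagged article should rarely be misled, even if some deletable articles go unflagged.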
Pinterest is a Social Network Site (SNS) centered on the curation and sharing of visual content. The site encourages users to form ties with other users (by following them) based on shared interests, and to use these ties to discover and share content. In this work, we examine how effective and relevant the Pinterest follow mechanism is in driving content discovery and curation. We collect a sample of user activity and find that the vast majority (88%) of the unique users who interact with an average user's content are non-followers. Conversely, only 12.3% of a user's followers interact with any of their pins. Users who discover and repost content from outside their follow network also do not subsequently follow the contributors of that content. Our results strongly suggest that following is neither heavily used nor highly effective for driving content discovery and sharing on Pinterest.
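The two percentages above are computed over different populations. The first is taken over the users who interact with a pinner's content, and the second over that pinner's followers. The sketch below computes both per-user ratios, assuming follower and interaction data are available as sets of user IDs. The function and argument names are hypothetical. We also assume that averaging these per-user values across the sample yields the "average user" figures, which is our reading rather than a detail stated in the abstract.

```python
from typing import Set, Tuple


def follow_reach(followers: Set[str], interactors: Set[str]) -> Tuple[float, float]:
    """Per-user ratios comparing a pinner's followers with the users who
    interact with (e.g. repin) any of the pinner's pins.

    Returns (share of interactors who are non-followers,
             share of followers who interact at all).
    """
    non_follower_interactors = interactors - followers  # set difference
    engaged_followers = interactors & followers         # set intersection
    share_non_followers = (
        len(non_follower_interactors) / len(interactors) if interactors else 0.0
    )
    share_followers_engaged = (
        len(engaged_followers) / len(followers) if followers else 0.0
    )
    return share_non_followers, share_followers_engaged


# Hypothetical pinner: four followers, four distinct users who engaged with pins.
print(follow_reach({"a", "b", "c", "d"}, {"a", "x", "y", "z"}))  # (0.75, 0.25)
```

The two ratios are not complements of each other. A user can have mostly non-follower interactors and also a low share of engaged followers, which is the pattern the abstract reports.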
With the growing adoption of timestamps and geotags on Web data, search engines are increasingly asked "where" and "when" questions in addition to the classic "what." On Twitter, for example, many tweets carry location information as well as timestamps, creating demand for query processors that can search both of these dimensions along with text. We propose 3W, a search framework for geotemporally stamped documents. It exploits the structure of time-stamped data to dramatically shrink the temporal search space, and it uses a shallow tree built from the spatial distribution of tweets to enable fast search over the spatial and text dimensions. Our evaluation on 30 million tweets shows that the prototype system outperforms the baseline approach that uses a monolithic index.
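The abstract does not spell out 3W's data structures, so the following is only a toy illustration of the general idea it describes. Tweets are partitioned by time so that a query touches only the relevant time slices. Each slice is then partitioned spatially, and each spatial cell keeps a small inverted index over terms. All class, method, and parameter names are hypothetical. A single-level uniform grid stands in for the paper's shallow, distribution-aware tree.

```python
import bisect
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Tweet:
    tid: int
    ts: int      # Unix timestamp in seconds
    lat: float
    lon: float
    text: str


class GeoTemporalIndex:
    """Toy index: time bucket -> grid cell -> term -> list of tweets."""

    def __init__(self, bucket_secs: int = 3600, cell_deg: float = 1.0):
        self.bucket_secs = bucket_secs
        self.cell_deg = cell_deg
        self.bucket_starts: List[int] = []  # kept sorted for binary search
        self.buckets: Dict[int, Dict[Tuple[int, int], Dict[str, List[Tweet]]]] = {}

    def _cell(self, lat: float, lon: float) -> Tuple[int, int]:
        return (int(lat // self.cell_deg), int(lon // self.cell_deg))

    def add(self, t: Tweet) -> None:
        start = t.ts - t.ts % self.bucket_secs  # start of this tweet's time bucket
        if start not in self.buckets:
            bisect.insort(self.bucket_starts, start)
            self.buckets[start] = defaultdict(lambda: defaultdict(list))
        postings = self.buckets[start][self._cell(t.lat, t.lon)]
        for term in set(t.text.lower().split()):  # naive whitespace tokenizer
            postings[term].append(t)

    def search(self, term: str, t0: int, t1: int,
               lat0: float, lat1: float, lon0: float, lon1: float) -> List[int]:
        term = term.lower()
        # Temporal pruning: binary-search for buckets that can overlap [t0, t1].
        lo = bisect.bisect_left(self.bucket_starts, t0 - t0 % self.bucket_secs)
        hi = bisect.bisect_right(self.bucket_starts, t1)
        c_lat0, c_lon0 = self._cell(lat0, lon0)
        c_lat1, c_lon1 = self._cell(lat1, lon1)
        hits = []
        for start in self.bucket_starts[lo:hi]:
            for (c_lat, c_lon), postings in self.buckets[start].items():
                # Spatial pruning: skip grid cells outside the query box.
                if not (c_lat0 <= c_lat <= c_lat1 and c_lon0 <= c_lon <= c_lon1):
                    continue
                for t in postings.get(term, []):
                    # Exact check on the candidates that survived pruning.
                    if (t0 <= t.ts <= t1 and lat0 <= t.lat <= lat1
                            and lon0 <= t.lon <= lon1):
                        hits.append(t.tid)
        return sorted(hits)


idx = GeoTemporalIndex()
idx.add(Tweet(1, 1_000, 40.7, -74.0, "Concert tonight in Brooklyn"))
idx.add(Tweet(2, 1_200, 40.8, -73.9, "brooklyn bridge at sunset"))
idx.add(Tweet(3, 90_000, 40.7, -74.0, "Brooklyn again"))
idx.add(Tweet(4, 1_100, 34.0, -118.2, "not brooklyn this is LA"))
print(idx.search("brooklyn", 0, 3_600, 40.0, 41.0, -75.0, -73.0))  # [1, 2]
```

Tweet 3 is never examined because its time bucket lies outside the query range, and tweet 4's cell is skipped by the spatial check. A distribution-aware tree would split dense regions such as cities more finely than sparse ones. The uniform grid here is simpler but spends cells on empty areas.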