The webLyzard media monitoring and Web intelligence platform (www.webLyzard.com) presented in this paper is a flexible tool for assessing the positioning of an organization and the effectiveness of its communications. The platform aggregates large archives of digital content from multiple stakeholders. Each week it processes millions of documents and user comments from news media, blogs, Web 2.0 platforms such as Facebook, Twitter and YouTube, and the Web sites of companies and NGOs. An interactive dashboard with trend charts and map projections shows how often and where information is published. It also provides a real-time account of the topics that stakeholders associate with an organization. Positive and negative sentiment is computed automatically and serves as an indicator of the impact of public relations and marketing campaigns.
Knowledge capture approaches in the age of massive Web data require robust and scalable mechanisms to acquire, consolidate and pre-process large amounts of heterogeneous data, both unstructured and structured. This paper addresses this requirement by introducing the Extensible Web Retrieval Toolkit (eWRT), a modular Python API for retrieving social data from Web sources such as Delicious, Flickr, Yahoo! and Wikipedia. eWRT has been released as an open-source library under the GNU GPLv3. It includes classes for caching and data management, and provides low-level text processing capabilities, including language detection, phonetic string similarity measures, and string normalization.
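To make the low-level text processing capabilities more concrete, the following is a minimal sketch of string normalization and a phonetic similarity check. It is not eWRT's API: the function names `normalize`, `soundex`, and `phonetic_match` are hypothetical, only the Python standard library is used, and American Soundex is assumed as one possible phonetic measure, since the abstract does not say which measures eWRT implements.

```python
import unicodedata

# Soundex digit classes; vowels, y, h and w carry no digit.
_SOUNDEX_CODES = {
    **dict.fromkeys("bfpv", "1"),
    **dict.fromkeys("cgjkqsxz", "2"),
    **dict.fromkeys("dt", "3"),
    "l": "4",
    **dict.fromkeys("mn", "5"),
    "r": "6",
}


def normalize(text):
    """Strip diacritics, case-fold, and collapse runs of whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(stripped.casefold().split())


def soundex(word):
    """Return the four-character American Soundex code of a word."""
    letters = [ch for ch in normalize(word) if "a" <= ch <= "z"]
    if not letters:
        return ""
    code = letters[0].upper()
    prev = _SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        digit = _SOUNDEX_CODES.get(ch, "")
        # Emit a digit only when it differs from the previous letter's digit.
        if digit and digit != prev:
            code += digit
        # h and w do not separate letters with the same digit; vowels do.
        if ch not in "hw":
            prev = digit
    return (code + "000")[:4]


def phonetic_match(a, b):
    """Treat two words as phonetically similar if their codes are equal."""
    return soundex(a) == soundex(b)
```

For example, `phonetic_match("Robert", "Rupert")` holds under this scheme because both names map to the same code, while `normalize` would let differently accented or capitalized spellings of a tag be compared as equal strings.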