The digitization of biocollections is a critical task with direct implications for the global community that uses the data for research and education. Recent innovations to involve citizen scientists in digitization increase awareness of the value of biodiversity specimens; advance science, technology, engineering, and math literacy; and build sustainability for digitization. In support of these activities, we launched the first global citizen-science event focused on the digitization of biodiversity specimens: Worldwide Engagement for Digitizing Biocollections (WeDigBio). During the inaugural 2015 event, 21 sites hosted events where citizen scientists transcribed specimen labels via online platforms (DigiVol, Les Herbonautes, Notes from Nature, the Smithsonian Institution's Transcription Center, and Symbiota). Many citizen scientists also contributed off-site. In total, thousands of citizen scientists around the world completed over 50,000 transcription tasks. Here, we present the process of organizing an international citizen-science event, an analysis of the event's effectiveness, and future directions, content that is now foundational to the growing WeDigBio event.
This article explores the socially constructed space of Wikipedia and how the process and structure of Wikipedia enable it both to act as a vehicle for communication between sport fans and to subtly augment existing public narratives about sport. As users create article narratives, they educate fellow fans in relevant social and sport meanings. This study analyzes two aspects of Wikipedia for sport fans, the application of statistical information and the connection of athletes with other sports figures and organizations, through a discourse analysis of the article content and discussion pages of ten sample athletes. These pages of retired celebrity athletes provide a means for exploring the multidirectional production processes used by the sport fan community to celebrate recorded events of sporting history in clearly delineated and verifiable ways, thus maintaining the community's social values.
Chronicling America is a product of the National Digital Newspaper Program, a partnership between the Library of Congress and the National Endowment for the Humanities to digitize historic American newspapers. Over 16 million pages have been digitized to date, complete with high-resolution images and machine-readable METS/ALTO OCR. Of considerable interest to Chronicling America users is a semantified corpus, complete with extracted visual content and headlines. To produce such a corpus, we introduce a visual content recognition model trained on bounding box annotations collected as part of the Library of Congress's Beyond Words crowdsourcing initiative and augmented with additional annotations, including those of headlines and advertisements. We describe our pipeline, which uses this deep learning model to extract seven classes of visual content: headlines, photographs, illustrations, maps, comics, editorial cartoons, and advertisements, complete with textual content such as captions derived from the METS/ALTO OCR, as well as image embeddings. We report the results of running the pipeline on 16.3 million pages from the Chronicling America corpus and describe the resulting Newspaper Navigator dataset, the largest dataset of extracted visual content from historic newspapers ever produced. The Newspaper Navigator dataset, fine-tuned visual content recognition model, and all source code are placed in the public domain for unrestricted re-use.
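One step in the pipeline described above, attaching OCR text to a detected region, can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `words_in_box`, the sample XML, and the containment rule (a word counts only if its box lies fully inside the predicted box) are assumptions made for illustration. The sketch also assumes that the predicted bounding box and the ALTO coordinates share the same units, which in practice may require rescaling.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written ALTO fragment used only for illustration.
ALTO_SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout><Page><PrintSpace><TextBlock><TextLine>
    <String CONTENT="FLOOD" HPOS="100" VPOS="100" WIDTH="200" HEIGHT="40"/>
    <String CONTENT="WATERS" HPOS="320" VPOS="100" WIDTH="220" HEIGHT="40"/>
    <String CONTENT="elsewhere" HPOS="100" VPOS="900" WIDTH="150" HEIGHT="20"/>
  </TextLine></TextBlock></PrintSpace></Page></Layout>
</alto>"""


def words_in_box(alto_xml, box):
    """Return OCR words whose boxes lie fully inside box = (x1, y1, x2, y2).

    Assumption: box uses the same coordinate units as the ALTO file.
    """
    x1, y1, x2, y2 = box
    root = ET.fromstring(alto_xml)
    words = []
    for element in root.iter():
        # Compare local tag names so any ALTO namespace version is accepted.
        if element.tag.rsplit("}", 1)[-1] != "String":
            continue
        left = float(element.get("HPOS"))
        top = float(element.get("VPOS"))
        right = left + float(element.get("WIDTH"))
        bottom = top + float(element.get("HEIGHT"))
        if left >= x1 and top >= y1 and right <= x2 and bottom <= y2:
            words.append(element.get("CONTENT"))
    return words


# Words inside a hypothetical predicted "headline" box.
headline_words = words_in_box(ALTO_SAMPLE, (50, 50, 600, 200))
print(" ".join(headline_words))
```

Full containment is the simplest rule to state; an overlap threshold would be a reasonable alternative when OCR word boxes straddle the edge of a predicted region.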
Crowdsourced transcription has grown in popularity as a tool for generating transcribed data and public engagement. This method of making digitized materials available on online platforms designed for volunteers to transcribe content works particularly well with scientific and historical materials. A well-designed site can offer volunteers a chance to interact with collections while providing the cultural institution with a new access point for researchers in the form of searchable text; a well-designed program of engagement can support sustained activity and unexpected positive outcomes. Many questions remain about how best to engage the public and about the quality of the resulting transcriptions. Many institutions design their sites to provide carefully structured experiences for volunteers. These projects organize materials around a research goal or subject and often provide detailed templates for the transcription. By fashioning a highly structured experience, are we fully engaging volunteers to interact with the materials? What happens if an institution creates an online environment that allows volunteers more choices and control? Would this affect the online community and transcription output? And what would be the impact of a structured engagement?