Measurement or observation error is common in ecological data: as citizen scientists and automated algorithms play larger roles in processing growing volumes of data to address problems at large scales, concerns about data quality and strategies for improving it have received greater focus. However, practical guidance on the fundamental data quality questions facing data users or managers (how accurate do data need to be, and what is the most efficient way to improve them?) remains limited. We present a generalizable framework for evaluating data quality and identifying remediation practices, and demonstrate it with crowdsourced classifications of trail camera images, determining acceptable rates of misclassification and identifying optimal remediation strategies for analysis with occupancy models. We used expert validation to estimate baseline classification accuracy and simulation to determine the sensitivity of two occupancy estimators (standard and false‐positive extensions) to different empirical misclassification rates. We used regression techniques to identify important predictors of misclassification and prioritize remediation strategies. More than 93% of images were accurately classified, but simulation results suggested that most species were not identified accurately enough to permit distribution estimation at our predefined threshold for accuracy (<5% absolute bias). A model developed to screen incorrect classifications predicted misclassified images with >97% accuracy: enough to meet our accuracy threshold. Occupancy models that accounted for false‐positive error provided more accurate inference still, even at high rates of misclassification (30%). Because simulation suggested occupancy models were less sensitive to additional false‐negative error, screening models or fitting occupancy models accounting for false‐positive error emerged as efficient data remediation solutions.
Combining simulation‐based sensitivity analysis with empirical estimation of baseline error and its variability allows users and managers of potentially error‐prone data to identify and fix problematic data more efficiently. This approach may be particularly helpful for “big data” efforts that depend upon citizen scientists or automated classification algorithms and have many downstream users, but given the ubiquity of observation or measurement error, even conventional studies may benefit from focusing more attention upon data quality.
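The idea of a simulation-based sensitivity check can be illustrated with a minimal sketch. It is not the authors' analysis: in place of the occupancy models described above it uses a naive estimator (the share of sites with at least one recorded detection), and the function names, the number of sites and surveys, and the occupancy and detection probabilities are all assumptions chosen for illustration. Only the 5% absolute-bias threshold and the 30% misclassification rate are taken from the text.

```python
import random


def simulate_naive_occupancy(psi, p_detect, p_false_pos,
                             n_sites=500, n_surveys=5, seed=0):
    """Simulate one survey season and return the naive occupancy estimate.

    The naive estimate is the share of sites with at least one recorded
    detection. It is a stand-in for a fitted occupancy model, used here
    only to keep the sketch short (an assumption, not the paper's method).
    """
    rng = random.Random(seed)
    sites_with_detection = 0
    for _ in range(n_sites):
        occupied = rng.random() < psi
        # Occupied sites record the species with probability p_detect per
        # survey; unoccupied sites can only record it through a
        # misclassified image (false positive).
        p_record = p_detect if occupied else p_false_pos
        if any(rng.random() < p_record for _ in range(n_surveys)):
            sites_with_detection += 1
    return sites_with_detection / n_sites


def absolute_bias(psi, p_detect, p_false_pos,
                  n_reps=100, n_sites=500, n_surveys=5):
    """Average the naive estimate over replicates and compare it with psi."""
    estimates = [
        simulate_naive_occupancy(psi, p_detect, p_false_pos,
                                 n_sites=n_sites, n_surveys=n_surveys,
                                 seed=rep)
        for rep in range(n_reps)
    ]
    return abs(sum(estimates) / n_reps - psi)


if __name__ == "__main__":
    threshold = 0.05  # <5% absolute bias, as in the abstract
    # Hypothetical true occupancy (0.4) and detection probability (0.5).
    for fp in (0.0, 0.01, 0.05, 0.30):
        bias = absolute_bias(psi=0.4, p_detect=0.5, p_false_pos=fp)
        verdict = "ok" if bias < threshold else "exceeds threshold"
        print(f"false-positive rate {fp:.2f}: bias {bias:.3f} ({verdict})")
```

Sweeping the false-positive rate in this way shows where an estimator's bias crosses the chosen threshold, which is the kind of result that indicates how much remediation a dataset needs. A fuller version would replace the naive estimator with fitted standard and false-positive occupancy models.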
We provide program managers insight into considerations for launching and running a large-scale, long-term citizen science project, using the Snapshot Wisconsin trail-camera project as a case study. Many citizen science projects are undertaken with a "learn as you go" approach, so there is room to better prepare program managers from the outset. We provide a comprehensive list of components making up citizen science projects, and discuss capacity needs for each component. We then quantify staff time needed throughout the project, based on our own experiences managing a long-term citizen science project with >1,000 participants. We show that total staff time and staff time devoted to certain project components vary markedly among 3 project phases: planning, growth, and maintenance. We recommend planning for 5.5 staff positions to maintain a long-term project serving a few hundred volunteers or more. The illustrated concepts can be applied by any person or group developing a volunteer-based project to prepare for logistic and funding needs across a project's lifespan. Program managers must remember that people form the backbone of any citizen science project, and the success or failure of such projects depends in large part on the user experience of volunteers. © 2019 The Wildlife Society.