A systematic and reproducible “workflow” (the process that moves a scientific investigation from raw data to coherent research question to insightful contribution) should be a fundamental part of academic data-intensive research practice. In this paper, we elaborate basic principles of a reproducible data analysis workflow by defining three phases: the Explore, Refine, and Produce Phases. Each phase is roughly centered around the audience to whom research decisions, methodologies, and results are being immediately communicated. Importantly, each phase can also give rise to a number of research products beyond traditional academic publications. Where relevant, we draw analogies between design principles and established practice in software development. The guidance provided here is not intended to be a strict rulebook; rather, the suggested practices and tools for reproducible, sound data-intensive analysis may support both students new to research and current researchers who are new to data-intensive work.
Small datasets comprising observations made under conditions of repeatability or of reproducibility pervade the practice of measurement science. Many laboratories typically make only one determination; occasionally they make two, and only rarely do they make three or more replicate determinations of the same measurand. Interlaboratory comparisons, including key comparisons, and meta-analyses often involve only a handful of participants. These limitations pose considerable challenges to the production of reliable uncertainty evaluations. This contribution, intended for metrologists, describes techniques that may be employed to address these challenges either when the only information in hand is what those few observations provide, or when there is also preexisting knowledge about the measurement procedure or about the measurand. Although the technical details vary, the key message is consistently the same: there is no universal solution to the challenges raised by small datasets, and if a measurand is worth measuring, then the observations deserve a customized treatment responsive to the peculiarities of the case, and a level of effort sufficient to render the final result fit for its intended purpose. The focus is on the measurement of scalar measurands, as in the Guide to the Expression of Uncertainty in Measurement (GUM), but the range of measurement models considered is much wider than the GUM entertains. We review the advantages of the Hodges–Lehmann estimator as a general-purpose replacement for the arithmetic average in all cases where the replicated observations are approximately symmetrically distributed around a central, typical value. We illustrate the application of empirical Bayes methods to uncertainty evaluations, in particular in the context of data reductions of small datasets.
Metrologists who are skeptical about the use of subjective prior distributions may derive some value from this novel application, and thereby develop an appreciation for how Bayesian procedures can help address the challenges posed by small datasets. The estimates of the measurand that different approaches produce often agree, at least approximately, but the corresponding uncertainty quantifications may differ markedly. In one example, involving three observations, a Bayesian approach yields a coverage interval appreciably narrower than the GUM’s approach. In another example, involving only two observations, an approach involving far less restrictive assumptions than those made in the GUM produces a confidence interval that is almost as narrow as the conventional interval.
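As an illustration of the Hodges–Lehmann estimator mentioned above, the following is a minimal sketch of its one-sample form, assuming the common definition as the median of all pairwise (Walsh) averages, with each observation also paired with itself. The function name `hodges_lehmann` and the example values are hypothetical and are not taken from the paper; the authors' own implementation and conventions may differ.

```python
from itertools import combinations_with_replacement
from statistics import median


def hodges_lehmann(observations):
    """One-sample Hodges-Lehmann estimate (hypothetical helper).

    Assumes the convention that Walsh averages (x_i + x_j) / 2 are
    taken over all pairs with i <= j, so each observation is also
    paired with itself.
    """
    values = list(observations)
    if not values:
        raise ValueError("need at least one observation")
    # All pairwise averages, including each value averaged with itself.
    walsh_averages = [
        (a + b) / 2 for a, b in combinations_with_replacement(values, 2)
    ]
    return median(walsh_averages)


# Made-up replicate determinations; the third is far from the other two.
replicates = [1.0, 2.0, 10.0]
estimate = hodges_lehmann(replicates)          # 3.75
average = sum(replicates) / len(replicates)    # about 4.33
```

With these made-up values the estimate is pulled less toward the outlying observation than the arithmetic average is, which is the kind of behavior that motivates considering it for small sets of replicates.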
Significance: Conservation outreach has long depended on an intuitive sense of which species are more “charismatic” or engaging, for example, by focusing on certain charismatic megafauna in advertising materials. Online community science databases like eBird and iNaturalist provide records of how people engage with different birds under differing data collection protocols. Comparisons between the two databases reveal biases in bird reporting rates. Larger, more colorful, and rarer birds are reported preferentially in opportunistic iNaturalist records compared to more systematic eBird records. These relationships and the species-specific engagement indexes determined from these data can be applied to conservation and outreach efforts to help foster a public relationship with nature and can be used to improve models using these two databases.
International (American Society for Testing and Materials, ASTM) ASTM E74-13a [1] and by the International Organization for Standardization (ISO) ISO 376:2011(E) [15]. This paper introduces several statistical methods that overcome limitations of procedures currently in use at the National Institute of Standards and Technology (NIST), which are consistent with those standards, thus increasing the reliability of calibration results and uncertainty evaluations.
The Consultative Committee for Mass and related quantities (ccm
Identifying which species are perceived as charismatic can improve the impact and efficiency of conservation outreach, as charismatic species receive more conservation funding and have their conservation needs prioritized. Sociological experiments studying animal charisma have relied on stated preferences to find correlations between hypothetical "willingness to pay" or "empathy" for a species' conservation and species' size, color, and aesthetic appeal. Recognizing the increasing availability of digital records of public engagement with animals that reveal preferences, an emerging field of "culturomics" uses Google search results, Wikipedia article activity, and other digital modes of engagement to identify charismatic species and traits. In this study, we take advantage of community science efforts as another form of digital data that can reveal observer preferences. We apply a multi-stage analysis to ask whether opportunistic birders contributing to iNaturalist engage more with larger, more colorful, and rarer birds relative to a baseline from eBird contributors that approximates unbiased detection. We find that body mass, color contrast, and range size all predict overrepresentation in the opportunistic dataset. We also find evidence that, across 473 modeled species, 52 species are significantly overreported and 158 are significantly underreported, indicating a wide variety of species-specific effects. Understanding which birds are charismatic can aid conservationists in creating impactful outreach materials and engaging new naturalists. The quantified differences between two prominent community science efforts may also be of use for researchers leveraging the data from one or both of them to answer scientific questions of interest.