Sara Stoudt scite author profile

Principles for data analysis workflows

A systematic and reproducible “workflow”—the process that moves a scientific investigation from raw data to coherent research question to insightful contribution—should be a fundamental part of academic data-intensive research practice. In this paper, we elaborate basic principles of a reproducible data analysis workflow by defining 3 phases: the Explore, Refine, and Produce Phases. Each phase is roughly centered around the audience to whom research decisions, methodologies, and results are being immediately communicated. Importantly, each phase can also give rise to a number of research products beyond traditional academic publications. Where relevant, we draw analogies between design principles and established practice in software development. The guidance provided here is not intended to be a strict rulebook; rather, the suggestions for practices and tools to advance reproducible, sound data-intensive analysis may furnish support for both students new to research and current researchers who are new to data-intensive work.

Uncertainty evaluations from small datasets

Stoudt

Pintar

Possolo

2021

Metrologia

Small datasets comprising observations made under conditions of repeatability or of reproducibility pervade the practice of measurement science. Many laboratories typically will make only one determination, occasionally they will make two, and only rarely will they make three or more replicate determinations of the same measurand. Interlaboratory comparisons, including key comparisons, and meta-analyses, often involve only a handful of participants. These limitations pose considerable challenges to the production of reliable uncertainty evaluations. This contribution, intended for metrologists, describes techniques that may be employed to address this challenge either when the only information in hand is what those few observations provide, or when there also is preexisting knowledge about the measurement procedure or about the measurand. Although the technical details vary, the key message is persistently the same: that there is no universal solution to the challenges raised by small datasets, and that if a measurand is worth measuring, then the observations deserve a customized treatment responsive to the peculiarities of the case, and a level of effort sufficient to render the final result fit for its intended purpose. The focus is on the measurement of scalar measurands, similarly to the Guide to the Expression of Uncertainty in Measurement (GUM), but the range of measurement models considered is much wider than the GUM entertains. We review the advantages of the Hodges–Lehmann estimator, as a general purpose replacement for the arithmetic average, in all cases where the replicated observations are approximately symmetrically distributed around a central, typical value. We illustrate the application of empirical Bayes methods to uncertainty evaluations, in particular in the context of data reductions of small data sets. 
Metrologists who are skeptical about the use of subjective prior distributions may derive some value from this novel application, and thereby develop an appreciation for how Bayesian procedures can help address the challenges posed by small datasets. The estimates of the measurand that different approaches produce often agree, at least approximately, but the corresponding uncertainty quantifications may differ markedly. In one example, involving three observations, a Bayesian approach yields a coverage interval appreciably narrower than the GUM’s approach. In another example, involving only two observations, an approach involving far less restrictive assumptions than those made in the GUM, produces a confidence interval that is almost as narrow as the conventional interval.
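
The Hodges–Lehmann estimator mentioned in the abstract above can be illustrated with a minimal sketch. This assumes the standard one-sample form, the median of all pairwise (Walsh) averages with each observation also paired with itself. The function name `hodges_lehmann` and the replicate values are hypothetical and are not taken from the paper.

```python
from itertools import combinations_with_replacement
from statistics import median


def hodges_lehmann(observations):
    """One-sample Hodges-Lehmann estimate: median of all Walsh averages.

    A Walsh average is (x_i + x_j) / 2 for i <= j, so each observation
    is also averaged with itself.
    """
    values = list(observations)
    if not values:
        raise ValueError("need at least one observation")
    walsh_averages = [
        (a + b) / 2 for a, b in combinations_with_replacement(values, 2)
    ]
    return median(walsh_averages)


# Hypothetical replicate determinations; the last one is an outlier.
replicates = [10.1, 10.3, 10.2, 14.0]
hl_estimate = hodges_lehmann(replicates)
arithmetic_mean = sum(replicates) / len(replicates)

print("Hodges-Lehmann estimate:", hl_estimate)
print("Arithmetic mean:", arithmetic_mean)
```

With these made-up values the estimate stays close to the three clustered observations, while the arithmetic mean is pulled toward the outlier. This is only an illustration of the estimator itself, not of the uncertainty evaluations the paper describes.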

Nonparametric Identifiability in Species Distribution and Abundance Models: Why it Matters and How to Diagnose a Lack of it Using Simulation

Stoudt

Valpine

Fithian

2023

J Stat Theory Pract

Identifying engaging bird species and traits with community science observations

Stoudt

Goldstein

Valpine

2022

Proc. Natl. Acad. Sci. U.S.A.

Significance: Conservation outreach has long depended on an intuitive sense of which species are more “charismatic” or engaging, for example, placing focus on certain charismatic megafauna in advertising materials. Online community science databases like eBird and iNaturalist provide records of how people engage with different birds under differing data collection protocols. Comparisons between the two databases reveal biases in bird reporting rates. Larger, more colorful, and rarer birds are preferentially engaged with opportunistically in iNaturalist records compared to more systematic eBird records. These relationships and the species-specific engagement indexes determined from these data can be applied to conservation and outreach efforts to help foster a public relationship with nature and can be used to improve models using these two databases.

Force calibration using errors-in-variables regression and Monte Carlo uncertainty evaluation

2016

International (American Society for Testing and Materials, ASTM) ASTM E74-13a [1] and by the International Organization for Standardization (ISO) ISO 376:2011(E) [15]. This paper introduces several statistical methods that overcome limitations of procedures currently in use at the National Institute of Standards and Technology (NIST), which are consistent with those standards, thus increasing the reliability of calibration results and uncertainty evaluations.

Evaluation of the accuracy, consistency, and stability of measurements of the Planck constant used in the redefinition of the international system of units

et al. 2017

Identifying Charismatic Bird Species and Traits with Community Science Observations

Stoudt

Goldstein

Valpine

2021

Preprint

Identifying which species are perceived as charismatic can improve the impact and efficiency of conservation outreach, as charismatic species receive more conservation funding and have their conservation needs prioritized. Sociological experiments studying animal charisma have relied on stated preferences to find correlations between hypothetical "willingness to pay" or "empathy" for a species' conservation and species' size, color, and aesthetic appeal. Recognizing the increasing availability of digital records of public engagement with animals that reveal preferences, an emerging field of "culturomics" uses Google search results, Wikipedia article activities, and other digital modes of engagement to identify charismatic species and traits. In this study, we take advantage of community science efforts as another form of digital data that can reveal observer preferences. We apply a multi-stage analysis to ask whether opportunistic birders contributing to iNaturalist engage more with larger, more colorful, and rarer birds relative to a baseline, from eBird contributors, approximating unbiased detection. We find that body mass, color contrast, and range size all predict overrepresentation in the opportunistic dataset. We also find evidence that, across 473 modeled species, 52 species are significantly overreported and 158 are significantly underreported, indicating a wide variety of species-specific effects. Understanding which birds are charismatic can aid conservationists in creating impactful outreach materials and engaging new naturalists. The quantified differences between two prominent community science efforts may also be of use for researchers leveraging the data from one or both of them to answer scientific questions of interest.

Toward Reproducible and Extensible Research: from Values to Action

Goeva

Stoudt

Trisovic

2020
