Understanding Collections of Related Datasets Using Dependent MMD Coresets

Williamson, Sinead A.; Henderson, Jette

doi:10.3390/info12100392

Cited by 2 publications

(2 citation statements)

References 28 publications

“…Second, the technical work of data science projects is typically approached as the 'one-off application' of a statistical model to a given dataset (Polyzotis et al 2017). Built on an assumption of a 'largely stable world' (Marcus 2018), data science often views changes to data (or its underlying distribution), called 'data drift', as detrimental to model performance (Hohman et al 2020; Hoens et al 2012; Williamson and Henderson 2021). For data science activities to be sustainable, data needs to be managed to account for changes to data in the dynamic world (e.g., Amershi et al 2019b; Bopp et al 2017).…”

Section: Sustaining Data Science Activities By Domain Experts As the ... (mentioning)

confidence: 99%

Towards Actionable Data Science: Domain Experts as End-Users of Data Science Systems

Jung

Steinberger

2023

Comput Supported Coop Work

Research on data science has largely viewed data as an abstract input in service of algorithms developed by data scientists. In this view, data science activities are made sustainable by the efficient flow of data to improve the algorithms. Recent studies in CSCW and HCI, however, point to how the effectiveness of algorithms critically depends on sustainably collecting reliable, complete data situated in domain experts' practices and settings. Drawing on ethnographic fieldwork and a pilot machine learning project at a craft brewery, we describe three types of situations where brewers' data practices led to unreliable, incomplete data, and how such data practices limited the effectiveness of data science activities. We analyze sources of misalignment between their data practices and data science activities, which we use to offer design implications for sustainability. Extending research on end-user software development that views sustainability as driven by domain experts as 'owners of problems,' our study proposes data science research driven by domain experts as 'owners of data.'

Section: Sustaining Data Science Activities By Domain Experts As the ... (mentioning)

confidence: 99%

Domain experts as owners of data: towards sustainable data science

Jung

Steinberger

2022

Preprint

Research on data science has largely viewed data as an abstract input in service of algorithms developed by data scientists. In this view, data science activities are made sustainable by the efficient flow of data to improve the algorithms. Recent studies in CSCW and HCI, however, point to how the effectiveness of algorithms critically depends on sustainably collecting reliable, complete data situated in domain experts’ practices and settings. Drawing on ethnographic fieldwork and a pilot machine learning project at a craft brewery, we describe three types of situations where brewers’ data practices led to unreliable, incomplete data, and how such data practices limited the effectiveness of data science activities. We analyze sources of misalignment between their data practices and data science activities, which we use to offer design implications for sustainability. Extending research on end-user software development that views sustainability as driven by domain experts as ‘owners of problems,’ our study proposes data science research driven by domain experts as ‘owners of data.’
