2020
DOI: 10.48550/arxiv.2005.01520
Preprint

Demystifying a Dark Art: Understanding Real-World Machine Learning Model Development

Angela Lee,
Doris Xin,
Doris Lee
et al.

Abstract: It is well-known that the process of developing machine learning (ML) workflows is a dark art; even experts struggle to find an optimal workflow leading to a high accuracy model. Users currently rely on empirical trial-and-error to obtain their own set of battle-tested guidelines to inform their modeling decisions. In this study, we aim to demystify this dark art by understanding how people iterate on ML workflows in practice. We analyze over 475k user-generated workflows on OpenML, an open-source platform for…

Cited by 3 publications (4 citation statements)
References 14 publications
“…Hill et al [29] interview ML application developers and report challenges related to building first versions of ML models, especially around the early stages of exploration and experimentation (e.g., feature engineering, model training). They describe the process of building models as "magic"-similarly echoed by Lee et al [50] when analyzing ML projects from Github-with unique practices of debugging data in addition to code. Serban et al [88] conduct a survey of practitioners and list 29 software engineering practices for ML, such as "Use Continuous Integration" and "Peer Review Training Scripts."…”
Section: Software Engineering for ML
Mentioning confidence: 99%
“…Prior work has extensively documented the Agile tendencies of MLEs, describing how they iterate quickly (i.e. with velocity) to explore a large ML or data science search space [3,30,50,74,110]. Amershi et al [3] describe how experimentation can be sped up when labels are annotated faster (i.e., rapid data preparation).…”
Section: 1.1
Mentioning confidence: 99%
“…There is a wide range of research and tooling being developed to support these many tasks. ML lifecycle management is especially challenging because it involves many cycles of trial-and-error [30], and its dependencies are hard to scope [53]. When something goes wrong, ML engineers may need to rollback their model to an earlier version [41,64], inspect old versions of the training data [24,27,37], or audit the code that was used for training [40,51].…”
Section: Related Work
Mentioning confidence: 99%
“…We document these challenges in Section 1. These challenges are a culmination of (1) our conversations with practitioners while working in an industry research lab, part of a larger company with more than three hundred e-commerce subsidiaries, (2) prior research, particularly those reporting from interview studies on data analysis workflows [20,38], and (3) our experience in developing and evaluating interactive data systems. As such, the list of challenges here is intended to be a useful guide informing research and development on text analytics systems, not a comprehensive enumeration, and inevitably reflects our personal taste.…”
Section: Introduction
Mentioning confidence: 99%