2020
DOI: 10.48550/arxiv.2005.01520
Preprint

Demystifying a Dark Art: Understanding Real-World Machine Learning Model Development

Angela Lee,
Doris Xin,
Doris Lee
et al.

Abstract: It is well-known that the process of developing machine learning (ML) workflows is a dark art; even experts struggle to find an optimal workflow leading to a high accuracy model. Users currently rely on empirical trial-and-error to obtain their own set of battle-tested guidelines to inform their modeling decisions. In this study, we aim to demystify this dark art by understanding how people iterate on ML workflows in practice. We analyze over 475k user-generated workflows on OpenML, an open-source platform for…

Cited by 3 publications (4 citation statements)
References 14 publications
“…Hill et al [29] interview ML application developers and report challenges related to building first versions of ML models, especially around the early stages of exploration and experimentation (e.g., feature engineering, model training). They describe the process of building models as "magic"-similarly echoed by Lee et al [50] when analyzing ML projects from Github-with unique practices of debugging data in addition to code. Serban et al [88] conduct a survey of practitioners and list 29 software engineering practices for ML, such as "Use Continuous Integration" and "Peer Review Training Scripts."…”
Section: Software Engineering for ML
Mentioning confidence: 99%
“…Prior work has extensively documented the Agile tendencies of MLEs, describing how they iterate quickly (i.e. with velocity) to explore a large ML or data science search space [3,30,50,74,110]. Amershi et al [3] describe how experimentation can be sped up when labels are annotated faster (i.e., rapid data preparation).…”
Section: 1.1
Mentioning confidence: 99%
“…There is a wide range of research and tooling being developed to support these many tasks. ML lifecycle management is especially challenging because it involves many cycles of trial-and-error [30], and its dependencies are hard to scope [53]. When something goes wrong, ML engineers may need to rollback their model to an earlier version [41,64], inspect old versions of the training data [24,27,37], or audit the code that was used for training [40,51].…”
Section: Related Work
Mentioning confidence: 99%
“…We document these challenges in Section 1. These challenges are a culmination of (1) our conversations with practitioners while working in an industry research lab, part of a larger company with more than three hundred e-commerce subsidiaries, (2) prior research, particularly those reporting from interview studies on data analysis workflows [20,38], and (3) our experience in developing and evaluating interactive data systems. As such, the list of challenges here is intended to be a useful guide informing research and development on text analytics systems, not a comprehensive enumeration, and inevitably reflects our personal taste.…”
Section: Introduction
Mentioning confidence: 99%