2021
DOI: 10.14778/3476311.3476402

Managing ML pipelines

Abstract: The industrial machine learning pipeline requires iterating on model features, training and deploying models, and monitoring deployed models at scale. Feature stores were developed to manage and standardize the engineer's workflow in this end-to-end pipeline, focusing on traditional tabular feature data. In recent years, however, model development has shifted towards using self-supervised pretrained embeddings as model features. Managing these embeddings and the downstream systems that use them introduces new …


Cited by 14 publications (4 citation statements). References 9 publications.
“…Selecting from available models is a standard practice in model-less inference [57], and not all models will be deployed. Besides, inference systems are not the only users of stored models [33,52]. We curate various models to show Tabi's generality, while the user-maintained repository size depends on users' deployment scale.…”
Section: Methods
confidence: 99%
“…High parallelism is prevalent in modern applications. Apart from traditional high-performance computing workloads, large numbers of parallel tasks in ML pipelines (e.g., feature stores [20,42,53]) can amount to one-third of the total energy consumption, exceeding the energy used by model training in large-scale jobs [54]. To precisely attribute CPU energy for a parallel application, one needs to obtain the CPU time for each of its tasks (processes and kernel threads) per socket.…”
Section: Relevant Factors in Energy Attribution
confidence: 99%
“…Moreover, information about extractor dependencies, intrinsic properties, and impact on prediction ability has high value for the knowledge discovery pipeline. To utilise this information, a suitable framework is needed, also referred to as a feature store [14]. The concept of a feature store is now used by many artificial intelligence (AI) businesses to support their machine learning (ML) processes [15].…”
Section: Feature Engineering
confidence: 99%
“…The described KDFE process is time consuming, though we proved that it adds quantitative value for the researcher. If results and gained experience were collected from this study, a knowledge foundation based on a feature store [14] could be created that would support the development of an automated KDFE process. Qualitative aspects will be central in the design of the knowledge database that automated KDFE uses.…”
Section: Can the KDFE Process Be Automated (Q2)?
confidence: 99%