Proceedings of the 3rd International Workshop on Data Management for End-to-End Machine Learning 2019
DOI: 10.1145/3329486.3329489
Debugging Machine Learning Pipelines

Abstract: Machine learning tasks entail the use of complex computational pipelines to reach quantitative and qualitative conclusions. If some of the activities in a pipeline produce erroneous or uninformative outputs, the pipeline may fail or produce incorrect results. Inferring the root cause of failures and unexpected behavior is challenging, usually requiring much human thought, and is both time consuming and error prone. We propose a new approach that makes use of iteration and provenance to automatically infer the …

Cited by 23 publications (23 citation statements)
References 14 publications (26 reference statements)
“…In what follows, we give a brief overview of our debugging methodology. For a more detailed discussion, see [14,15]. splits it into training and test subsets, creates and executes an estimator, and computes the F-measure score using 10-fold cross-validation.…”
Section: BugDoc (mentioning)
confidence: 99%
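The pipeline described in the citation statement above (split a dataset into training and test subsets, create an estimator, and score it by F-measure under 10-fold cross-validation) can be sketched with scikit-learn. The dataset and the choice of estimator below are illustrative assumptions, not the one used in the cited work.

```python
# Minimal sketch of the described pipeline: split, fit, and evaluate
# an estimator with 10-fold cross-validation scored by F-measure (F1).
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Split into training and test subsets.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Create an estimator and compute the F-measure via 10-fold CV
# on the training subset.
clf = DecisionTreeClassifier(random_state=0)
scores = cross_val_score(clf, X_train, y_train, cv=10, scoring="f1")
print(f"mean F1 over 10 folds: {scores.mean():.3f}")
```

A pipeline instance like this is the unit that the debugging methodology reasons over: each run, with its parameter settings and outputs, becomes one observation for root-cause inference.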
“…In previous work [14], we proposed and implemented new methods to debug machine learning pipelines that automatically and iteratively identify one or more minimal causes of failures, thereby avoiding the tedious and error-prone task of manually tuning and executing new pipeline instances to test and derive new hypotheses for the failures. We have extended this initial work and built BugDoc [15], a system that identifies root causes for errors in general computational pipelines (or workflows).…”
Section: Introduction (mentioning)
confidence: 99%
“…The first, called Shortcut, discovers definitive root causes (which we sometimes abbreviate to, simply, bugs) consisting of a single conjunction of parameter-value (formally, parameter-equality-value) pairs. The second, called Debugging Decision Trees and introduced in [36], discovers more complex definitive root causes involving inequalities (e.g., A takes a value between 5 and 13).…”
Section: Debugging Algorithms (mentioning)
confidence: 99%
“…While the Shortcut and Stacked Shortcut algorithms can find a single minimal definitive root cause very efficiently, usually without truncation (as we will see in the experimental section), characterizing all minimal definitive root causes is challenging. For this purpose, we use an algorithm that is exponential (in the number of parameters) in the worst case, but can characterize inequalities as well as equalities and does well heuristically even with a small budget [36].…”
Section: Finding Bugs With Inequalities: Debugging Decision Trees (mentioning)
confidence: 99%
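The "Debugging Decision Trees" idea quoted above can be illustrated with an off-the-shelf decision tree: fit a classifier on (parameter configuration, pass/fail) records from pipeline runs, then read each failure-predicting root-to-leaf path as a candidate root cause, including inequalities such as "A takes a value between 5 and 13". This is a sketch of the general technique, not BugDoc's actual implementation; the parameters A and B, the failure condition, and the `failure_paths` helper are all hypothetical.

```python
# Sketch: learn candidate root causes with inequalities from run records.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
A = rng.integers(0, 20, size=200)            # hypothetical parameter A
B = rng.integers(0, 20, size=200)            # hypothetical parameter B
fails = ((A >= 5) & (A <= 13)).astype(int)   # runs fail when 5 <= A <= 13

X = np.column_stack([A, B])
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, fails)

def failure_paths(tree, names):
    """Enumerate root-to-leaf paths whose leaf predicts failure,
    rendered as conjunctions of threshold conditions."""
    t = tree.tree_
    out = []
    def walk(node, conds):
        if t.children_left[node] == -1:          # leaf node
            if np.argmax(t.value[node]) == 1:    # majority class = fail
                out.append(" and ".join(conds))
            return
        name, thr = names[t.feature[node]], t.threshold[node]
        walk(t.children_left[node], conds + [f"{name} <= {thr:.1f}"])
        walk(t.children_right[node], conds + [f"{name} > {thr:.1f}"])
    walk(0, [])
    return out

for path in failure_paths(tree, ["A", "B"]):
    print("candidate root cause:", path)
```

On this synthetic data the tree recovers an interval over A, which is exactly the kind of inequality-based definitive root cause the quoted passage describes; the exponential worst case arises when all minimal root causes over many parameters must be characterized.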
“…Basically, there are two approaches to make ML workflows provenance aware. The first is provenance provided for a specific ML platform [46,35,24,40,2,20,36,30] and the second is the provenance systems that are independent of the domain [32]. In the first approach, each ML platform provides provenance using its proprietary representation, which is difficult to interpret and compare with execution between different platforms.…”
Section: Introduction (mentioning)
confidence: 99%