The Best of Both Worlds: Challenges in Linking Provenance and Explainability in Distributed Machine Learning

Scherzinger, Stefanie; Seifert, Christin; Wiese, Lena

doi:10.1109/icdcs.2019.00161

Cited by 7 publications (5 citation statements)

References 44 publications (43 reference statements)


“…With the recent interest in DL methods, several works propose provenance management approaches for data analysis during DNN training [11]. There are several challenges in making ML workflows provenance aware like taking into account the execution framework that may involve CPUs, GPUs, TPUs, and distributed environments such as clusters and clouds as discussed in [36,42,14]. In this section, we discuss related work for provenance data management, considering the intention of using provenance for runtime data analysis.…”

Section: Related Work (mentioning)

confidence: 99%

“…The approaches in this category manage provenance for several purposes in ML platforms [46,35,24,40,2,36,30,41,12]. They are all based on a proprietary representation of provenance data, i.e., that does not follow recommendations like W3C PROV.…”

Section: Machine- and Deep Learning-specific Approaches (mentioning)

confidence: 99%

“…Basically, there are two approaches to make ML workflows provenance aware. The first is provenance provided for a specific ML platform [46,35,24,40,2,20,36,30] and the second is the provenance systems that are independent of the domain [32]. In the first approach, each ML platform provides provenance using its proprietary representation, which is difficult to interpret and compare with execution between different platforms.…”

Section: Introduction (mentioning)

confidence: 99%


Provenance Supporting Hyperparameter Analysis in Deep Neural Networks

Pina, Kunstmann, Oliveira, et al. 2021

Lecture Notes in Computer Science


The duration of the life cycle in deep neural networks (DNN) depends on the data configuration decisions that lead to success in obtaining models. Analyzing hyperparameters along the evolution of the network's execution allows for adapting the data. Provenance data derivation traces help the parameter fine-tuning by providing a global data picture with clear dependencies. Provenance can also contribute to the interpretation of models resulting from the DNN life cycle. However, there are challenges in collecting hyperparameters and in modeling the relationships between the data involved in the DNN life cycle to build a provenance database. Current approaches adopt different notions of provenance in their representation and require the execution of the DNN under a specific software framework, which limits interoperability and flexibility when choosing the DNN execution environment. This work presents a provenance data-based approach to address these challenges, proposing a collection mechanism with flexibility in the choice and representation of data to be analyzed. Experiments of the approach, using a convolutional neural network focused on image recognition, provide evidence of the flexibility, the efficiency of data collection, the analysis and the validation of network data.


“…Specifically, a number of tools are available to help developers build machine learning pipelines [50,18,51] or debug them [52], but these lack the ability to explain the provenance of a certain data item in the processed dataset. Others link provenance to explainability in a distributed machine learning setting [53] but without offering specific tools. Amazon identifies that there are common and reusable components to a machine learning pipeline, but that there is no way to track the exploration of pipeline construction effectively, and calls for metadata capture to support reasoning over pipeline design [54].…”

Section: Related Work (mentioning)

confidence: 99%

Capturing and querying fine-grained provenance of preprocessing pipelines in data science

et al. 2020


Data processing pipelines that are designed to clean, transform and alter data in preparation for learning predictive models have an impact on those models' accuracy and performance, as well as on other properties, such as model fairness. It is therefore important to provide developers with the means to gain an in-depth understanding of how the pipeline steps affect the data, from the raw input to training sets ready to be used for learning. While other efforts track creation and changes of pipelines of relational operators, in this work we analyze the typical operations of data preparation within a machine learning process, and provide infrastructure for generating very granular provenance records from it, at the level of individual elements within a dataset. Our contributions include: (i) the formal definition of a core set of preprocessing operators, and the definition of provenance patterns for each of them, and (ii) a prototype implementation of an application-level provenance capture library that works alongside Python. We report on provenance processing and storage overhead and scalability experiments, carried out over both real ML benchmark pipelines and over TPC-DI, and show how the resulting provenance can be used to answer a suite of provenance benchmark queries that underpin some of the developers' debugging questions, as expressed on the Data Science Stack Exchange.


“…To explain how a DML algorithm gives a decision, all transformations applied to the data should be considered. It is claimed in [214] that even basic transformations in data pre-processing, such as data partitioning, local data cleaning and value imputation, can have a strong impact on the resultant model. The effect becomes more apparent under a distributed setting.…”

Section: B Open Issues (mentioning)

confidence: 99%

Distributed Machine Learning for Wireless Communication Networks: Techniques, Architectures, and Applications

Chen et al. 2020

Preprint


Distributed machine learning (DML) techniques, such as federated learning, partitioned learning, and distributed reinforcement learning, have been increasingly applied to wireless communications. This is due to improved capabilities of terminal devices, explosively growing data volume, congestion in the radio interfaces, and increasing concern of data privacy. The unique features of wireless systems, such as large scale, geographically dispersed deployment, user mobility, and massive amount of data, give rise to new challenges in the design of DML techniques. There is a clear gap in the existing literature in that the DML techniques are yet to be systematically reviewed for their applicability to wireless systems. This survey bridges the gap by providing a contemporary and comprehensive survey of DML techniques with a focus on wireless networks. Specifically, we review the latest applications of DML in power control, spectrum management, user association, and edge cloud computing. The optimality, scalability, convergence rate, computation cost, and communication overhead of DML are analyzed. We also discuss the potential adversarial attacks faced by DML applications, and describe state-of-the-art countermeasures to preserve privacy and security. Last but not least, we point out a number of key issues yet to be addressed, and collate potentially interesting and challenging topics for future research.

