Causal discovery becomes especially challenging when the possibility of latent confounding and/or selection bias is not assumed away. For this task, ancestral graph models are particularly useful in that they can represent the presence of latent confounding and selection effects without explicitly invoking unobserved variables. Based on the machinery of ancestral graphs, there is a provably sound causal discovery algorithm, known as the FCI algorithm, that allows for the possibility of latent confounders and selection bias. However, the orientation rules used in the algorithm are not complete. In this paper, we provide additional orientation rules; augmented by these, the FCI algorithm is shown to be complete, in the sense that it can, under standard assumptions, discover all aspects of the causal structure that are uniquely determined by facts of probabilistic dependence and independence. The result is useful for developing any causal discovery and reasoning system based on ancestral graph models.
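To make the notion of an orientation rule concrete, here is a minimal Python sketch of FCI's first orientation rule (R1) acting on a partial ancestral graph. It does not cover the complete rule set discussed above. The edge-mark encoding is an illustrative assumption: a dict keyed by ordered node pairs, with `"o"`, `">"`, and `"-"` standing for circle, arrowhead, and tail. The function names are also hypothetical.

```python
from itertools import permutations

# Illustrative encoding (assumption, not from the paper):
# mark[(u, v)] is the mark at v's end of the edge between u and v.
# Marks: "o" = circle, ">" = arrowhead, "-" = tail.
# An edge u - v exists iff both (u, v) and (v, u) are keys.


def adjacent(mark, u, v):
    return (u, v) in mark


def apply_r1(mark):
    """Apply orientation rule R1 repeatedly until it no longer fires.

    R1: if a *-> b o-* c and a, c are not adjacent, orient b o-* c as b -> c.
    Each firing turns a circle into a tail, so the loop terminates.
    """
    changed = True
    while changed:
        changed = False
        nodes = {u for u, _ in mark}
        for a, b, c in permutations(nodes, 3):
            if (adjacent(mark, a, b) and adjacent(mark, b, c)
                    and not adjacent(mark, a, c)
                    and mark[(a, b)] == ">"      # arrowhead at b on a *-> b
                    and mark[(c, b)] == "o"):    # circle at b on b o-* c
                mark[(c, b)] = "-"               # tail at b
                mark[(b, c)] = ">"               # arrowhead at c
                changed = True
    return mark


# a o-> b o-o c, with a and c non-adjacent
pag = {("a", "b"): ">", ("b", "a"): "o", ("b", "c"): "o", ("c", "b"): "o"}
apply_r1(pag)
```

In this example, R1 turns the `b o-o c` edge into `b -> c`. A full implementation would interleave R1 with the remaining rules until none of them fires.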
Many algorithms proposed in the machine learning community for inferring causality from data are grounded in two assumptions, known as the Causal Markov Condition and the Causal Faithfulness Condition. Philosophical discussions of the latter condition have focused on how often and in what domains we can expect it to hold or fail. This paper instead investigates to what extent the Faithfulness condition can be tested. The investigation yields a theoretical and a practical result: a strictly weaker Faithfulness condition that is nonetheless sufficient to justify some reliable methods of causal inference, and a way to make some causal inference procedures more robust. The latter, we argue, is related to the possibility of controlling the probability of large errors with a finite sample size ("uniform consistency") in causal inference.
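One well-known way to detect certain failures of Faithfulness is a conservative check on unshielded triples, in the spirit of the "conservative PC" procedure. If the middle vertex appears in some, but not all, of the sets that separate the two endpoints, the triple is marked ambiguous rather than oriented. The sketch below assumes the separating sets have already been found by independence tests. The function name and inputs are hypothetical, and this is not claimed to be the exact procedure of the paper.

```python
from typing import Hashable, Iterable, Set


def classify_triple(y: Hashable, sepsets: Iterable[Set[Hashable]]) -> str:
    """Classify an unshielded triple X - Y - Z (hypothetical helper).

    sepsets: every conditioning set S for which X and Z tested independent.
    Returns "collider" if Y is in none of them, "noncollider" if Y is in
    all of them, and "ambiguous" otherwise (a detectable sign of unfaithfulness).
    """
    sepsets = list(sepsets)
    if not sepsets:
        raise ValueError("X and Z must have at least one separating set")
    hits = sum(y in s for s in sepsets)
    if hits == 0:
        return "collider"
    if hits == len(sepsets):
        return "noncollider"
    return "ambiguous"
```

Marking mixed cases as ambiguous means the procedure refuses to orient a triple whenever the test results conflict. This trades some informativeness for robustness against unfaithful distributions.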
It is commonplace to encounter nonstationary or heterogeneous data, whose underlying generating process changes over time or across data sets (the data sets may have been collected under different experimental or data collection conditions). Such distribution shift presents both challenges and opportunities for causal discovery. In this paper we develop a principled framework for causal discovery from such data, called Constraint-based causal Discovery from Nonstationary/heterogeneous Data (CD-NOD), which addresses two important questions. First, we propose an enhanced constraint-based procedure to detect variables whose local mechanisms change and to recover the skeleton of the causal structure over observed variables. Second, we present a way to determine causal orientations by making use of independence changes in the data distribution implied by the underlying causal model, benefiting from information carried by changing distributions. Experimental results on various synthetic and real-world data sets are presented to demonstrate the efficacy of our methods.
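The first step described above can be illustrated by treating the time or domain index `c` as an extra (surrogate) variable. A variable is flagged as "changing" only if no subset of the other variables renders it independent of `c`. This sketch makes several simplifying assumptions. It swaps in a linear-Gaussian partial-correlation (Fisher z) test for the more general conditional independence tests such a method would need. It enumerates all conditioning subsets, which is feasible only for a handful of variables. The function names are hypothetical.

```python
import math
from itertools import combinations

import numpy as np


def partial_corr_pvalue(x, y, z=None):
    """Two-sided Fisher-z p-value for corr(x, y | z) = 0.

    Assumption: linear-Gaussian data (a stand-in for a general CI test).
    z is an (n, k) array of conditioning variables, or None.
    """
    n = len(x)
    design = np.ones((n, 1)) if z is None else np.column_stack([np.ones(n), z])
    k = design.shape[1] - 1
    # Residualize x and y on the conditioning set (plus intercept).
    rx = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    ry = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    r = np.clip(np.corrcoef(rx, ry)[0, 1], -0.999999, 0.999999)
    stat = math.sqrt(n - k - 3) * abs(np.arctanh(r))
    return math.erfc(stat / math.sqrt(2))


def changing_modules(data, c, alpha=0.01):
    """Indices of columns that stay dependent on c given every subset of the others."""
    n, d = data.shape
    flagged = []
    for j in range(d):
        others = [k for k in range(d) if k != j]
        separated = any(
            partial_corr_pvalue(data[:, j], c, data[:, list(s)] if s else None) > alpha
            for size in range(len(others) + 1)
            for s in combinations(others, size)
        )
        if not separated:
            flagged.append(j)
    return flagged


rng = np.random.default_rng(0)
c = np.repeat([0.0, 1.0], 1000)          # domain index for two data sets
x0 = 2.0 * c + rng.normal(size=2000)     # mechanism of x0 shifts with c
x1 = x0 + rng.normal(size=2000)          # x1 given x0 is stable across domains
x2 = rng.normal(size=2000)               # unrelated, stable variable
flagged = changing_modules(np.column_stack([x0, x1, x2]), c, alpha=1e-4)
```

In this synthetic example, `x1` depends on `c` only through `x0`, so conditioning on `x0` should separate it from `c`, leaving only column 0 flagged. With nonlinear or non-Gaussian changes, a nonparametric (for example, kernel-based) conditional independence test would be needed instead.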
In several organizations, it has become increasingly popular to document and log the steps that make up a typical business process. In some situations, a normative workflow model of such processes is developed, and it becomes important to know, by analyzing the available activity logs, whether the model is actually being followed. In other scenarios, no model is available, and one is interested in learning a workflow representation of such activities in order to evaluate cases or create new production policies. In either case, machine learning tools that can mine workflow models are of great interest and remain relatively unexplored. We present here a probabilistic workflow model and a corresponding learning algorithm that runs in polynomial time. We illustrate the algorithm on example data derived from a real-world workflow.

General Terms: Algorithms
Keywords: Workflow mining, graphical models, causal models

* This work was carried out while on internship at Clairvoyance Corporation.

MOTIVATION
Most large social organizations are complex systems. Every day they perform various types of processes, such as assembling a car, designing and implementing software, organizing a conference, and so on. A process is a set of tasks to be accomplished, where every task might have prerequisites within the process that must be fulfilled before execution. For instance, implementing a database query system should not be performed before the necessary data structures are designed.
One should not add the doors to a car before the seats are in place. That is, some tasks are essentially sequential. But it is fair to say that building the speakers of a car has no bearing on the manufacturing of the tires, and vice versa, i.e., some tasks can be executed in parallel. Moreover, some tasks are mutually exclusive: for instance, one has to decide whether a given share of a coffee harvest is to be exported or sent to the internal market. Some tasks might also be executed in cycles.

To analyze productivity, identify outliers, cut unnecessary expenses, and design other production policies, models of work are important, i.e., abstract representations of typical process instances that model the causal and probabilistic dependencies among tasks. Such models are based on the concepts of sequential, parallel, iterative (cyclic), and mutually exclusive tasks, and are used to evaluate costs, monitor processes, and predict the effect of new policies [7]. For these reasons, empirically building process models from data is of great interest. Such a problem has been called process mining, or si...
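The distinction between sequential, parallel, and mutually exclusive tasks can be illustrated on raw activity logs with a simple direct-succession "footprint" of the kind used by the alpha algorithm in the process-mining literature. This is a swapped-in technique, not the probabilistic workflow model or learning algorithm presented here. The function name and example traces are hypothetical.

```python
from itertools import product


def footprint(traces):
    """Alpha-algorithm-style ordering relations between tasks (illustrative).

    Returns a dict mapping (a, b) to one of:
      "->"  a is directly followed by b in some trace, never the reverse
      "<-"  b is directly followed by a in some trace, never the reverse
      "||"  both orders occur (candidate parallel tasks)
      "#"   neither task ever directly follows the other
    """
    follows = {(t[i], t[i + 1]) for t in traces for i in range(len(t) - 1)}
    tasks = sorted({x for t in traces for x in t})
    rel = {}
    for a, b in product(tasks, repeat=2):
        ab, ba = (a, b) in follows, (b, a) in follows
        if ab and ba:
            rel[(a, b)] = "||"
        elif ab:
            rel[(a, b)] = "->"
        elif ba:
            rel[(a, b)] = "<-"
        else:
            rel[(a, b)] = "#"
    return rel


# Hypothetical activity logs echoing the examples in the text.
logs = [
    ["design", "speakers", "tires", "ship"],
    ["design", "tires", "speakers", "ship"],
    ["harvest", "export"],
    ["harvest", "internal"],
]
rel = footprint(logs)
```

Here `speakers` and `tires` occur in both orders and come out as candidate parallel tasks. By contrast, `export` and `internal` never directly follow each other, which is consistent with an exclusive choice after `harvest`. Such footprints ignore probabilities and longer-range dependencies, which a probabilistic workflow model is meant to capture.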