2022
DOI: 10.48550/arxiv.2206.15475
Preprint

Causal Machine Learning: A Survey and Open Problems

Abstract: Causal Machine Learning (CausalML) is an umbrella term for machine learning methods that formalize the data-generation process as a structural causal model (SCM). This allows one to reason about the effects of changes to this process (i.e., interventions) and what would have happened in hindsight (i.e., counterfactuals). We categorize work in CausalML into five groups according to the problems they tackle: (1) causal supervised learning, (2) causal generative modeling, (3) causal explanations, (4) causal fai…
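The abstract's distinction between observing a system and intervening on it can be made concrete with a minimal sketch. The two-variable SCM below (variable names, noise terms, and the coefficient 2.0 are illustrative, not from the survey) shows how a do-intervention replaces one variable's structural mechanism while leaving the others intact:

```python
import random

def sample(do_x=None):
    """Draw one sample from a toy SCM X -> Y.

    `do_x` implements the intervention do(X = do_x): X's own
    mechanism (its exogenous noise) is cut and replaced by a constant.
    """
    u_x, u_y = random.gauss(0, 1), random.gauss(0, 1)
    x = u_x if do_x is None else do_x   # intervention overrides X's mechanism
    y = 2.0 * x + u_y                   # structural equation for Y is unchanged
    return x, y

random.seed(0)
obs = [sample() for _ in range(10_000)]           # observational regime
intv = [sample(do_x=1.0) for _ in range(10_000)]  # interventional regime
mean_y_intv = sum(y for _, y in intv) / len(intv)
# Under this SCM, E[Y | do(X = 1)] = 2.0, so the estimate should be close.
```

Counterfactuals go one step further: they fix the noise values drawn for a particular sample and then replay the model under a changed mechanism.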

Cited by 21 publications (28 citation statements)
References 171 publications
“…The inner mechanism of the labeling rule G → Y usually depends on the causal features, which are particular subparts of the entire data (Arjovsky et al, 2019; Kaddour et al, 2022; Wu et al, 2022b; Ye et al, 2022; Lu et al, 2021), while their complementary parts, environmental features, are noncausal for predicting the graphs.…”
Section: Definitions and Problem Formations
confidence: 99%
“…Hence, the distribution of augmented data tends not to overlap with the original distribution, which encourages the diversity of environmental features. However, from a data generation perspective, causal features are invariant and shared across environments (Kaddour et al, 2022), so they are essential features for OOD generalization. Since Principle 1 does not expose any constraint on the invariant property of the augmented distribution, we here propose the second principle for augmentation:…”
Section: Two Principles For Graph Augmentation
confidence: 99%
“…Another line of work has shown that training data re-weighting can speed up training. For example, some re-weighting methods focus on proxy models [2,29], importance sampling [3,19] or removing spurious correlations [17,36].…”
Section: Related Work
confidence: 99%
“…In many domains, including cell biology (Sachs et al, 2005), finance (Sanford & Moosa, 2012), and genetics (Zhang et al, 2013), the data generating process is thought to be represented by an underlying directed acyclic graph (DAG). Many models rely on DAG assumptions, e.g., causal modeling uses DAGs to model distribution shifts, ensure predictor fairness among subpopulations, or learn agents more sample-efficiently (Kaddour et al, 2022). A key question, with implications ranging from better modeling to causal discovery, is how to recover this unknown DAG from observed data alone.…”
Section: Introduction
confidence: 99%
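The last excerpt assumes data generated along a DAG. A minimal sketch of such a generating process (the three-node graph X1 → X2 → X3 with X1 → X3, and all coefficients, are illustrative, not taken from any cited paper) shows why acyclicity matters: it is what allows sampling the variables in topological order, and it makes observed covariances follow from the path coefficients:

```python
import random

def draw():
    """One sample from a linear-Gaussian SCM on the DAG X1 -> X2 -> X3, X1 -> X3.

    Variables are sampled in topological order, which is only
    possible because the graph is acyclic.
    """
    x1 = random.gauss(0, 1)
    x2 = 0.8 * x1 + random.gauss(0, 1)
    x3 = 0.5 * x1 - 1.2 * x2 + random.gauss(0, 1)
    return x1, x2, x3

def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / len(a)

random.seed(1)
data = [draw() for _ in range(20_000)]
x1s, x2s, x3s = zip(*data)
# In this model Cov(X1, X2) = 0.8 * Var(X1) = 0.8, so the empirical
# covariance should be close to 0.8.
```

Recovering the unknown graph from such observed samples alone is the causal discovery problem the excerpt refers to; the sketch only shows the forward (generating) direction.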