2020
DOI: 10.1017/pan.2020.1

Matching with Text Data: An Experimental Evaluation of Methods for Matching Documents and of Measuring Match Quality

Abstract: Matching for causal inference is a well-studied problem, but standard methods fail when the units to match are text documents: the high-dimensional and rich nature of the data renders exact matching infeasible, causes propensity scores to produce incomparable matches, and makes assessing match quality difficult. In this paper, we characterize a framework for matching text documents that decomposes existing methods into: (1) the choice of text representation, and (2) the choice of distance metric. We investigat…

Citations: cited by 46 publications (68 citation statements)
References: 65 publications (94 reference statements)
“…We see several opportunities for pushing forward the text-based confounding literature. We hope that scholars will extend our work to a proposed alternative to TIRM, a task already started by Mozer et al (2020) and Veitch, Sridhar, and Blei (2019). A central challenge is developing general-purpose methods for evaluating these new models.…”
Section: Results
Citation type: mentioning
confidence: 95%
“…Our approach of augmenting propensity scores with information about topic balance is most similar to the covariate-balancing propensity scores of Imai and Ratkovic (2014). Mozer et al (2020) and Veitch, Sridhar, and Blei (2019) directly build on our framework to propose alternative text adjustment approaches, and the related literature is reviewed in Keith, Jensen, and O'Connor (2020).…”
Section: Related Work
Citation type: mentioning
confidence: 99%
“…Our method performs substantially better than methods limited to one-to-one correspondence and without using grouping information. Our methodological framework is particularly appealing because it can be extended to a wide range of applications, including confounding adjustment via text matching using text data in social science (Roberts et al 2018, Mozer et al 2018), and cross-language record linkage (Song et al 2016, McNamee et al 2011). The learned mapping matrix Π and translation matrix W have key practical value in transferring statistical models across systems (Torrey & Shavlik 2010), capturing the pose of objects (Zhou et al 2014), estimating the relative angle of proteins (Sael & Kihara 2010) and so on.…”
Section: Discussion
Citation type: mentioning
confidence: 99%
“…The second will take advantage of stochastic variational inference (Hoffman et al, 2013) to enable Bayesian Word Embeddings to scale to massive corpora. Finally, the third track for future work will involve tying the anchoring approach discussed above with the emerging literature on making causal claims from text (Fong and Grimmer, 2016; Mozer et al, 2018), and taking advantage of the word similarities to identify appropriate linguistic counterfactuals. Figure 4: There is no significant difference between foreign and international topics before and after 1945, uncertainty is displayed with 95% confidence intervals.…”
Section: Materials Conflict Events Increase In Response To Bellicosity
Citation type: mentioning
confidence: 99%