2017
DOI: 10.1080/07474938.2016.1222068

Two-sample least squares projection

Abstract: This paper investigates the problem of making inference about the coefficients in the linear projection of an outcome variable y on covariates (x, z) when data are available from two independent random samples; the first sample contains information on only the variables (y, z), while the second sample contains information on only the covariates. In this context, the validity of existing inference procedures depends crucially on the assumptions imposed on the joint distribution of (y, z, x). This paper introduc…
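The identification problem described in the abstract can be illustrated with a short Python sketch. Because y and x never appear in the same sample, Cov(x, y) cannot be estimated directly. The sketch computes the range of Cov(x, y) values that keep the covariance matrix of (y, z, x) positive semidefinite, given the moments each sample does identify, and maps that range into the projection coefficient on x. Everything in it is an assumption made for illustration: the simulated data-generating process, scalar x and z, and the hypothetical helper names `cov_xy_range` and `projection_coefs`. This covariance-only bound is not the identified set derived in the paper. That set uses the full marginal distributions of the two samples, so it should be no wider than the bound computed here.

```python
import math
import random


def cov(a, b):
    """Sample covariance (divisor n) of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / n


def projection_coefs(s_xx, s_zz, s_xz, s_xy, s_zy):
    """Hypothetical helper: slopes (b_x, b_z) of the projection of y on (x, z)."""
    det = s_xx * s_zz - s_xz ** 2
    b_x = (s_zz * s_xy - s_xz * s_zy) / det
    b_z = (s_xx * s_zy - s_xz * s_xy) / det
    return b_x, b_z


def cov_xy_range(s_yy, s_xx, s_zz, s_yz, s_xz):
    """Hypothetical helper: Cov(x, y) values keeping Cov(y, z, x) PSD."""
    center = s_yz * s_xz / s_zz  # value implied by zero partial covariance given z
    # The Schur complement with respect to z must be PSD, so |Cov(x, y) - center| <= half.
    half = math.sqrt(max(0.0, (s_yy - s_yz ** 2 / s_zz) * (s_xx - s_xz ** 2 / s_zz)))
    return center - half, center, center + half


def simulate(n, rng):
    """Assumed DGP: z ~ N(0,1), x = 0.5 z + e1, y = x + 0.5 z + e2."""
    rows = []
    for _ in range(n):
        z = rng.gauss(0, 1)
        x = 0.5 * z + rng.gauss(0, 1)
        y = x + 0.5 * z + rng.gauss(0, 1)
        rows.append((y, z, x))
    return rows


rng = random.Random(0)
sample1 = [(y, z) for y, z, _ in simulate(20000, rng)]  # outcome sample: (y, z) only
sample2 = [(x, z) for _, z, x in simulate(20000, rng)]  # covariate sample: (x, z) only

y1, z1 = zip(*sample1)
x2, z2 = zip(*sample2)
s_yy, s_yz = cov(y1, y1), cov(y1, z1)
s_xx, s_xz = cov(x2, x2), cov(x2, z2)
s_zz = 0.5 * (cov(z1, z1) + cov(z2, z2))  # z is observed in both samples

lo, center, hi = cov_xy_range(s_yy, s_xx, s_zz, s_yz, s_xz)
b_lo = projection_coefs(s_xx, s_zz, s_xz, lo, s_yz)[0]
b_hi = projection_coefs(s_xx, s_zz, s_xz, hi, s_yz)[0]
b_cu = projection_coefs(s_xx, s_zz, s_xz, center, s_yz)[0]
print(f"Cov(x, y) range: [{lo:.2f}, {hi:.2f}]")
print(f"b_x range: [{b_lo:.2f}, {b_hi:.2f}]; zero partial covariance gives {b_cu:.2f}")
```

In population terms this DGP has Cov(x, y) = 1.5 and a true coefficient on x of 1. The covariance-only bound is [0.5 − √2, 0.5 + √2] for Cov(x, y), which is roughly [−1.41, 1.41] for b_x. Assuming zero partial covariance between y and x given z selects the midpoint and forces b_x to exactly zero. This shows how strongly that assumption drives the answer.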

Cited by 15 publications (7 citation statements)
References: 26 publications
“…Another example, somewhat related to meta-analysis for prediction model evaluation (Riley et al 2010, Debray et al 2013, 2017), might involve enriching a data set of a clinical study with covariate information from a separate source, say a study containing socio-demographic or summary-level information, but no outcome data, for the purpose of improving clinical risk prediction (Chen & Chen 2000, Chatterjee et al 2016). Clearly, in both of these examples, a regression model for the outcome on the combined set of covariates can be identified only under fairly stringent parametric assumptions and, as we discuss below, provided that there is a non-trivial overlap in the amount of information available from both sources of data.…”
Section: Introduction (mentioning), confidence: 99%
“…Graham et al (2016) identified the two-sample IV estimation problem as one specific example of a larger, general class of data combination or fusion problems, and derived semiparametric efficiency bounds under the corresponding general class of moment conditions which allow sample moments of the common variables V to differ significantly across the two datasets being combined. Pacini (2017) assumes independence of the samples and makes use of the marginal distributions to provide a characterization of the identified set of the coefficients of interest when no assumption on the joint distribution of (Y, V, L) is imposed. Robins et al (1995) consider a missing data setting closely related to ours.…”
Section: Introduction (mentioning), confidence: 99%
“…Our paper is also related to Pacini (2019), which constructs bounds on the best linear predictor of Y on X in a similar data combination framework as here. We show that if one is ready to impose the usual assumption that the model is partially linear, large identification gains may be achieved, possibly up to point identification.…”
Section: Related Literature (mentioning), confidence: 99%
“…Selection exchangeability and positivity were assumed as in Section 4.1.1, while no assumption on separable moments [64,60] or conditional independence [2] introduced in previous sections was made. In this setting, identification of θ can be hard even under linear models, which has been discussed in [76], [3], and [77]. [61] proposed a doubly robust estimator for θ that solves $\sum_{i=1}^{n} U(S_i, Y_i, X_i, Z_i; \theta)/n = 0$, where…”
Section: Statistical Matching (mentioning), confidence: 99%