2014
DOI: 10.2139/ssrn.2518493

Identification, Data Combination and the Risk of Disclosure

Abstract: It is commonplace that the data needed for econometric inference are not contained in a single source. In this paper we analyze the problem of parametric inference from combined individual-level data when data combination is based on personal and demographic identifiers such as name, age, or address. Our main question is the identification of the econometric model based on the combined data when the data do not contain exact individual identifiers and no parametric assumptions are imposed on the joint distribu…

Cited by 4 publications (11 citation statements)
References 48 publications (14 reference statements)
“…In this case, it is convenient to merge the two samples by linking the records relating to the same unit. There is a rich literature on record linkage which is beyond the scope of this paper [67,68,69,70,71,72]. In the second scenario, the two samples are selected from the same population but have no common unit.…”
Section: Statistical Matching (mentioning)
confidence: 99%
“…The assumption that the two samples are independent distinguishes our problem from the one with common observational units (e.g., Devereux and Triphati, 2009; Komarova, Nekipelov and Yakovlev, 2012; Poirer and Ziebarth, 2014). The fact that the two samples do not deliver point identification distinguishes this paper from either the case when the two samples jointly deliver point identification (see e.g., Chen, Hong and Tamer, 2005; Hirukawa and Prokhorov, 2014) or the case when one sample alone delivers point identification and a second sample is used for efficiency gains (see e.g., Hellerstein and Imbens, 1999).…”
Section: Related Literature (mentioning)
confidence: 99%
“…We consider the problem of estimating a linearly parametrized utility function with selection problems (in the decision to engage in the utility-producing activity as well as in the decision to reveal the utility derived) when the data is contained in two separate datasets: one «private» and one «public». To do this, we follow a setup similar to (Komarova et al., 2015, 2017). However, we make a slight adjustment to the data generating model due to the fact that each observation in our hypothetical «master dataset» is not just a single individual, but an individual-firm combination.…”
Section: Model Setup (mentioning)
confidence: 99%
“…gives a sequence of thresholds that impose more stringent «rare» identifier frequency and distance requirements on larger samples, and tries to isolate unique matches in the limit (identifier distance of zero). For a more thorough discussion of more general conditions that can be imposed on this sequence, the interested reader is encouraged to read (Komarova et al., 2015, 2017).…”
Section: Combining the Data - Decision Rules And K-anonymity (mentioning)
confidence: 99%