2017
DOI: 10.1093/biomet/asx023

Data integration with high dimensionality

Abstract: We consider situations where the data consist of a number of responses for each individual, which may include a mix of discrete and continuous variables. The data also include a class of predictors, where the same predictor may have different physical measurements across different experiments depending on how the predictor is measured. The goal is to select which predictors affect any of the responses, where the number of such informative predictors tends to infinity as the sample size increases. There …

Cited by 32 publications (53 citation statements)
References 38 publications (97 reference statements)
“…There is a large literature on variable-selection methods for prediction, but little work on variable selection for data integration that can successfully recognize the strengths and the limitations of each source of data and utilize all information captured for finite population inference. Gao and Carroll (2017) proposed a pseudolikelihood approach to combining multiple non-survey data with high dimensionality; this approach requires that all likelihoods are correctly specified and therefore is sensitive to model misspecification. Chen, Valliant and Elliott (2018) proposed a model-based calibration approach using lasso regression; this approach relies on a correctly specified outcome model.…”
Section: Introduction (mentioning)
confidence: 99%
“…Since BAR aims to approximate $\ell_0$-penalized regression, it directly provides a surrogate optima to some popular information criteria with some prefixed $\lambda_n$. For example, performing BAR with $\lambda_n = c\log(p_n)$ for some c > 0 leads to a surrogate optima for the directly optimizing the extended BIC. For thoroughness, in addition to using a 25-value grid for c, we also include simulation results in Table for BAR with some prefixed values $\lambda_n = 0.5\log(p_n)$ and $\lambda_n = \log(p_n)$.…”
Section: Simulations (mentioning)
confidence: 99%
“…For example, performing BAR with $\lambda_n = c\log(p_n)$ for some c > 0 leads to a surrogate optima for the directly optimizing the extended BIC. [44][45][46] For thoroughness, in addition to using a 25-value grid for c, we also include simulation results in Table 1 for BAR with some prefixed values $\lambda_n = 0.5\log(p_n)$ and $\lambda_n = \log(p_n)$. Not surprisingly, BAR with these prefixed values produced sometimes slightly suboptimal, but generally comparable estimation and selection performance.…”
Section: Model Selection and Parameter Estimation (mentioning)
confidence: 99%
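
The tuning rule quoted in this statement, λ_n = c log(p_n), can be illustrated with a minimal sketch. It only computes candidate tuning values and does not implement BAR itself. The function name `bar_lambda_grid`, the value of `p_n`, and the range of the 25-value grid for c are assumptions made for illustration; the quoted statement gives the grid size but not its endpoints.

```python
import math


def bar_lambda_grid(p_n, c_values):
    """Return lambda_n = c * log(p_n) for each c in c_values.

    Hypothetical helper: only the formula itself comes from the quoted text.
    """
    if p_n <= 1:
        raise ValueError("p_n must exceed 1 so that log(p_n) is positive")
    return [c * math.log(p_n) for c in c_values]


# Assumed setting: p_n = 1000 predictors (not taken from the cited paper).
p_n = 1000

# Assumed grid: 25 equally spaced values of c from 0.1 to 2.5.
# The quoted statement specifies 25 values but not this range.
c_grid = [0.1 * k for k in range(1, 26)]
lambdas = bar_lambda_grid(p_n, c_grid)

# The two prefixed choices named in the quote: c = 0.5 and c = 1.
fixed_lambdas = bar_lambda_grid(p_n, [0.5, 1.0])
```

In practice each value in `lambdas` would be passed to a BAR fit and the resulting models compared by an information criterion; that step is omitted here because the statement does not describe it in enough detail.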
“…For the tumour samples in the CGA there are two experiments with survival outcome and disease status as the responses whereas the predictors are the same. In this situation, because of the reasons that were indicated in Gao and Carroll (), some kind of data integration is necessary to achieve better survival prediction by integrating both of the measurements. Towards this objective, we develop a joint survival and binary model using the latent variables.…”
Section: Introduction (mentioning)
confidence: 99%
“…On the basis of the idea of group variable regularization, Gao and Carroll () proposed a data integration method in high dimensional settings, which accommodates multiple responses from the same set of covariates. Parameter estimation is achieved via maximizing a pseudolikelihood.…”
Section: Introduction (mentioning)
confidence: 99%