Data Integration by Combining Big Data and Survey Sample Data for Finite Population Inference

Kim, Jae Kwang; Tam, S. M.

doi:10.1111/insr.12434

Cited by 30 publications

(37 citation statements)

References 32 publications

(32 reference statements)


“…These data sets contain the layers of raster and geospatial data. Kim and Tam (2020) have proposed a data integration estimator [17]. This is a classification technique with non-parametric and overlapping units which recognizes and corrects misclassification errors.…”

Section: Review Of Agriculture Sector (mentioning)

confidence: 99%

Big Data Integration Solutions in Organizations: A Domain-Specific Analysis

Karanam, Kamath, Kulkarni et al. 2021

Data Integrity and Quality


The Big Data Integration (BDI) process integrates big data arising from many diverse data sources and data formats, and presents a unified, valuable, customized, holistic view of the data. The BDI process is essential for building confidence and for facilitating high-quality insights and trends for intelligent decision making in organizations. Integrating big data is a complex process with many challenges. The data sources for BDI are traditional data warehouses, social networks, the Internet of Things (IoT) and online transactions. BDI solutions are deployed on Master Data Management (MDM) systems to support collecting, aggregating and delivering reliable information across the organization. This chapter presents an exhaustive review of the BDI literature and classifies BDI applications by domain. The methods, applications, advantages and disadvantages of the research in each paper are tabulated. A taxonomy of concepts, a table of acronyms and the organization of the chapter are presented. The number of papers reviewed per industry is depicted as a pie chart. A comparative analysis of curated survey papers against specific parameters, aimed at discovering research gaps, is also tabulated. The research issues, implementation challenges and future trends are highlighted. A case study of BDI solutions implemented in various organizations is also discussed. The chapter concludes with a holistic view of BDI concepts and solutions implemented in organizations.



“…Since the sizes $N_B$ and $N_C = N - N_B$ are known, a more efficient post-stratified estimator is given by $\hat{Y}_P$ […]. Kim and Tam (2018) showed that, with a simple random sample $A$, the post-stratified estimator achieves a large reduction in the design variance compared to the design-unbiased estimator $\hat{Y} = \sum_{i \in A} d_i y_i$ based only on the probability sample $A$. In particular, if the sampling fraction $f = n/N$ is small and the population variance $\sigma_y^2$ and the variance of the population units not belonging to $B$, denoted…”

Section: Study Variable Observed In Both Samples (mentioning)

confidence: 99%
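
The comparison described in the excerpt above can be illustrated numerically. The following is a minimal Python sketch, not code from Kim and Tam or from Rao. It assumes the post-stratified estimator has the form Y_B + N_C × (design-weighted mean of y over sample units outside B), which is consistent with N_B, N_C and Y_B being known but is not spelled out in the excerpt. The function names, the synthetic population and the 60% coverage rate of B are all hypothetical.

```python
import numpy as np


def design_estimator(d, y):
    """Design-weighted total using the probability sample A alone."""
    return float(np.sum(d * y))


def post_stratified_estimator(d, y, in_B, Y_B, N_C):
    """Assumed form: known total Y_B for the big-data part, plus N_C times
    the design-weighted mean of y over sample units that are NOT in B."""
    d_C = d[~in_B]
    y_C = y[~in_B]
    return float(Y_B + N_C * np.sum(d_C * y_C) / np.sum(d_C))


# Hypothetical synthetic finite population (illustration only).
rng = np.random.default_rng(0)
N = 10_000
y_pop = rng.gamma(shape=2.0, scale=50.0, size=N)
delta_pop = rng.random(N) < 0.6          # True if the unit is covered by B
Y_B = float(y_pop[delta_pop].sum())      # total of y over B, assumed known
N_B = int(delta_pop.sum())
N_C = N - N_B

# Simple random sample A without replacement, design weights d_i = N / n.
n = 500
idx = rng.choice(N, size=n, replace=False)
d = np.full(n, N / n)
y = y_pop[idx]
in_B = delta_pop[idx]

true_total = float(y_pop.sum())
est_design = design_estimator(d, y)
est_post = post_stratified_estimator(d, y, in_B, Y_B, N_C)
print(f"true total       : {true_total:.1f}")
print(f"design estimator : {est_design:.1f}")
print(f"post-stratified  : {est_post:.1f}")
```

Repeating the draw over many seeds and comparing the spread of the two estimates around `true_total` gives a rough empirical view of the variance reduction the excerpt refers to.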

“…where $N_B$, $N_C$ and $Y_B$ are known. The resulting calibration estimator $\sum_{i \in A} w_i y_i$ is identical to the post-stratified estimator $\hat{Y}_P$ (Kim and Tam 2018). However, the main advantage of the calibration approach is that it permits the inclusion of other calibration constraints, if available.…”

Section: Between the Design Weights $d_i$ and the Calibration Weights $w_i$ Subject to Calibration Constraints (mentioning)

confidence: 99%
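
The equivalence stated in the excerpt above can be checked on a toy example. The sketch below is an illustration under stated assumptions, not the authors' implementation: it uses the chi-square distance between design and calibration weights (linear calibration), since the excerpt does not say which distance is used, and it calibrates on the three known quantities N_B, N_C and Y_B through the auxiliary vector (δ_i, 1 − δ_i, δ_i y_i), where δ_i indicates membership in B. The function name and all data values are hypothetical.

```python
import numpy as np


def linear_calibration_weights(d, X, totals):
    """Chi-square-distance calibration (an assumed choice of distance):
    w_i = d_i * (1 + x_i' lam), with lam chosen so that sum_i w_i x_i = totals."""
    M = (X * d[:, None]).T @ X                  # sum_i d_i x_i x_i'
    lam = np.linalg.solve(M, totals - X.T @ d)  # X.T @ d = sum_i d_i x_i
    return d * (1.0 + X @ lam)


# Hypothetical toy sample A: six units with equal design weights.
d = np.full(6, 10.0)
in_B = np.array([True, True, True, False, False, False])
y = np.array([1.0, 2.0, 4.0, 3.0, 5.0, 7.0])

# Quantities assumed known from the big-data source B.
N_B, N_C, Y_B = 40.0, 25.0, 90.0

delta = in_B.astype(float)
X = np.column_stack([delta, 1.0 - delta, delta * y])
totals = np.array([N_B, N_C, Y_B])

w = linear_calibration_weights(d, X, totals)
est_cal = float(np.sum(w * y))

# Post-stratified estimator in its assumed form, for comparison.
d_C, y_C = d[~in_B], y[~in_B]
est_post = float(Y_B + N_C * np.sum(d_C * y_C) / np.sum(d_C))

print(est_cal, est_post)
```

Because the constraint on δ_i y_i fixes the contribution of the B units at Y_B and the remaining constraint only rescales the weights outside B to sum to N_C, the two printed values agree up to floating-point error. Further constraints could be added as extra columns of `X` with matching entries in `totals`.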

“…The case of measurement errors only in the probability sample A is more complex and requires a measurement error model. Kim and Tam (2018) also studied calibration estimation in the case of unit nonresponse in the probability sample A, assuming a general response model allowing the probability of response to depend on the study variable. Kim and Tam (2018) gave an interesting application of the above setup in official statistics.…”

Section: Between the Design Weights $d_i$ and the Calibration Weights $w_i$ Subject to Calibration Constraints (mentioning)

confidence: 99%

“…Kim and Tam (2018) also studied calibration estimation in the case of unit nonresponse in the probability sample A, assuming a general response model allowing the probability of response to depend on the study variable. Kim and Tam (2018) gave an interesting application of the above setup in official statistics. In this application, the non-probability sample (or big data) is the Australian Agricultural Census with 85% response rate and the probability sample is the Rural Environment and Agricultural Commodities Survey.…”

Section: Between the Design Weights $d_i$ and the Calibration Weights $w_i$ Subject to Calibration Constraints (mentioning)

confidence: 99%


On Making Valid Inferences by Integrating Data from Surveys and Other Sources

Rao 2020

Sankhya B


Survey samplers have long been using probability samples from one or more sources in conjunction with census and administrative data to make valid and efficient inferences on finite population parameters. This topic has received a lot of attention more recently in the context of data from non-probability samples such as transaction data, web surveys and social media data. In this paper, I will provide a brief overview of probability sampling methods first and then discuss some recent methods, based on models for the non-probability samples, which could lead to useful inferences from a non-probability sample by itself or when combined with a probability sample. I will also explain how big data may be used as predictors in small area estimation, a topic of current interest because of the growing demand for reliable local area statistics.


Enhancing estimation methods for integrating probability and nonprobability survey samples with machine‐learning techniques. An application to a Survey on the impact of the COVID‐19 pandemic in Spain

et al. 2022


Web surveys have replaced face‐to‐face and computer‐assisted telephone interviewing (CATI) as the main mode of data collection in most countries. This trend was reinforced as a consequence of COVID‐19 pandemic‐related restrictions. However, this mode still faces significant limitations in obtaining probability‐based samples of the general population. For this reason, most web surveys rely on nonprobability survey designs. Whereas probability‐based designs continue to be the gold standard in survey sampling, nonprobability web surveys may still prove useful in some situations. For instance, when small subpopulations are the group under study and probability sampling is unlikely to meet sample size requirements, complementing a small probability sample with a larger nonprobability one may improve the efficiency of the estimates. Nonprobability samples may also be designed as a means of compensating for known biases in probability‐based web survey samples by purposely targeting respondent profiles that tend to be underrepresented in these surveys. This is the case in the Survey on the impact of the COVID‐19 pandemic in Spain (ESPACOV) that motivates this paper. In this paper, we propose a methodology for combining probability and nonprobability web‐based survey samples with the help of machine‐learning techniques. We then assess the efficiency of the resulting estimates by comparing them with other strategies that have been used before. Our simulation study and the application of the proposed estimation method to the second wave of the ESPACOV Survey allow us to conclude that this is the best option for reducing the biases observed in our data.
