Record linkage of administrative and survey data is increasingly used to generate evidence to inform policy and services. Although linkage is a powerful and efficient way of generating new information from existing data sets, errors related to data processing before, during and after linkage can bias results. However, researchers and users of linked data rarely have access to information that can be used to assess these biases or take them into account in analyses. As linked administrative data are increasingly used to provide evidence to guide policy and services, linkage error, which disproportionately affects disadvantaged groups, can undermine evidence for public health. We convened a group of researchers and experts from government data providers to develop guidance about the information that needs to be made available about the data linkage process by data providers, data linkers, analysts and the researchers who write reports. The guidance goes beyond recommendations for information to be included in research reports. Our aim is to raise awareness of information that may be required at each step of the linkage pathway to improve the transparency, reproducibility, and accuracy of linkage processes, and the validity of analyses and interpretation of results.
Official statistics production based on a combination of data sources, including sample surveys, censuses and administrative registers, is becoming more and more common. Reduced response burden, gains in production cost efficiency and the potential for detailed spatial-demographic and longitudinal statistics are some of the major advantages associated with the use of integrated statistical data. Data integration has always been an essential feature of the use of administrative register data. But survey and census data should also be integrated, so as to widen their scope and improve their quality. There are many new and difficult challenges here that are beyond the traditional topics of survey sampling and data integration. In this article we consider statistical theory for data integration on a conceptual level. In particular, we present a two-phase life-cycle model for integrated statistical microdata, which provides a framework for the various potential error sources, and outline some concepts and topics for quality assessment beyond the ideal of error-free data. A shared understanding of these issues will hopefully help us to collocate and coordinate efforts in future research and development.
Small area estimation is a research area in official and survey statistics of great practical relevance for national statistical institutes and related organizations. Despite rapid developments in methodology and software, researchers and users would benefit from having practical guidelines for the process of small area estimation. We propose a general framework for the production of small area statistics that is governed by the principle of parsimony and is based on three broadly defined stages, namely specification, analysis and adaptation, and evaluation. Emphasis is given to the interaction between a user of small area statistics and the statistician in specifying the target geography and parameters in the light of the available data. Model‐free and model‐dependent methods are described with a focus on model selection and testing, model diagnostics and adaptations such as use of data transformations. Uncertainty measures and the use of model and design‐based simulations for method evaluation are also at the centre of the paper. We illustrate the application of the proposed framework by using real data for the estimation of non‐linear deprivation indicators. Linear statistics, e.g. averages, are included as special cases of the general framework.
We develop a class of log-linear structural models that is suited to estimation of small area cross-classified counts based on survey data. This allows us to account for various association structures within the data and includes as a special case the restricted log-linear model underlying structure preserving estimation. The effect of survey design can be incorporated into estimation through the specification of an unbiased direct estimator and its associated covariance structure. We illustrate our approach by applying it to estimation of small area labour force characteristics in Norway. Copyright 2004 Royal Statistical Society.
We synthesise the existing theory of graph sampling. We propose a formal definition of sampling in finite graphs, and provide a classification of potential graph parameters. We develop a general approach of Horvitz-Thompson estimation to T-stage snowball sampling, and present various reformulations of some common network sampling methods in the literature in terms of the outlined graph sampling theory.
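For readers unfamiliar with Horvitz-Thompson estimation, the following is a minimal sketch of the classical estimator of a population total, in which each sampled value is weighted by the inverse of its inclusion probability. It is a generic textbook illustration, not the graph-sampling formulation developed in the article; the function name `horvitz_thompson_total` and its parameters are hypothetical, and the inclusion probabilities are assumed to be known.

```python
from typing import Sequence


def horvitz_thompson_total(values: Sequence[float],
                           inclusion_probs: Sequence[float]) -> float:
    """Classical Horvitz-Thompson estimate of a population total.

    values          -- observed values y_i for the sampled units
    inclusion_probs -- known inclusion probabilities pi_i (0 < pi_i <= 1)
    """
    if len(values) != len(inclusion_probs):
        raise ValueError("values and inclusion_probs must have equal length")
    total = 0.0
    for y, pi in zip(values, inclusion_probs):
        if not 0.0 < pi <= 1.0:
            raise ValueError("inclusion probabilities must lie in (0, 1]")
        # Each sampled unit stands in for 1/pi units of the population.
        total += y / pi
    return total
```

With values 2 and 4 sampled with probabilities 0.5 and 0.25, the sketch weights them by 2 and 4 respectively before summing.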
Road traffic crashes (RTCs) are a major global public health problem and impose a substantial burden on national economies and healthcare systems. There is little systematic understanding of the geography of RTCs and their spatial correlations in the Middle East region, particularly in Oman, where RTCs are the leading cause of disability-adjusted life years lost. The overarching goal of this paper is to evaluate the spatial and temporal dimensions of RTCs and to identify the high-risk areas, or hot-zones, where RTCs are more frequent, using geocoded data from the Muscat governorate.
Register data that originate from administrative or other secondary sources are increasingly being used to generate statistical outputs directly. The coverage of the input datasets is an important issue in this respect. Traditionally, capture-recapture models have been used to deal with multiple list enumerations subject to undercoverage errors. The aim of this article is to scope possible approaches to modelling capture-recapture data with additional overcoverage error. Attention is primarily given to model interpretations and conditions under which a model may provide a plausible basis for estimation and uncertainty evaluation. The setting with two list enumerations is examined in depth as it is the most common in practice. Models that can be extended to include more than two lists are identified. An additional independent coverage survey with only undercoverage error is always needed for estimation. Potential application to census coverage-error adjustment is discussed.
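As background to the two-list setting, the traditional capture-recapture approach for undercoverage alone can be sketched with the classical Lincoln-Petersen (dual-system) estimator. This is a textbook illustration under the usual assumptions of independent lists, error-free matching and no overcoverage; it is not one of the overcoverage models scoped in the article, and the function name `lincoln_petersen` is hypothetical.

```python
def lincoln_petersen(n_list1: int, n_list2: int, n_both: int) -> float:
    """Classical two-list estimate of population size.

    n_list1 -- number of units enumerated in the first list
    n_list2 -- number of units enumerated in the second list
    n_both  -- number of units matched in both lists
    """
    if n_both <= 0:
        raise ValueError("at least one matched unit is required")
    if n_both > min(n_list1, n_list2):
        raise ValueError("matches cannot exceed the size of either list")
    # Under independence, n_both / n_list2 estimates the coverage rate
    # of the first list, so N is estimated by n_list1 * n_list2 / n_both.
    return n_list1 * n_list2 / n_both
```

For example, two lists of 100 and 80 units with 40 matches imply that the first list covers about half of the population.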
Many local sheep breeds in China have poor meat quality. Increasing intramuscular fat (IMF) content can significantly improve the quality of mutton. However, the molecular mechanisms of intramuscular adipocyte formation and differentiation remain unclear. This study compared preadipocytes and mature adipocytes by whole-transcriptome sequencing and systematically constructed regulatory networks from the predicted relationships among the differentially expressed RNAs (DERs). Sequencing results showed that in this process, there were 1,196, 754, 100, and 17 differentially expressed messenger RNAs (mRNAs), long non-coding RNAs (lncRNAs), microRNAs (miRNAs), and circular RNAs (circRNAs), respectively. Gene Ontology analysis showed that most DERs were enriched in the Cell Part, Cellular Process, Biological Regulation, and Binding terms. Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis found that the DERs were primarily enriched in the Focal adhesion, phosphoinositide 3-kinase (PI3K)-Akt, mitogen-activated protein kinase (MAPK), and peroxisome proliferator-activated receptor (PPAR) signaling pathways. Forty DERs were randomly selected from the core regulatory network to verify the accuracy of the sequence data. The results of qPCR showed that the DER expression trend was consistent with the sequence data. Four novel promising candidate miRNAs (miR-336, miR-422, miR-578, and miR-722) played crucial roles in adipocyte differentiation, and they also participated in multiple important regulatory networks. We verified the expression pattern of the miRNAs and related pathways’ members at five time points in the adipocyte differentiation process (0, 2, 4, 6, 8, 10 days) by qPCR, including miR-336/ACSL4/LncRNA-MSTRG71379/circRNA0002331, miR-422/FOXO4/LncRNA-MSTRG54995/circRNA0000520, miR-578/IGF1/LncRNA-MSTRG102235/circRNA0002971, and miR-722/PDK4/LncRNA-MSTRG107440/circRNA0002909. Our data provide many valuable candidate DERs and regulatory networks for research into the molecular mechanisms of sheep adipocyte differentiation and will assist studies aimed at improving IMF content.