BackgroundNot all obese subjects have an adverse metabolic profile predisposing them to developing type 2 diabetes or cardiovascular disease. The BioSHaRE-EU Healthy Obese Project aims to gain insights into the consequences of (healthy) obesity using data on risk factors and phenotypes across several large-scale cohort studies. Aim of this study was to describe the prevalence of obesity, metabolic syndrome (MetS) and metabolically healthy obesity (MHO) in ten participating studies.MethodsTen different cohorts in seven countries were combined, using data transformed into a harmonized format. All participants were of European origin, with age 18–80 years. They had participated in a clinical examination for anthropometric and blood pressure measurements. Blood samples had been drawn for analysis of lipids and glucose. Presence of MetS was assessed in those with obesity (BMI ≥ 30 kg/m2) based on the 2001 NCEP ATP III criteria, as well as an adapted set of less strict criteria. MHO was defined as obesity, having none of the MetS components, and no previous diagnosis of cardiovascular disease.ResultsData for 163,517 individuals were available; 17% were obese (11,465 men and 16,612 women). The prevalence of obesity varied from 11.6% in the Italian CHRIS cohort to 26.3% in the German KORA cohort. The age-standardized percentage of obese subjects with MetS ranged in women from 24% in CHRIS to 65% in the Finnish Health2000 cohort, and in men from 43% in CHRIS to 78% in the Finnish DILGOM cohort, with elevated blood pressure the most frequently occurring factor contributing to the prevalence of the metabolic syndrome. The age-standardized prevalence of MHO varied in women from 7% in Health2000 to 28% in NCDS, and in men from 2% in DILGOM to 19% in CHRIS. MHO was more prevalent in women than in men, and decreased with age in both sexes.ConclusionsThrough a rigorous harmonization process, the BioSHaRE-EU consortium was able to compare key characteristics defining the metabolically healthy obese phenotype across ten cohort studies. There is considerable variability in the prevalence of healthy obesity across the different European populations studied, even when unified criteria were used to classify this phenotype.
Background: It is widely accepted and acknowledged that data harmonization is crucial: in its absence, the co-analysis of major tranches of high quality extant data is liable to inefficiency or error. However, despite its widespread practice, no formalized/systematic guidelines exist to ensure high quality retrospective data harmonization. Methods: To better understand real-world harmonization practices and facilitate development of formal guidelines, three interrelated initiatives were undertaken between 2006 and 2015. They included a phone survey with 34 major international research initiatives, a series of workshops with experts, and case studies applying the proposed guidelines. Results: A wide range of projects use retrospective harmonization to support their research activities but even when appropriate approaches are used, the terminologies, procedures, technologies and methods adopted vary markedly. The generic guidelines outlined in this article delineate the essentials required and describe an interdependent step-by-step approach to harmonization: 0) define the research question, objectives and protocol; 1) assemble pre-existing knowledge and select studies; 2) define targeted variables and evaluate harmonization potential; 3) process data; 4) estimate quality of the harmonized dataset(s) generated; and 5) disseminate and preserve final harmonization products. Conclusions: This manuscript provides guidelines aiming to encourage rigorous and effective approaches to harmonization which are comprehensively and transparently documented and straightforward to interpret and implement. This can be seen as a key step towards implementing guiding principles analogous to those that are well recognised as being essential in securing the foundational underpinning of systematic reviews and the meta-analysis of clinical trials.
Background: Research in modern biomedicine and social science requires sample sizes so large that they can often only be achieved through a pooled co-analysis of data from several studies. But the pooling of information from individuals in a central database that may be queried by researchers raises important ethico-legal questions and can be controversial. In the UK this has been highlighted by recent debate and controversy relating to the UK’s proposed ‘care.data’ initiative, and these issues reflect important societal and professional concerns about privacy, confidentiality and intellectual property. DataSHIELD provides a novel technological solution that can circumvent some of the most basic challenges in facilitating the access of researchers and other healthcare professionals to individual-level data.Methods: Commands are sent from a central analysis computer (AC) to several data computers (DCs) storing the data to be co-analysed. The data sets are analysed simultaneously but in parallel. The separate parallelized analyses are linked by non-disclosive summary statistics and commands transmitted back and forth between the DCs and the AC. This paper describes the technical implementation of DataSHIELD using a modified R statistical environment linked to an Opal database deployed behind the computer firewall of each DC. Analysis is controlled through a standard R environment at the AC.Results: Based on this Opal/R implementation, DataSHIELD is currently used by the Healthy Obese Project and the Environmental Core Project (BioSHaRE-EU) for the federated analysis of 10 data sets across eight European countries, and this illustrates the opportunities and challenges presented by the DataSHIELD approach.Conclusions: DataSHIELD facilitates important research in settings where: (i) a co-analysis of individual-level data from several studies is scientifically necessary but governance restrictions prohibit the release or sharing of some of the required data, and/or render data access unacceptably slow; (ii) a research group (e.g. in a developing nation) is particularly vulnerable to loss of intellectual property—the researchers want to fully share the information held in their data with national and international collaborators, but do not wish to hand over the physical data themselves; and (iii) a data set is to be included in an individual-level co-analysis but the physical size of the data precludes direct transfer to a new site for analysis.
Ambient air pollution increases the risk of respiratory mortality, but evidence for impacts on lung function and chronic obstructive pulmonary disease (COPD) is less well established. The aim was to evaluate whether ambient air pollution is associated with lung function and COPD, and explore potential vulnerability factors.We used UK Biobank data on 303 887 individuals aged 40–69 years, with complete covariate data and valid lung function measures. Cross-sectional analyses examined associations of land use regression-based estimates of particulate matter (particles with a 50% cut-off aerodynamic diameter of 2.5 and 10 µm: PM2.5 and PM10, respectively; and coarse particles with diameter between 2.5 μm and 10 μm: PMcoarse) and nitrogen dioxide (NO2) concentrations with forced expiratory volume in 1 s (FEV1), forced vital capacity (FVC), the FEV1/FVC ratio and COPD (FEV1/FVC <lower limit of normal). Effect modification was investigated for sex, age, obesity, smoking status, household income, asthma status and occupations previously linked to COPD.Higher exposures to each pollutant were significantly associated with lower lung function. A 5 µg·m−3 increase in PM2.5 concentration was associated with lower FEV1 (−83.13 mL, 95% CI −92.50– −73.75 mL) and FVC (−62.62 mL, 95% CI −73.91– −51.32 mL). COPD prevalence was associated with higher concentrations of PM2.5 (OR 1.52, 95% CI 1.42–1.62, per 5 µg·m−3), PM10 (OR 1.08, 95% CI 1.00–1.16, per 5 µg·m−3) and NO2 (OR 1.12, 95% CI 1.10–1.14, per 10 µg·m−3), but not with PMcoarse. Stronger lung function associations were seen for males, individuals from lower income households, and “at-risk” occupations, and higher COPD associations were seen for obese, lower income, and non-asthmatic participants.Ambient air pollution was associated with lower lung function and increased COPD prevalence in this large study.
Background Vast sample sizes are often essential in the quest to disentangle the complex interplay of the genetic, lifestyle, environmental and social factors that determine the aetiology and progression of chronic diseases. The pooling of information between studies is therefore of central importance to contemporary bioscience. However, there are many technical, ethico-legal and scientific challenges to be overcome if an effective, valid, pooled analysis is to be achieved. Perhaps most critically, any data that are to be analysed in this way must be adequately ‘harmonized’. This implies that the collection and recording of information and data must be done in a manner that is sufficiently similar in the different studies to allow valid synthesis to take place.Methods This conceptual article describes the origins, purpose and scientific foundations of the DataSHaPER (DataSchema and Harmonization Platform for Epidemiological Research; http://www.datashaper.org), which has been created by a multidisciplinary consortium of experts that was pulled together and coordinated by three international organizations: P3G (Public Population Project in Genomics), PHOEBE (Promoting Harmonization of Epidemiological Biobanks in Europe) and CPT (Canadian Partnership for Tomorrow Project).Results The DataSHaPER provides a flexible, structured approach to the harmonization and pooling of information between studies. Its two primary components, the ‘DataSchema’ and ‘Harmonization Platforms’, together support the preparation of effective data-collection protocols and provide a central reference to facilitate harmonization. The DataSHaPER supports both ‘prospective’ and ‘retrospective’ harmonization.Conclusion It is hoped that this article will encourage readers to investigate the project further: the more the research groups and studies are actively involved, the more effective the DataSHaPER programme will ultimately be.
AbstractsBackgroundIndividual-level data pooling of large population-based studies across research centres in international research projects faces many hurdles. The BioSHaRE (Biobank Standardisation and Harmonisation for Research Excellence in the European Union) project aims to address these issues by building a collaborative group of investigators and developing tools for data harmonization, database integration and federated data analyses.MethodsEight population-based studies in six European countries were recruited to participate in the BioSHaRE project. Through workshops, teleconferences and electronic communications, participating investigators identified a set of 96 variables targeted for harmonization to answer research questions of interest. Using each study’s questionnaires, standard operating procedures, and data dictionaries, harmonization potential was assessed. Whenever harmonization was deemed possible, processing algorithms were developed and implemented in an open-source software infrastructure to transform study-specific data into the target (i.e. harmonized) format. Harmonized datasets located on server in each research centres across Europe were interconnected through a federated database system to perform statistical analysis.ResultsRetrospective harmonization led to the generation of common format variables for 73% of matches considered (96 targeted variables across 8 studies). Authenticated investigators can now perform complex statistical analyses of harmonized datasets stored on distributed servers without actually sharing individual-level data using the DataSHIELD method.ConclusionNew Internet-based networking technologies and database management systems are providing the means to support collaborative, multi-center research in an efficient and secure manner. The results from this pilot project show that, given a strong collaborative relationship between participating studies, it is possible to seamlessly co-analyse internationally harmonized research databases while allowing each study to retain full control over individual-level data. We encourage additional collaborative research networks in epidemiology, public health, and the social sciences to make use of the open source tools presented herein.
Long-term exposures to road traffic noise and ambient air pollution were associated with blood biochemistry, providing a possible link between road traffic noise/air pollution and cardio-metabolic disease risk.
The current article shows that the DataSHaPER provides an effective and flexible approach for the retrospective harmonization of information across studies. To implement data synthesis, some additional scientific, ethico-legal and technical considerations must be addressed. The success of the DataSHaPER as a harmonization approach will depend on its continuing development and on the rigour and extent of its use. The DataSHaPER has the potential to take us closer to a truly collaborative epidemiology and offers the promise of enhanced research potential generated through synthesized databases.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.