Functional enrichment analysis is a key step in interpreting gene lists discovered in diverse high-throughput experiments. g:Profiler studies flat and ranked gene lists and finds statistically significant Gene Ontology terms, pathways and other gene function related terms. Translation of hundreds of gene identifiers is another core feature of g:Profiler. Since its first publication in 2007, our web server has become a popular tool of choice among basic and translational researchers. Timeliness is a major advantage of g:Profiler as genome and pathway information is synchronized with the Ensembl database in quarterly updates. g:Profiler supports 213 species including mammals and other vertebrates, plants, insects and fungi. The 2016 update of g:Profiler introduces several novel features. We have added further functional datasets to interpret gene lists, including transcription factor binding site predictions, Mendelian disease annotations, information about protein expression and complexes and gene mappings of human genetic polymorphisms. Besides the interactive web interface, g:Profiler can be accessed in computational pipelines using our R package, Python interface and BioJS component. g:Profiler is freely available at http://biit.cs.ut.ee/gprofiler/.
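Over-representation of a functional term in a flat gene list is classically assessed with a one-sided hypergeometric test. The plain-Python sketch below shows that test for a single term. It makes simplifying assumptions: a fixed background gene set, one term at a time, and no attempt to reproduce the multiple-testing correction that g:Profiler applies across terms or its handling of ranked lists. The function names and toy gene sets are hypothetical.

```python
from math import comb


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N background genes, K annotated to the term, n in query)."""
    upper = min(K, n)
    # math.comb returns 0 when the lower argument exceeds the upper one
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


def term_enrichment(query: set, term_genes: set, background: set) -> tuple:
    """Return (overlap size, one-sided p-value) for one functional term."""
    # Only genes inside the background universe are counted
    query_bg = query & background
    term_bg = term_genes & background
    overlap = len(query_bg & term_bg)
    p = hypergeom_sf(overlap, len(background), len(term_bg), len(query_bg))
    return overlap, p


if __name__ == "__main__":
    background = {f"g{i}" for i in range(100)}   # toy gene universe
    term = {f"g{i}" for i in range(10)}          # toy term with 10 annotated genes
    query = {f"g{i}" for i in range(5)} | {f"g{i}" for i in range(50, 55)}
    overlap, p = term_enrichment(query, term, background)
    print(overlap, f"{p:.2e}")
```

Half of the 10 query genes fall in a term that covers only 10% of the background, so the p-value comes out small; in practice this test is repeated over thousands of terms, which is why a multiple-testing correction is essential.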
Polygenic risk scores are gaining increasing attention as a way to estimate genetic liability to disease, especially for noncommunicable diseases. They are now calculated from thousands of DNA markers. In this paper, we compare the score distributions of two previously published, very large risk score models across different populations. We show that a risk score model, together with its risk stratification thresholds, built on data from one population cannot be applied to another population without taking the target population's structure into account. We also show that if an individual is assigned to the wrong population, his or her disease risk can be systematically misestimated.
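A polygenic score is commonly computed as a weighted sum of effect-allele dosages, and risk groups are then defined by percentile cut-offs of the score distribution. The sketch below uses made-up score distributions and hypothetical function names to illustrate the point made above: a cut-off derived in one population can label a very different share of another population as high risk when the two distributions are shifted relative to each other. It does not reproduce the models or thresholds used in the paper.

```python
import math
from typing import Sequence


def polygenic_score(dosages: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of effect-allele dosages (0, 1 or 2) times per-variant effect sizes."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


def percentile_threshold(scores: Sequence[float], q: float) -> float:
    """Nearest-rank q-th percentile (0 < q <= 1) of a score distribution."""
    ordered = sorted(scores)
    rank = max(1, math.ceil(q * len(ordered)))  # 1-based nearest rank
    return ordered[rank - 1]


def fraction_above(scores: Sequence[float], threshold: float) -> float:
    """Share of individuals classified as high risk (score strictly above threshold)."""
    return sum(s > threshold for s in scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical score distributions: population B is shifted upwards relative to A
    pop_a = [float(i) for i in range(10)]
    pop_b = [float(i) + 3 for i in range(10)]
    cut_a = percentile_threshold(pop_a, 0.8)   # threshold derived in population A
    cut_b = percentile_threshold(pop_b, 0.8)   # threshold derived in population B
    print(fraction_above(pop_b, cut_a))        # A's threshold applied to B
    print(fraction_above(pop_b, cut_b))        # B's own threshold
```

With these toy numbers, the top-20% cut-off from population A flags half of population B as high risk, whereas B's own cut-off flags 20%.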
Correspondence: lili.milani@ut.ee, +372 5304 5400
Conflict of interest: V.M.L is a co-founder and owner of HepaPredict AB. Other authors have no conflict of interest to declare.
Purpose: Biomedical databases combining electronic medical records, phenotypic and genomic data constitute a powerful resource for the personalization of treatment. To leverage the wealth of information provided, algorithms are required to systematically translate this information into treatment recommendations based on existing genotype-phenotype associations. Methods: We developed and tested algorithms for translation of pre-existing genotype data of over 44,000 participants of the Estonian biobank into pharmacogenetic recommendations. We compared the results obtained by whole genome sequencing, whole exome sequencing and genotyping using microarrays, and evaluated the impact of pharmacogenetic reporting based on drug prescription statistics in the Nordic countries and Estonia. Results: Our most striking result was that the performance of genotyping arrays is similar to that of whole genome sequencing, whereas exome sequencing is not suitable for pharmacogenetic predictions. Interestingly, 99.8% of all assessed individuals had a genotype associated with increased risk for at least one medication; consequently, implementing pharmacogenetic recommendations based on genotyping affects at least 50 daily drug doses per 1000 inhabitants. Conclusion: We find that microarrays are a cost-effective solution for creating pre-emptive pharmacogenetic reports and that, with slight modifications, existing databases can be applied for automated pharmacogenetic decision support for clinicians.
Background: Modern activity trackers, including the Fitbit Zip, can measure both step count and physical activity (PA) intensity. However, field-based validation studies in a variety of populations are needed before these trackers are used for research. The purpose of this study was therefore to investigate the validity of Fitbit Zip measurements of step count, moderate to vigorous physical activity (MVPA) and sedentary minutes in 3rd grade students during different school segments. Methods: Third grade students (N = 147, aged 9–10 years) simultaneously wore a Fitbit Zip and an ActiGraph GT3x-BT accelerometer on a belt for five days during school hours. The number of steps, minutes of MVPA and sedentary time during class time, physical education lessons and recess were extracted from both devices using time filters based on school timetables obtained from class teachers. The validity of the Fitbit Zip in different school segments was assessed using Bland-Altman analysis and Spearman’s correlation. Results: There was a strong correlation between the two devices in the number of steps in all in-school segments (r = 0.85–0.96, P < 0.001). The Fitbit Zip overestimated the number of steps in all segments, with the greatest overestimation in physical education lessons (345 steps). For PA intensities, the agreement between the two devices in physical education and recess was moderate for MVPA minutes (r = 0.56 and r = 0.72, respectively, P < 0.001) and strong for sedentary time (r = 0.85 and r = 0.87, respectively, P < 0.001). During class time, the correlation was weak for MVPA minutes (r = 0.24, P < 0.001) and moderate for sedentary time (r = 0.57, P < 0.001).
For total in-school time, the correlation between the two devices was strong for steps (r = 0.98, P < 0.001), MVPA (r = 0.80, P < 0.001) and sedentary time (r = 0.94, P < 0.001). Conclusion: In general, the Fitbit Zip can be considered a relatively accurate device for measuring the number of steps, MVPA and sedentary time in students in a school setting. However, in segments dominated by sedentary time (e.g. academic classes), a research-grade accelerometer should be preferred.
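Agreement between two devices measuring the same quantity is often summarized with a Bland-Altman bias and 95% limits of agreement (mean difference ± 1.96 SD of the differences), alongside a rank correlation. The plain-Python sketch below computes both for paired per-segment values. The step counts are invented for illustration and the function names are hypothetical; this is not the study's analysis code.

```python
from statistics import mean, stdev


def bland_altman(a, b):
    """Return (bias, lower LoA, upper LoA) for paired measurements, using differences a - b."""
    diffs = [x - y for x, y in zip(a, b)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    fitbit = [1100, 1250, 980, 1400, 1320]      # hypothetical steps per segment
    actigraph = [1000, 1200, 900, 1350, 1250]
    print(bland_antman := bland_altman(fitbit, actigraph))
    print(spearman(fitbit, actigraph))
```

A positive bias here means the tracker records more steps than the reference device, which is how an overestimation such as the one reported above would show up.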
Purpose: Biomedical databases combining electronic medical records and phenotypic and genomic data constitute a powerful resource for the personalization of treatment. To leverage the wealth of information provided, algorithms are required to systematically translate this information into treatment recommendations based on existing genotype–phenotype associations. Methods: We developed and tested algorithms for translation of preexisting genotype data of over 44,000 participants of the Estonian biobank into pharmacogenetic recommendations. We compared the results obtained by genome sequencing, exome sequencing, and genotyping using microarrays, and evaluated the impact of pharmacogenetic reporting based on drug prescription statistics in the Nordic countries and Estonia. Results: Our most striking result was that the performance of genotyping arrays is similar to that of genome sequencing, whereas exome sequencing is not suitable for pharmacogenetic predictions. Interestingly, 99.8% of all assessed individuals had a genotype associated with increased risk for at least one medication; consequently, implementing pharmacogenetic recommendations based on genotyping affects at least 50 daily drug doses per 1000 inhabitants. Conclusion: We find that microarrays are a cost-effective solution for creating preemptive pharmacogenetic reports and that, with slight modifications, existing databases can be applied for automated pharmacogenetic decision support for clinicians.
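Translating genotypes into recommendations typically passes through a diplotype-to-phenotype step. The sketch below shows such a rule-based lookup for CYP2C19, a gene chosen here only for illustration, using a deliberately simplified allele function table in the style of CPIC guidelines (only *1, *2, *3 and *17). It is not the algorithm from the paper; the cohort and function names are hypothetical, and any real use would need the full, current CPIC tables.

```python
# Simplified CYP2C19 allele function table (illustrative subset, not exhaustive)
ALLELE_FUNCTION = {"*1": "normal", "*2": "no", "*3": "no", "*17": "increased"}

# Diplotype rules keyed by the alphabetically sorted pair of allele functions
PHENOTYPE_RULES = {
    ("normal", "normal"): "normal metabolizer",
    ("no", "normal"): "intermediate metabolizer",
    ("increased", "no"): "intermediate metabolizer",
    ("no", "no"): "poor metabolizer",
    ("increased", "normal"): "rapid metabolizer",
    ("increased", "increased"): "ultrarapid metabolizer",
}


def cyp2c19_phenotype(allele1: str, allele2: str) -> str:
    """Map a diplotype to a predicted phenotype; unknown alleles raise KeyError."""
    key = tuple(sorted((ALLELE_FUNCTION[allele1], ALLELE_FUNCTION[allele2])))
    return PHENOTYPE_RULES[key]


def fraction_non_normal(cohort: dict) -> float:
    """Share of individuals whose predicted phenotype is not 'normal metabolizer'."""
    flagged = sum(cyp2c19_phenotype(a, b) != "normal metabolizer" for a, b in cohort.values())
    return flagged / len(cohort)


if __name__ == "__main__":
    cohort = {"ind1": ("*1", "*1"), "ind2": ("*1", "*2"), "ind3": ("*17", "*1")}  # hypothetical
    for person, (a, b) in cohort.items():
        print(person, f"{a}/{b}", cyp2c19_phenotype(a, b))
    print(fraction_non_normal(cohort))
```

Repeating such a lookup over many pharmacogenes and counting individuals with at least one non-normal phenotype is the kind of aggregation behind a cohort-level figure like the one reported above.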
Due to the heterogeneity of existing European sources of observational healthcare data, data source-tailored choices are needed to execute multi-data source, multi-national epidemiological studies. This makes transparent documentation paramount. In this proof-of-concept study, a novel standard data derivation procedure was tested in a set of heterogeneous data sources. Identification of subjects with type 2 diabetes (T2DM) was the test case. We included three primary care data sources (PCDs), three data sources based on record linkage of administrative and/or registry data (RLDs), one hospital and one biobank. Overall, data from 12 million subjects from six European countries were extracted. Based on a shared event definition, sixteen standard algorithms (components) useful for identifying T2DM cases were generated through an iterative top-down/bottom-up approach. Each component was based on a single data domain: diagnoses, drugs, diagnostic test utilization or laboratory results. Diagnosis-based components were further subclassified by healthcare setting (primary, secondary or inpatient care). The Unified Medical Language System was used for semantic harmonization within data domains. Individual components were extracted, and the proportion of the population identified was compared across data sources. Drug-based components performed similarly in RLDs and PCDs, unlike diagnosis-based components. Using components as building blocks, logical combinations with AND, OR and AND NOT were tested, and local experts recommended their preferred data source-tailored combination. The proportion of the population identified by the resulting algorithms varied from 3.5% to 15.7% across data sources; however, age-specific results were fairly comparable. The impact of individual components was also assessed: diagnosis-based components identified the majority of cases in PCDs (93–100%), while drug-based components were the main contributors in RLDs (81–100%).
The proposed data derivation procedure allowed the generation of data source-tailored case-finding algorithms in a standardized fashion, facilitated transparent documentation of the process and benchmarking of data sources, and provided a basis for interpreting possible inconsistencies in findings across data sources in future studies.
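Treating each component as the set of subject IDs it flags turns the AND, OR and AND NOT combinations described above into plain set operations and makes the contribution of each component easy to compute. The sketch below is hypothetical throughout: the component names, the particular combination and the toy IDs are invented for illustration and are not one of the algorithms recommended in the study.

```python
def combine(include_any: list, exclude_any: list = ()) -> set:
    """(C1 OR C2 OR ...) AND NOT (E1 OR E2 OR ...), with components as sets of subject IDs."""
    cases = set().union(*include_any)
    for excluded in exclude_any:
        cases -= excluded
    return cases


def contribution(cases: set, components: dict) -> dict:
    """Share of identified cases flagged by each component (components may overlap)."""
    return {name: len(ids & cases) / len(cases) for name, ids in components.items()}


if __name__ == "__main__":
    # Hypothetical components for one data source
    components = {
        "DIAG_PRIMARY_CARE": {1, 2, 3, 4},
        "DRUG_GLUCOSE_LOWERING": {3, 4, 5, 6},
    }
    exclusions = {"DIAG_TYPE1": {6}}
    cases = combine(list(components.values()), list(exclusions.values()))
    population_size = 50
    print(sorted(cases), len(cases) / population_size)
    print(contribution(cases, components))
```

Because the shares are computed against the final case set, they can sum to more than 100% when components overlap, which matches the idea of reporting each component's impact separately.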
The Estonian Biobank, governed by the Institute of Genomics at the University of Tartu (Biobank), has stored genetic material (DNA) and continuously collected data since 2002 on a total of 52,274 individuals, representing ~5% of the Estonian adult population, and the cohort continues to grow. To explore the utility of data available in the Biobank, we conducted a phenome-wide association study (PheWAS) in two areas of interest to healthcare researchers: asthma and liver disease. We used 11 asthma-associated and 13 liver disease-associated single nucleotide polymorphisms (SNPs), identified from published genome-wide association studies, to test our ability to detect established associations. We confirmed 2 asthma- and 5 liver disease-associated variants at nominal significance, with effects directionally consistent with published results. We found 2 associations in the opposite direction to previously published results (rs4374383:AA increases risk of NASH/NAFLD; rs11597086 increases ALT level). Three SNP–diagnosis pairs passed the phenome-wide significance threshold: rs9273349 and E06 (thyroiditis, p = 5.50 × 10⁻⁸); rs9273349 and E10 (type 1 diabetes, p = 2.60 × 10⁻⁷); and rs2281135 and K76 (non-alcoholic liver diseases, including NAFLD, p = 4.10 × 10⁻⁷). We have validated our approach and confirmed the quality of the data for these conditions. Importantly, we demonstrate that the extensive genetic and medical information in the Estonian Biobank can be successfully used for scientific research.
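The replication criteria described above (nominal significance plus agreement in effect direction with the published GWAS) and a phenome-wide threshold can be expressed as simple rules. The sketch below assumes a Bonferroni-style correction and uses made-up SNP labels, effect estimates and test counts; the abstract does not state the exact threshold or how it was derived, so this only illustrates the kind of rule involved.

```python
from dataclasses import dataclass


@dataclass
class Association:
    snp: str             # hypothetical variant label
    phenotype: str
    beta: float          # effect estimate in the biobank (sign gives direction)
    p: float
    published_sign: int  # +1 or -1: direction reported in the published GWAS


def replication_status(a: Association, alpha: float = 0.05) -> str:
    """Classify a result as confirmed, opposite direction, or not replicated."""
    if a.p >= alpha:
        return "not replicated"
    same_direction = (a.beta > 0) == (a.published_sign > 0)
    return "confirmed" if same_direction else "opposite direction"


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Family-wise threshold: alpha divided by the number of SNP-phenotype tests."""
    return alpha / n_tests


if __name__ == "__main__":
    results = [
        Association("snp_a", "asthma", 0.21, 0.003, +1),
        Association("snp_b", "ALT level", 0.15, 0.02, -1),
        Association("snp_c", "asthma", -0.05, 0.40, -1),
    ]
    for r in results:
        print(r.snp, replication_status(r))
    # Hypothetical: 24 SNPs tested against 1,000 diagnosis codes
    threshold = bonferroni_threshold(0.05, 24 * 1000)
    print(f"phenome-wide threshold: {threshold:.2e}")
```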