Katherine Todd-Brown scite author profile

Whole-genome association studies (WGAS) bring new computational, as well as analytic, challenges to researchers. Many existing genetic-analysis tools are not designed to handle such large data sets in a convenient manner and do not necessarily exploit the new opportunities that whole-genome data bring. To address these issues, we developed PLINK, an open-source C/C++ WGAS tool set. With PLINK, large data sets comprising hundreds of thousands of markers genotyped for thousands of individuals can be rapidly manipulated and analyzed in their entirety. As well as providing tools to make the basic analytic steps computationally efficient, PLINK also supports some novel approaches to whole-genome data that take advantage of whole-genome coverage. We introduce PLINK and describe the five main domains of function: data management, summary statistics, population stratification, association analysis, and identity-by-descent estimation. In particular, we focus on the estimation and use of identity-by-state and identity-by-descent information in the context of population-based whole-genome studies. This information can be used to detect and correct for population stratification and to identify extended chromosomal segments that are shared identical by descent between very distantly related individuals. Analysis of the patterns of segmental sharing has the potential to map disease loci that contain multiple rare variants in a population-based linkage analysis.
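The identity-by-state (IBS) sharing mentioned above can be illustrated with a minimal sketch. It is not PLINK's implementation. It assumes unphased genotypes coded as minor-allele counts (0, 1, 2), with `None` for a missing call, and the function name `ibs_proportion` is hypothetical. Under that coding, two individuals share `2 - |a - b|` alleles IBS at each marker.

```python
def ibs_proportion(geno_a, geno_b):
    """Return the mean proportion of alleles shared identical by state.

    Assumption: genotypes are minor-allele counts (0, 1 or 2) at the same
    ordered markers, and None marks a missing call.
    """
    shared = 0   # total alleles shared IBS over usable markers
    called = 0   # markers where both individuals have a genotype
    for a, b in zip(geno_a, geno_b):
        if a is None or b is None:
            continue  # skip markers with a missing call
        shared += 2 - abs(a - b)  # 2, 1 or 0 alleles shared at this marker
        called += 1
    if called == 0:
        raise ValueError("no markers with both genotypes called")
    return shared / (2 * called)


if __name__ == "__main__":
    # Toy data: four markers, one missing call in the second individual.
    print(ibs_proportion([0, 1, 2, 2], [0, 1, 0, None]))
```

Pairwise values of this kind, computed over all individuals, are the raw material for the stratification checks the abstract describes. Estimating identity by descent requires further modelling that this sketch does not attempt.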

Causes of variation in soil carbon simulations from CMIP5 Earth system models and comparison with observations

Todd-Brown et al. 2013

Stocks of soil organic carbon represent a large component of the carbon cycle that may participate in climate change feedbacks, particularly on decadal and centennial timescales. For Earth system models (ESMs), the ability to accurately represent the global distribution of existing soil carbon stocks is a prerequisite for accurately predicting future carbon–climate feedbacks. We compared soil carbon simulations from 11 model centers to empirical data from the Harmonized World Soil Database (HWSD) and the Northern Circumpolar Soil Carbon Database (NCSCD). Model estimates of global soil carbon stocks ranged from 510 to 3040 Pg C, compared to an estimate of 1260 Pg C (with a 95% confidence interval of 890–1660 Pg C) from the HWSD. Model simulations for the high northern latitudes fell between 60 and 820 Pg C, compared to 500 Pg C (with a 95% confidence interval of 380–620 Pg C) for the NCSCD and 290 Pg C for the HWSD. Global soil carbon varied 5.9 fold across models in response to a 2.6-fold variation in global net primary productivity (NPP) and a 3.6-fold variation in global soil carbon turnover times. Model–data agreement was moderate at the biome level (R² values ranged from 0.38 to 0.97 with a mean of 0.75); however, the spatial distribution of soil carbon simulated by the ESMs at the 1° scale was not well correlated with the HWSD (Pearson correlation coefficients less than 0.4 and root mean square errors from 9.4 to 20.8 kg C m⁻²). In northern latitudes where the two data sets overlapped, agreement between the HWSD and the NCSCD was poor (Pearson correlation coefficient 0.33), indicating uncertainty in empirical estimates of soil carbon. We found that a reduced complexity model dependent on NPP and soil temperature explained much of the 1° spatial variation in soil carbon within most ESMs (R² values between 0.62 and 0.93 for 9 of 11 model centers). 
However, the same reduced complexity model only explained 10% of the spatial variation in HWSD soil carbon when driven by observations of NPP and temperature, implying that other drivers or processes may be more important in explaining observed soil carbon distributions. The reduced complexity model also showed that differences in simulated soil carbon across ESMs were driven by differences in simulated NPP and the parameterization of soil heterotrophic respiration (inter-model R² = 0.93), not by structural differences between the models. Overall, our results suggest that despite fair global-scale agreement with observational data and moderate agreement at the biome scale, most ESMs cannot reproduce grid-scale variation in soil carbon and may be missing key processes. Future work should focus on improving the simulation of driving variables for soil carbon stocks and modifying model structures to include additional processes.
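The abstract does not give the functional form of the reduced complexity model. As a rough illustration only, the sketch below assumes a one-pool model at steady state, where soil carbon equals NPP divided by a decomposition rate that rises with soil temperature through a Q10 factor. The function name, the Q10 form, and every default parameter value are assumptions chosen for the example, not values from the paper.

```python
def steady_state_soil_carbon(npp, soil_temp_c,
                             base_turnover_yr=20.0,  # assumed turnover time at ref_temp_c
                             q10=2.0,                # assumed rate increase per 10 degrees C
                             ref_temp_c=15.0):       # assumed reference temperature
    """Steady-state soil carbon (kg C m^-2) for a one-pool model.

    npp is in kg C m^-2 yr^-1 and soil_temp_c in degrees Celsius.
    At steady state, inputs equal losses: npp = rate * carbon.
    """
    # Temperature-scaled first-order decomposition rate (yr^-1).
    rate = (1.0 / base_turnover_yr) * q10 ** ((soil_temp_c - ref_temp_c) / 10.0)
    return npp / rate


if __name__ == "__main__":
    # Same NPP at two temperatures: the warmer cell holds less carbon.
    for temp in (5.0, 15.0, 25.0):
        print(temp, steady_state_soil_carbon(0.5, temp))
```

In a model of this shape, soil carbon is the product of NPP and turnover time, which is why the abstract can attribute inter-model spread to those two quantities.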

Whole-genome association study of bipolar disorder

et al. 2008

We performed a genome-wide association scan in 1461 patients with bipolar (BP) 1 disorder and 2008 controls drawn from the Systematic Treatment Enhancement Program for Bipolar Disorder and the University College London sample collections with successful genotyping for 372 193 single nucleotide polymorphisms (SNPs). Our strongest single SNP results are found in myosin5B (MYO5B; P = 1.66 × 10⁻⁷) and tetraspanin-8 (TSPAN8; P = 6.11 × 10⁻⁷). Haplotype analysis further supported single SNP results highlighting MYO5B, TSPAN8 and the epidermal growth factor receptor (MYO5B; P = 2.04 × 10⁻⁸, TSPAN8; P = 7.57 × 10⁻⁷ and EGFR; P = 8.36 × 10⁻⁸). For replication, we genotyped 304 SNPs in family-based NIMH samples (n = 409 trios) and University of Edinburgh case-control samples (n = 365 cases, 351 controls) that did not provide independent replication after correction for multiple testing. A comparison of our strongest associations with the genome-wide scan of 1868 patients with BP disorder and 2938 controls who completed the scan as part of the Wellcome Trust Case-Control Consortium indicates concordant signals for SNPs within the voltage-dependent calcium channel, L-type, alpha 1C subunit (CACNA1C) gene. Given the heritability of BP disorder, the lack of agreement between studies emphasizes that susceptibility alleles are likely to be modest in effect size and require even larger samples for detection.
