Short tandem repeats (STRs), genomic regions each consisting of a sequence of 1-6 base pairs repeated in succession, represent one of the largest sources of human genetic variation. However, many STR effects are not captured well by standard genome-wide association studies (GWAS) or downstream analyses, which are mostly based on single nucleotide polymorphisms (SNPs). To study the involvement of STRs in complex traits, we imputed genotypes for 445,735 autosomal STRs into SNP data from 408,153 White British UK Biobank participants and tested for association with 44 blood and serum biomarker phenotypes. We used two fine-mapping methods, SuSiE and FINEMAP, to identify 118 high-confidence STR-trait associations predicted as causal variants under all fine-mapping settings tested. Using these results, we estimate that STRs drive 5.2-9.7% of GWAS signals for these traits. Our high-confidence STR-trait associations implicate STRs in some of the strongest hits for multiple phenotypes, including a trinucleotide STR in APOB associated with LDL cholesterol and a CGG repeat in the promoter of CBL associated with multiple platelet traits. Replication analyses in additional population groups and orthogonal expression data further support the role of a subset of the candidate STRs we identify. Together, our study suggests that polymorphic tandem repeats make widespread contributions to complex traits, provides a set of stringently selected candidate causal STRs, and demonstrates the need to routinely consider a more complete view of human genetic variation in GWAS.
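To make the definition of an STR concrete, the following is a minimal sketch that scans a DNA string for perfect tandem repeats of a 1-6 bp motif. It is only an illustration of the definition, not the imputation or fine-mapping approach described in the abstract; the function name `find_strs` and the `min_copies` threshold are hypothetical choices made for this example.

```python
import re

def find_strs(sequence, min_copies=3):
    """Return (start, motif, copies) for perfect tandem repeats of a 1-6 bp motif.

    Illustrative only: real STR callers also tolerate imperfect repeats.
    `min_copies` is an arbitrary threshold chosen for this sketch.
    """
    # The lazy {1,6}? prefers the shortest motif, so "AAAAAA" is reported
    # as six copies of "A" rather than three copies of "AA".
    pattern = re.compile(r"([ACGT]{1,6}?)\1{%d,}" % (min_copies - 1))
    hits = []
    for match in pattern.finditer(sequence.upper()):
        motif = match.group(1)
        copies = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, copies))
    return hits

example_hits = find_strs("TTCAGCAGCAGCAGTT")
print(example_hits)
```

Here the trinucleotide motif `CAG` is found starting at offset 2 with four consecutive copies; a change in that copy number between individuals is the kind of variation an STR genotype records.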
Genetic variants and de novo mutations in regulatory regions of the genome are typically discovered by whole-genome sequencing (WGS); however, WGS is expensive and most WGS reads come from non-regulatory regions. The Assay for Transposase-Accessible Chromatin (ATAC-seq) generates reads from regulatory sequences and could potentially be used as a low-cost ‘capture’ method for regulatory variant discovery, but its use for this purpose has not been systematically evaluated. Here we apply seven variant callers to bulk and single-cell ATAC-seq data and evaluate their ability to identify single nucleotide variants (SNVs) and insertions/deletions (indels). In addition, we develop an ensemble classifier, VarCA, which combines features from individual variant callers to predict variants. The Genome Analysis Toolkit (GATK) is the best-performing individual caller, with precision/recall on a bulk ATAC test dataset of 0.92/0.97 for SNVs and 0.87/0.82 for indels within ATAC-seq peak regions with at least 10 reads. On bulk ATAC-seq reads, VarCA achieves superior performance, with precision/recall of 0.99/0.95 for SNVs and 0.93/0.80 for indels. On single-cell ATAC-seq reads, VarCA attains precision/recall of 0.98/0.94 for SNVs and 0.82/0.82 for indels. In summary, ATAC-seq reads can be used to accurately discover non-coding regulatory variants in the absence of whole-genome sequencing data, and our ensemble method, VarCA, has the best overall performance.
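The precision/recall figures quoted above compare a caller's output against a truth set. As a minimal sketch of how such numbers are computed, assuming variants are represented as `(chrom, pos, ref, alt)` tuples and matched exactly (the function name and the toy data are hypothetical, and this is not VarCA's own evaluation code):

```python
def precision_recall(called, truth):
    """Compute precision and recall of called variants against a truth set.

    Variants are assumed to be hashable (chrom, pos, ref, alt) tuples and
    are matched exactly; this representation is an assumption of the sketch.
    """
    called, truth = set(called), set(truth)
    true_positives = len(called & truth)
    # Precision: fraction of calls that are real variants.
    precision = true_positives / len(called) if called else 0.0
    # Recall: fraction of real variants that were called.
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall

# Toy data, invented purely for illustration.
truth_set = [
    ("chr1", 100, "A", "G"),
    ("chr1", 250, "C", "T"),
    ("chr2", 40, "G", "A"),
    ("chr2", 90, "T", "TA"),
    ("chr3", 15, "AC", "A"),
]
call_set = [
    ("chr1", 100, "A", "G"),
    ("chr1", 250, "C", "T"),
    ("chr2", 40, "G", "A"),
    ("chr2", 77, "C", "G"),  # not in the truth set: a false positive
]

precision, recall = precision_recall(call_set, truth_set)
print(precision, recall)
```

With three of four calls correct and three of five true variants recovered, the toy example gives a precision of 0.75 and a recall of 0.6.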
Summary: Leveraging local ancestry and haplotype information in genome-wide association studies and downstream analyses can improve the utility of genomics for individuals from diverse and recently admixed ancestries. However, most existing simulation, visualization, and variant analysis frameworks are based on variant-level analysis and do not automatically handle these features. We present haptools, an open-source toolkit for performing local-ancestry aware and haplotype-based analysis of complex traits. Haptools supports fast simulation of admixed genomes, visualization of admixture tracks, simulation of haplotype- and local ancestry-specific phenotype effects, and a variety of file operations and statistics computed in a haplotype-aware manner.
Availability: Haptools is freely available at https://github.com/cast-genomics/haptools.
Documentation: Detailed documentation is available at https://haptools.readthedocs.io.
Supplementary information: Supplementary data are available at Bioinformatics online.
environmental fluctuations can perturb the experimental system.
Two key technological developments enabled the success of Storz and colleagues' Bell experiment. By achieving a single-qubit readout of around 50 nanoseconds, much faster than the few hundred nanoseconds that define the multi-qubit state-of-the-art systems [12][13], the authors were able to reduce the required qubit separation to around 30 metres. Then, they developed a low-loss cryogenic waveguide of this size and integrated it with the qubits to reach a high-fidelity connected system.
Photon-based implementations typically violate Bell's inequality by a small margin, but with a data-production rate high enough to show a statistically significant violation in a relatively short collection time. Matter-based implementations usually violate the inequality by a larger margin, but have low data-acquisition rates, making it difficult, or at least time consuming, to reach high statistical certainty. Storz and colleagues' set-up violates Bell's inequality by a higher margin than previous photon-based experiments, with a higher rate of data production than that obtained in previous matter-based experiments [8][9][10][11].
This Bell experiment sets a record for the longest separation between two entangled superconducting qubits, and is impressive because of its physical size and precision. Although the 50-nanosecond readout demonstrated here cannot readily be applied to multi-qubit quantum computers, it pushes this qubit technology to new limits. Similarly, although the superconducting-waveguide approach does not scale to arbitrary distances, it represents a path towards quantum-information transfer between superconducting-qubit chips, a technology that will be needed in a large-scale quantum computer. With the achievement of this foundational quantum milestone, and the technological advancements that enabled it, Storz et al. have expanded the superconducting-qubit toolbox and given further credibility to this promising platform.
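To illustrate what violating Bell's inequality "by a margin" means, here is a minimal sketch. The excerpt does not say which form of the inequality is meant, so the sketch assumes the common CHSH form, in which any local hidden-variable model obeys |S| ≤ 2. It also assumes an ideal, noiseless singlet state with correlation E(a, b) = -cos(a - b); the function names and angle choices are illustrative and are not taken from the experiment.

```python
import math

def singlet_correlation(a, b):
    """Ideal quantum correlation for a singlet state measured at angles a, b (radians)."""
    return -math.cos(a - b)

def chsh_value(a, a_prime, b, b_prime, corr=singlet_correlation):
    """CHSH combination S = E(a,b) - E(a,b') + E(a',b) + E(a',b')."""
    return (corr(a, b) - corr(a, b_prime)
            + corr(a_prime, b) + corr(a_prime, b_prime))

# Standard angle choice that maximises |S| for the ideal singlet state.
s = chsh_value(0.0, math.pi / 2, math.pi / 4, 3 * math.pi / 4)

classical_bound = 2.0
margin = abs(s) - classical_bound
print(abs(s), margin)
```

In this idealised case |S| reaches 2√2, about 2.83, so the margin over the classical bound is roughly 0.83. Real experiments fall below this because of noise and imperfect readout, which is why the size of the margin and the rate at which data accumulate both matter for statistical certainty.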