Arya Massarat scite author profile

Short tandem repeats (STRs), genomic regions each consisting of a sequence of 1-6 base pairs repeated in succession, represent one of the largest sources of human genetic variation. However, many STR effects are not captured well by standard genome-wide association studies (GWAS) or downstream analyses that are mostly based on single nucleotide polymorphisms (SNPs). To study the involvement of STRs in complex traits, we imputed genotypes for 445,735 autosomal STRs into SNP data from 408,153 White British UK Biobank participants and tested for association with 44 blood and serum biomarker phenotypes. We used two fine-mapping methods, SuSiE and FINEMAP, to identify 118 high-confidence STR-trait associations predicted as causal variants under all fine-mapping settings tested. Using these results, we estimate that STRs drive 5.2-9.7% of GWAS signals for these traits. Our high-confidence STR-trait associations implicate STRs in some of the strongest hits for multiple phenotypes, including a trinucleotide STR in APOB associated with LDL cholesterol and a CGG repeat in the promoter of CBL associated with multiple platelet traits. Replication analyses in additional population groups and orthogonal expression data further support the role of a subset of the candidate STRs we identify. Together, our study suggests that polymorphic tandem repeats make widespread contributions to complex traits, provides a set of stringently selected candidate causal STRs, and demonstrates the need to routinely consider a more complete view of human genetic variation in GWAS.
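As an illustration of the STR definition in the abstract above (a motif of 1-6 base pairs repeated in succession), the following is a minimal sketch that scans a DNA string for such runs. The function name `find_strs`, the `min_copies` threshold, and the example sequence are assumptions made for illustration; this is not the genotyping or imputation method used in the study.

```python
import re

def find_strs(sequence, min_copies=3):
    """Return (start, motif, copies) for each run of a 1-6 bp motif
    repeated at least `min_copies` times in succession.

    Hypothetical helper for illustration only; real STR callers also
    handle imperfect repeats and sequencing error.
    """
    # Lazy {1,6}? prefers the shortest motif; \1{n,} requires n more copies.
    pattern = re.compile(r"([ACGT]{1,6}?)\1{%d,}" % (min_copies - 1))
    hits = []
    for match in pattern.finditer(sequence):
        motif = match.group(1)
        copies = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, copies))
    return hits

# Made-up sequence containing four copies of the trinucleotide CAG.
example = "TTCAGCAGCAGCAGTT"
strs = find_strs(example)
```

Here `strs` holds one entry describing the CAG run: its 0-based start position, the motif, and the copy number.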

Discovering single nucleotide variants and indels from bulk and single-cell ATAC-seq

Sen, Jaureguy, et al. 2021

Genetic variants and de novo mutations in regulatory regions of the genome are typically discovered by whole-genome sequencing (WGS); however, WGS is expensive and most WGS reads come from non-regulatory regions. The Assay for Transposase-Accessible Chromatin (ATAC-seq) generates reads from regulatory sequences and could potentially be used as a low-cost ‘capture’ method for regulatory variant discovery, but its use for this purpose has not been systematically evaluated. Here we apply seven variant callers to bulk and single-cell ATAC-seq data and evaluate their ability to identify single nucleotide variants (SNVs) and insertions/deletions (indels). In addition, we develop an ensemble classifier, VarCA, which combines features from individual variant callers to predict variants. The Genome Analysis Toolkit (GATK) is the best-performing individual caller with precision/recall on a bulk ATAC test dataset of 0.92/0.97 for SNVs and 0.87/0.82 for indels within ATAC-seq peak regions with at least 10 reads. On bulk ATAC-seq reads, VarCA achieves superior performance with precision/recall of 0.99/0.95 for SNVs and 0.93/0.80 for indels. On single-cell ATAC-seq reads, VarCA attains precision/recall of 0.98/0.94 for SNVs and 0.82/0.82 for indels. In summary, ATAC-seq reads can be used to accurately discover non-coding regulatory variants in the absence of whole-genome sequencing data, and our ensemble method, VarCA, has the best overall performance.
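The precision/recall figures quoted in the abstract above compare a caller's variants against a truth set. A minimal sketch of that calculation follows, assuming variants are represented as (chromosome, position, ref, alt) tuples and matched exactly; the function name `precision_recall` and the toy variants are hypothetical and are not taken from VarCA or GATK.

```python
def precision_recall(called, truth):
    """Return (precision, recall) of called variants against a truth set.

    precision = true positives / all called variants
    recall    = true positives / all truth variants
    """
    called, truth = set(called), set(truth)
    true_positives = len(called & truth)
    precision = true_positives / len(called) if called else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall

# Made-up variants keyed by (chrom, pos, ref, alt); not real data.
truth = {
    ("chr1", 100, "A", "G"),
    ("chr1", 250, "C", "T"),
    ("chr2", 40, "G", "GA"),
    ("chr2", 90, "T", "C"),
}
called = {
    ("chr1", 100, "A", "G"),
    ("chr1", 250, "C", "T"),
    ("chr2", 40, "G", "GA"),
    ("chr3", 5, "A", "C"),   # false positive: absent from the truth set
}
precision, recall = precision_recall(called, truth)
```

In this toy case three of the four calls are correct and three of the four truth variants are recovered, so both values are 0.75.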

Haptools: a toolkit for admixture and haplotype analysis

Lamkin, Reeve, et al. 2023

Summary: Leveraging local ancestry and haplotype information in genome-wide association studies and downstream analyses can improve the utility of genomics for individuals from diverse and recently admixed ancestries. However, most existing simulation, visualization, and variant analysis frameworks are based on variant-level analysis and do not automatically handle these features. We present haptools, an open-source toolkit for performing local-ancestry aware and haplotype-based analysis of complex traits. Haptools supports fast simulation of admixed genomes, visualization of admixture tracks, simulation of haplotype- and local ancestry-specific phenotype effects, and a variety of file operations and statistics computed in a haplotype-aware manner. Availability: Haptools is freely available at https://github.com/cast-genomics/haptools. Documentation: Detailed documentation is available at https://haptools.readthedocs.io. Supplementary information: Supplementary data are available at Bioinformatics online.

Discovering single nucleotide variants and indels from bulk and single-cell ATAC-seq

Sen, Jaureguy, et al. 2021

Preprint

Genetic variants and de novo mutations in regulatory regions of the genome are typically discovered by whole-genome sequencing (WGS); however, WGS is expensive and most WGS reads come from non-regulatory regions. The Assay for Transposase-Accessible Chromatin (ATAC-seq) generates reads from regulatory sequences and could potentially be used as a low-cost 'capture' method for regulatory variant discovery, but its use for this purpose has not been systematically evaluated. Here we apply seven variant callers to bulk and single-cell ATAC-seq data and evaluate their ability to identify single nucleotide variants (SNVs) and insertions/deletions (indels). In addition, we develop an ensemble classifier, VarCA, which combines features from individual variant callers to predict variants. The Genome Analysis Toolkit (GATK) is the best-performing individual caller with precision/recall on a bulk ATAC test dataset of 0.92/0.97 for SNVs and 0.87/0.82 for indels. On bulk ATAC-seq reads, VarCA achieves superior performance with precision/recall of 0.99/0.95 for SNVs and 0.93/0.80 for indels. On single-cell ATAC-seq reads, VarCA attains precision/recall of 0.98/0.94 for SNVs and 0.82/0.82 for indels. In summary, ATAC-seq reads can be used to accurately discover non-coding regulatory variants in the absence of whole-genome sequencing data, and our ensemble method, VarCA, has the best overall performance.

Human pangenome supports analysis of complex genomic regions

Gymrek, McStay 2023

Nature

environmental fluctuations can perturb the experimental system.

Two key technological developments enabled the success of Storz and colleagues' Bell experiment. By achieving a single-qubit readout of around 50 nanoseconds, much faster than the few hundred nanoseconds that define the multi-qubit state-of-the-art systems [12, 13], the authors were able to reduce the required qubit separation to around 30 metres. Then, they developed a low-loss cryogenic waveguide of this size and integrated it with the qubits to reach a high-fidelity connected system.

Photon-based implementations typically violate Bell's inequality by a small margin, but with a data-production rate high enough to show a statistically significant violation in a relatively short collection time. Matter-based implementations usually violate the inequality by a larger margin, but have low data-acquisition rates, making it difficult, or at least time consuming, to reach high statistical certainty. Storz and colleagues' set-up violates Bell's inequality by a higher margin than previous photon-based experiments, with a higher rate of data production than that obtained in previous matter-based experiments [8][9][10][11].

This Bell experiment sets a record for the longest separation between two entangled superconducting qubits, and is impressive because of its physical size and precision. Although the 50-nanosecond readout demonstrated here cannot readily be applied to multi-qubit quantum computers, it pushes this qubit technology to new limits. Similarly, although the superconducting-waveguide approach does not scale to arbitrary distances, it represents a path towards quantum-information transfer between superconducting-qubit chips, a technology that will be needed in a large-scale quantum computer. With the achievement of this foundational quantum milestone, and the technological advancements that enabled it, Storz et al. have expanded the superconducting-qubit toolbox and given further credibility to this promising platform.