Whole-genome sequencing enables complete characterization of genetic variation, but geographic clustering of rare alleles demands many diverse populations be studied. Here we describe the Genome of the Netherlands (GoNL) Project, in which we sequenced the whole genomes of 250 Dutch parent-offspring families and constructed a haplotype map of 20.4 million single-nucleotide variants and 1.2 million insertions and deletions. The intermediate coverage (∼13×) and trio design enabled extensive characterization of structural variation, including midsize events (30-500 bp) previously poorly catalogued and de novo mutations. We demonstrate that the quality of the haplotypes boosts imputation accuracy in independent samples, especially for lower frequency alleles. Population genetic analyses demonstrate fine-scale structure across the country and support multiple ancient migrations, consistent with historical changes in sea level and flooding. The GoNL Project illustrates how single-population whole-genome sequencing can provide detailed characterization of genetic variation and may guide the design of future population studies.
Small insertions and deletions (indels) and large structural variations (SVs) are major contributors to human genetic diversity and disease. However, mutation rates and characteristics of de novo indels and SVs in the general population have remained largely unexplored. We report 332 validated de novo structural changes identified in whole genomes of 250 families, including complex indels, retrotransposon insertions, and interchromosomal events. These data indicate a mutation rate of 2.94 indels (1-20 bp) and 0.16 SVs (>20 bp) per generation. De novo structural changes affect on average 4.1 kbp of genomic sequence and 29 coding bases per generation, which is 91 and 52 times more nucleotides than de novo substitutions, respectively. This contrasts with the equal genomic footprint of inherited SVs and substitutions. An excess of structural changes originated on paternal haplotypes. Additionally, we observed a nonuniform distribution of de novo SVs across offspring. These results reveal the importance of different mutational mechanisms to changes in human genome structure across generations.
Structural variation (SV) represents a major source of differences between individual human genomes and has been linked to disease phenotypes. However, the majority of studies provide neither a global view of the full spectrum of these variants nor integrate them into reference panels of genetic variation. Here, we analyse whole genome sequencing data of 769 individuals from 250 Dutch families, and provide a haplotype-resolved map of 1.9 million genome variants across 9 different variant classes, including novel forms of complex indels, and retrotransposition-mediated insertions of mobile elements and processed RNAs. A large proportion are previously under reported variants sized between 21 and 100 bp. We detect 4 megabases of novel sequence, encoding 11 new transcripts. Finally, we show 191 known, trait-associated SNPs to be in strong linkage disequilibrium with SVs and demonstrate that our panel facilitates accurate imputation of SVs in unrelated individuals.
Complex indels are formed by simultaneously deleting and inserting DNA fragments of different sizes at a common genomic location. Here, we present a systematic analysis of somatic complex indels in the coding sequences of over 8,000 cancer cases using Pindel-C. We discovered 285 complex indels in cancer genes (e.g., PIK3R1, TP53, ARID1A, GATA3, and KMT2D) in approximately 3.5% of cases analyzed; nearly all instances of complex indels were overlooked (81.1%) or mis-annotated (17.6%) in 2,199 samples previously reported. In-frame complex indels are enriched in PIK3R1 and EGFR while frameshifts are prevalent in VHL, GATA3, TP53, ARID1A, PTEN, and ATRX. Further, complex indels display strong tissue specificity (e.g., VHL from kidney cancer and GATA3 from breast cancer). Finally, structural analyses support findings of previously missed, but potentially druggable mutations in EGFR, MET, and KIT oncogenes. This study indicates the critical importance of improving complex indel discovery and interpretation in medical research.
To help chemists design new drugs, we created a tool that uses interactive evolution to design drug molecules, the "Molecule Evoluator". In contrast to most other evolutionary de novo design programs, the molecule representation and the set of mutations enable it to both search the chemical space of all drug like molecules extensively and to fine-tune molecular structures to the problem at hand. Additionally, we use interaction with the user as a fitness function, which is new in evolutionary algorithms in drug design. This interactivity allows the Molecule Evoluator to use the domain knowledge of the chemist to estimate the ease of synthesis and the biological activity of the compound. This knowledge can guide the optimization process and thereby improve its results. Chemists of our department using the Molecule Evoluator were able to find six novel and synthesizable druglike core structures, indicating that the Molecule Evoluator can be used as a tool to enhance the chemist's creativity.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.