Here we describe the SweGen data set, a comprehensive map of genetic variation in the Swedish population. These data represent a basic resource for clinical genetics laboratories as well as for sequencing-based association studies by providing information on genetic variant frequencies in a cohort that is well matched to national patient cohorts. To select samples for this study, we first examined the genetic structure of the Swedish population using high-density SNP-array data from a nation-wide cohort of over 10 000 Swedish-born individuals included in the Swedish Twin Registry. A total of 1000 individuals, reflecting a cross-section of the population and capturing the main genetic structure, were selected for whole-genome sequencing. Analysis pipelines were developed for automated alignment, variant calling and quality control of the sequencing data. This resulted in a genome-wide collection of aggregated variant frequencies in the Swedish population that we have made available to the scientific community through the website https://swefreq.nbis.se. A total of 29.2 million single-nucleotide variants and 3.8 million indels were detected in the 1000 samples, with 9.9 million of these variants not present in current databases. Each sample contributed with an average of 7199 individual-specific variants. In addition, an average of 8645 larger structural variants (SVs) were detected per individual, and we demonstrate that the population frequencies of these SVs can be used for efficient filtering analyses. Finally, our results show that the genetic diversity within Sweden is substantial compared with the diversity among continental European populations, underscoring the relevance of establishing a local reference data set.
Background We report the findings from 4437 individuals (3219 patients and 1218 relatives) who have been analyzed by whole genome sequencing (WGS) at the Genomic Medicine Center Karolinska-Rare Diseases (GMCK-RD) since mid-2015. GMCK-RD represents a long-term collaborative initiative between Karolinska University Hospital and Science for Life Laboratory to establish advanced, genomics-based diagnostics in the Stockholm healthcare setting. Methods Our analysis covers detection and interpretation of SNVs, INDELs, uniparental disomy, CNVs, balanced structural variants, and short tandem repeat expansions. Visualization of results for clinical interpretation is carried out in Scout—a custom-developed decision support system. Results from both singleton (84%) and trio/family (16%) analyses are reported. Variant interpretation is done by 15 expert teams at the hospital involving staff from three clinics. For patients with complex phenotypes, data is shared between the teams. Results Overall, 40% of the patients received a molecular diagnosis ranging from 19 to 54% for specific disease groups. There was heterogeneity regarding causative genes (n = 754) with some of the most common ones being COL2A1 (n = 12; skeletal dysplasia), SCN1A (n = 8; epilepsy), and TNFRSF13B (n = 4; inborn errors of immunity). Some causative variants were recurrent, including previously known founder mutations, some novel mutations, and recurrent de novo mutations. Overall, GMCK-RD has resulted in a large number of patients receiving specific molecular diagnoses. Furthermore, negative cases have been included in research studies that have resulted in the discovery of 17 published, novel disease-causing genes. To facilitate the discovery of new disease genes, GMCK-RD has joined international data sharing initiatives, including ClinVar, UDNI, Beacon, and MatchMaker Exchange. Conclusions Clinical WGS at GMCK-RD has provided molecular diagnoses to over 1200 individuals with a broad range of rare diseases. Consolidation and spread of this clinical-academic partnership will enable large-scale national collaboration.
BackgroundSince different types of genetic variants, from single nucleotide variants (SNVs) to large chromosomal rearrangements, underlie intellectual disability, we evaluated the use of whole-genome sequencing (WGS) rather than chromosomal microarray analysis (CMA) as a first-line genetic diagnostic test.MethodsWe analyzed three cohorts with short-read WGS: (i) a retrospective cohort with validated copy number variants (CNVs) (cohort 1, n = 68), (ii) individuals referred for monogenic multi-gene panels (cohort 2, n = 156), and (iii) 100 prospective, consecutive cases referred to our center for CMA (cohort 3). Bioinformatic tools developed include FindSV, SVDB, Rhocall, Rhoviz, and vcf2cytosure.ResultsFirst, we validated our structural variant (SV)-calling pipeline on cohort 1, consisting of three trisomies and 79 deletions and duplications with a median size of 850 kb (min 500 bp, max 155 Mb). All variants were detected. Second, we utilized the same pipeline in cohort 2 and analyzed with monogenic WGS panels, increasing the diagnostic yield to 8%. Next, cohort 3 was analyzed by both CMA and WGS. The WGS data was processed for large (> 10 kb) SVs genome-wide and for exonic SVs and SNVs in a panel of 887 genes linked to intellectual disability as well as genes matched to patient-specific Human Phenotype Ontology (HPO) phenotypes. This yielded a total of 25 pathogenic variants (SNVs or SVs), of which 12 were detected by CMA as well. We also applied short tandem repeat (STR) expansion detection and discovered one pathologic expansion in ATXN7. Finally, a case of Prader-Willi syndrome with uniparental disomy (UPD) was validated in the WGS data.Important positional information was obtained in all cohorts. Remarkably, 7% of the analyzed cases harbored complex structural variants, as exemplified by a ring chromosome and two duplications found to be an insertional translocation and part of a cryptic unbalanced translocation, respectively.ConclusionThe overall diagnostic rate of 27% was more than doubled compared to clinical microarray (12%). Using WGS, we detected a wide range of SVs with high accuracy. Since the WGS data also allowed for analysis of SNVs, UPD, and STRs, it represents a powerful comprehensive genetic test in a clinical diagnostic laboratory setting.
Whole-genome sequencing (WGS) is a fundamental technology for research to advance precision medicine, but the limited availability of portable and user-friendly workflows for WGS analyses poses a major challenge for many research groups and hampers scientific progress. Here we present Sarek, an open-source workflow to detect germline variants and somatic mutations based on sequencing data from WGS, whole-exome sequencing (WES), or gene panels. Sarek features (i) easy installation, (ii) robust portability across different computer environments, (iii) comprehensive documentation, (iv) transparent and easy-to-read code, and (v) extensive quality metrics reporting. Sarek is implemented in the Nextflow workflow language and supports both Docker and Singularity containers as well as Conda environments, making it ideal for easy deployment on any POSIX-compatible computers and cloud compute environments. Sarek follows the GATK best-practice recommendations for read alignment and pre-processing, and includes a wide range of software for the identification and annotation of germline and somatic single-nucleotide variants, insertion and deletion variants, structural variants, tumour sample purity, and variations in ploidy and copy number. Sarek offers easy, efficient, and reproducible WGS analyses, and can readily be used both as a production workflow at sequencing facilities and as a powerful stand-alone tool for individual research groups. The Sarek source code, documentation and installation instructions are freely available at https://github.com/nf-core/sarek and at https://nf-co.re/sarek/.
Reliable detection of large structural variation ( > 1000 bp) is important in both rare and common genetic disorders. Whole genome sequencing (WGS) is a technology that may be used to identify a large proportion of the genomic structural variants (SVs) in an individual in a single experiment. Even though SV callers have been extensively used in research to detect mutations, the potential usage of SV callers within routine clinical diagnostics is still limited. One well known, but not well-addressed problem is the large number of benign variants and reference errors present in the human genome that further complicates analysis. Even though there is a wide range of SV-callers available, the number of callers that allow detection of the entire spectra of SV at a low computational cost is still relatively limited.
Complex chromosomal rearrangements (CCRs) are rearrangements involving more than two chromosomes or more than two breakpoints. Whole genome sequencing (WGS) allows for outstanding high resolution characterization on the nucleotide level in unique sequences of such rearrangements, but problems remain for mapping breakpoints in repetitive regions of the genome, which are known to be prone to rearrangements. Hence, multiple complementary WGS experiments are sometimes needed to solve the structures of CCRs. We have studied three individuals with CCRs: Case 1 and Case 2 presented with de novo karyotypically balanced, complex interchromosomal rearrangements (46,XX,t(2;8;15)(q35;q24.1;q22) and 46,XY,t(1;10;5)(q32;p12;q31)), and Case 3 presented with a de novo, extremely complex intrachromosomal rearrangement on chromosome 1. Molecular cytogenetic investigation revealed cryptic deletions in the breakpoints of chromosome 2 and 8 in Case 1, and on chromosome 10 in Case 2, explaining their clinical symptoms. In Case 3, 26 breakpoints were identified using WGS, disrupting five known disease genes. All rearrangements were subsequently analyzed using optical maps, linked-read WGS, and short-read WGS. In conclusion, we present a case series of three unique de novo CCRs where we by combining the results from the different technologies fully solved the structure of each rearrangement. The power in combining short-read WGS with long-molecule sequencing or optical mapping in these unique de novo CCRs in a clinical setting is demonstrated.
Reliable detection of large structural variation ( > 1000 bp) is important in both rare and common genetic disorders. Whole genome sequencing (WGS) is a technology that may be used to identify a large proportion of the genomic structural variants (SVs) in an individual in a single experiment. Even though SV callers have been extensively used in research to detect mutations, the potential usage of SV callers within routine clinical diagnostics is hindered by high computational costs, usage of non-standard output format, and limited support for the various sequencing platforms and libraries. Another well known, but not well-addressed problem is the large number of benign variants and reference errors present in the human genome that further complicates analysis. Here we present TIDDIT, a time efficient variant caller, that uses discordant read pairs as well as the depth of coverage and split reads to detect and classify a large spectrum of SVs. As part of the software suite, TIDDIT also includes a database functionality that enables filtering for rare variants and reduces the number of false positive calls and background noise. Benchmarked against five state-of-the-art SV callers, TIDDIT performs at an equal/superior level while using only 2 CPU hours per sample. Thanks to its speed, sensitivity, flexibility and ability to easily detect variants on a wide range of WGS library types, TIDDIT solves many of the problems that are currently hindering the utilization of WGS for SV calling in clinical settings.
Cytogenetically detected inversions are generally assumed to be copy number and phenotypically neutral events. While nonallelic homologous recombination is thought to play a major role, recent data suggest the involvement of other molecular mechanisms in inversion formation. Using a combination of short-read whole-genome sequencing (WGS), 10X Genomics Chromium WGS, droplet digital polymerase chain reaction and array comparative genomic hybridization we investigated the genomic structure of 18 large unique cytogenetically detected chromosomal inversions and achieved nucleotide resolution of at least one chromosomal inversion junction for 13/18 (72%). Surprisingly, we observed that seemingly copy number neutral inversions can be accompanied by a copynumber gain of up to 350 kb and local genomic complexities (3/18, 17%). In the resolved
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.