Large-scale population-based analyses coupled with advances in technology have demonstrated that the human genome is more diverse than originally thought. To date, this diversity has largely been uncovered using short-read whole-genome sequencing. However, standard short-read approaches, used primarily because of their accuracy, throughput and cost, fail to give a complete picture of a genome. They struggle to identify large, balanced structural events, cannot access repetitive regions of the genome and fail to resolve the human genome into its two haplotypes. Here we describe an approach that retains long-range information while harnessing the advantages of short reads. Starting from only ~1 ng of DNA, we produce barcoded short-read libraries. Novel informatic approaches allow the barcoded short reads to be associated with their long molecules of origin, producing a novel data type known as 'Linked-Reads'. This approach allows for simultaneous detection of small and large variants from a single Linked-Read library. We have previously demonstrated the utility of whole-genome Linked-Reads (lrWGS) for performing diploid, de novo assembly of individual genomes (Weisenfeld et al.). In this manuscript, we show the advantages of Linked-Reads over standard short-read approaches for reference-based analysis. We demonstrate the ability of Linked-Reads to reconstruct megabase-scale haplotypes and to recover parts of the genome that are typically inaccessible to short reads, including phenotypically important genes such as STRC, SMN1 and SMN2. We demonstrate the ability of both lrWGS and Linked-Read Whole Exome Sequencing (lrWES) to identify complex structural variations, including balanced events, single exon deletions, and single exon duplications. The data presented here show that Linked-Reads provide a scalable approach for comprehensive genome analysis that is not possible using short reads alone.
Summary
Background Wilms tumour is the most common childhood renal cancer and is genetically heterogeneous. While several Wilms tumour predisposition genes have been identified, there is strong evidence that further predisposition genes are likely to exist. Our study aim was to identify new predisposition genes for Wilms tumour.
Methods In this exome sequencing study, we analysed lymphocyte DNA from 890 individuals with Wilms tumour, including 91 affected individuals from 49 familial Wilms tumour pedigrees. We used the protein-truncating variant prioritisation method to prioritise potential disease-associated genes for further assessment. We evaluated new predisposition genes in exome sequencing data that we generated in 334 individuals with 27 other childhood cancers and in exome data from The Cancer Genome Atlas obtained from 7632 individuals with 28 adult cancers.
Findings We identified constitutional cancer-predisposing mutations in 33 individuals with childhood cancer. The three identified genes with the strongest signal in the protein-truncating variant prioritisation analyses were TRIM28, FBXW7, and NYNRIN. 21 of 33 individuals had a mutation in TRIM28; there was a strong parent-of-origin effect, with all ten inherited mutations being maternally transmitted (p=0·00098). We also found a strong association with the rare epithelial subtype of Wilms tumour, with 14 of 16 tumours being epithelial or epithelial predominant. There were no TRIM28 mutations in individuals with other childhood or adult cancers. We identified truncating FBXW7 mutations in four individuals with Wilms tumour and a de-novo non-synonymous FBXW7 mutation in a child with a rhabdoid tumour. Biallelic truncating mutations in NYNRIN were identified in three individuals with Wilms tumour, which is highly unlikely to have occurred by chance (p<0·0001). Finally, we identified two de-novo KDM3B mutations, supporting the role of KDM3B as a childhood cancer predisposition gene.
Interpretation The four new Wilms tumour predisposition genes identified (TRIM28, FBXW7, NYNRIN, and KDM3B) are involved in diverse biological processes and, together with the other 17 known Wilms tumour predisposition genes, account for about 10% of Wilms tumour cases. The overlap between these 21 constitutionally mutated predisposition genes and 20 genes somatically mutated in Wilms tumour is limited, consisting of only four genes. We recommend that all individuals with Wilms tumour be offered genetic testing and, in particular, that those with epithelial Wilms tumour be offered TRIM28 genetic testing. Only a third of the familial...
Large-scale population analyses coupled with advances in technology have demonstrated that the human genome is more diverse than originally thought. To date, this diversity has largely been uncovered using short-read whole-genome sequencing. However, these short-read approaches fail to give a complete picture of a genome. They struggle to identify structural events, cannot access repetitive regions, and fail to resolve the human genome into haplotypes. Here, we describe an approach that retains long-range information while maintaining the advantages of short reads. Starting from ∼1 ng of high molecular weight DNA, we produce barcoded short-read libraries. Novel informatic approaches allow the barcoded short reads to be associated with their original long molecules, producing a novel data type known as "Linked-Reads". This approach allows for simultaneous detection of small and large variants from a single library. In this manuscript, we show the advantages of Linked-Reads over standard short-read approaches for reference-based analysis. Linked-Reads allow mapping to 38 Mb of sequence not accessible to short reads, adding sequence in 423 difficult-to-sequence genes including disease-relevant genes STRC, SMN1, and SMN2. Both Linked-Read whole-genome and whole-exome sequencing identify complex structural variations, including balanced events and single exon deletions and duplications. Further, Linked-Reads extend the region of high-confidence calls by 68.9 Mb. The data presented here show that Linked-Reads provide a scalable approach for comprehensive genome analysis that is not possible using short reads alone.
canSAR (http://cansar.icr.ac.uk) is the largest, public, freely available, integrative translational research and drug discovery knowledgebase for oncology. canSAR integrates vast multidisciplinary data from across genomic, protein, pharmacological, drug and chemical data with structural biology, protein networks and more. It also provides unique data, curation and annotation and crucially, AI-informed target assessment for drug discovery. canSAR is widely used internationally by academia and industry. Here we describe significant developments and enhancements to the data, web interface and infrastructure of canSAR in the form of the new implementation of the system: canSARblack. We demonstrate new functionality in aiding translation hypothesis generation and experimental design, and show how canSAR can be adapted and utilised outside oncology.
canSAR (http://cansar.icr.ac.uk) is a public, freely available, integrative translational research and drug discovery knowledgebase. canSAR informs researchers to help solve key bottlenecks in cancer translation and drug discovery. It integrates genomic, protein, pharmacological, drug and chemical data with structural biology, protein networks and unique, comprehensive and orthogonal ‘druggability’ assessments. canSAR is widely used internationally by academia and industry. Here we describe major enhancements to canSAR including new and expanded data. We also describe the first components of canSARblack, an advanced, responsive, multi-device compatible redesign of canSAR with a question-led interface.