Hundreds of thousands of human genomes are now being sequenced to characterize genetic variation and use this information to augment association mapping studies of complex disorders and other phenotypic traits 1-4. Genetic variation is identified mainly by mapping short reads to the reference genome or by performing local assembly 2,5-7. However, these approaches are biased against discovery of structural variants and variation in the more complex parts of the genome. Hence, large-scale de novo assembly is needed. Here we show that it is possible to construct excellent de novo assemblies from high-coverage sequencing with mate-pair libraries extending up to 20 kilobases. We report de novo assemblies of 150 individuals (50 trios) from the GenomeDenmark project. The quality of these assemblies is similar to those obtained using the more expensive long-read technology 4,8-13. We use the assemblies to identify a rich set of structural variants including many novel insertions and demonstrate how this variant catalogue enables further deciphering of known association mapping signals. We leverage the assemblies to provide 100 completely resolved major histocompatibility complex haplotypes and to resolve major parts of the Y chromosome. Our study provides a regional reference genome that we expect will improve the power of future association mapping studies and hence pave the way for precision medicine initiatives, which now are being launched in many countries including Denmark.
Using a combination of high-depth (average 78×) Illumina paired-end and mate-pair libraries, we applied Allpaths-LG 14 to create de novo assemblies of high quality and coverage for each of the 150 individuals with a median scaffold N50 of ~21 megabases (Mb; maximum ~30 Mb) (Supplementary Table 1). The 100 largest scaffolds in each of the 140 best assemblies typically covered more than 75% (median 77%, Extended Data Fig. 1a) of the genome, with the largest scaffolds exceeding 110 Mb in size (Supplementary Table 1). To evaluate the accuracy of the assemblies, we subsequently aligned the scaffolds for each individual to the human reference genome (GRCh38) 15. Figure 1 shows an example individual where the euchromatic part of each chromosome was almost completely covered by a few large scaffolds and in several cases scaffolds covered almost entire chromosome arms. Only rarely did we find that large scaffolds break and align to more than one chromosome (Extended Data Fig. 1b), suggesting that even the largest scaffolds are seldom chimaeric. We also compared our de novo assemblies with a published long-read assembly based on BioNano mapping and PacBio sequencing 16. Extended Data Figs 2a and 3 show that this assembly was less complete than our assemblies, but with similar scaffold lengths. The long-read assembly had 5.38% missing data compared with our median of 4.25% (Extended Data Fig. 3a), but the missing data in our assemblies were found in smaller gaps (Extended Data Fig. 3b, c), and the median contig length was therefore much smaller th...
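For readers unfamiliar with the scaffold N50 statistic quoted above, the following is a minimal sketch of how it is conventionally computed: the length of the scaffold at which the running total, taken from longest to shortest, first reaches half of the total assembly length. The function name `n50` and the toy lengths are illustrative assumptions; this is not the code used in the study.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths.

    Conventional definition (assumed here): sort lengths from longest to
    shortest and return the length at which the cumulative sum first
    reaches at least half of the total assembled length.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy example with made-up lengths (not data from the study).
toy_scaffolds = [2, 2, 2, 3, 3, 4, 8, 8]
print(n50(toy_scaffolds))
```

Because N50 depends only on the length distribution, two assemblies can share a similar scaffold N50 while differing in gap content, which is why the text also reports missing data and contig length.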
Life sciences are yielding huge data sets that underpin scientific discoveries fundamental to improvement in human health, agriculture and the environment. In support of these discoveries, a plethora of databases and tools are deployed, in technically complex and diverse implementations, across a spectrum of scientific disciplines. The corpus of documentation of these resources is fragmented across the Web, with much redundancy, and has lacked a common standard of information. The outcome is that scientists must often struggle to find, understand, compare and use the best resources for the task at hand. Here we present a community-driven curation effort, supported by ELIXIR, the European infrastructure for biological information, that aspires to a comprehensive and consistent registry of information about bioinformatics resources. The sustainable upkeep of this Tools and Data Services Registry is assured by a curation effort driven by and tailored to local needs, and shared amongst a network of engaged partners. As of November 2015, the registry includes 1785 resources, with depositions from 126 individual registrations including 52 institutional providers and 74 individuals. With community support, the registry can become a standard for dissemination of information about bioinformatics resources: we welcome everyone to join us in this common endeavour. The registry is freely available at https://bio.tools.
Asparaginase-associated pancreatitis is a life-threatening toxicity of childhood acute lymphoblastic leukemia treatment. To elucidate genetic predisposition and asparaginase-associated pancreatitis pathogenesis, ten trial groups contributed remission samples from patients aged 1.0–17.9 years treated for acute lymphoblastic leukemia between 2000 and 2016. Cases (n = 244) were defined by the presence of at least two of the following criteria: (i) abdominal pain; (ii) levels of pancreatic enzymes ≥3 × upper normal limit; and (iii) imaging compatible with pancreatitis. Controls (n = 1320) completed intended asparaginase therapy, with 78% receiving ≥8 injections of pegylated-asparaginase, without developing asparaginase-associated pancreatitis. rs62228256 on 20q13.2 showed the strongest association with the development of asparaginase-associated pancreatitis (odds ratio = 3.75; P = 5.2 × 10⁻⁸). Moreover, rs13228878 (OR = 0.61; P = 7.1 × 10⁻⁶) and rs10273639 (OR = 0.62; P = 1.1 × 10⁻⁵) on 7q34 showed significant association with the risk of asparaginase-associated pancreatitis. A Dana Farber Cancer Institute ALL Consortium cohort consisting of patients treated on protocols between 1987 and 2004 (controls = 285, cases = 33), and the Children’s Oncology Group AALL0232 cohort (controls = 2653, cases = 76) were available as replication cohorts for the 20q13.2 and 7q34 variants, respectively. While rs62228256 was not validated as a risk factor (P = 0.77), both rs13228878 (P = 0.03) and rs10273639 (P = 0.04) were. rs13228878 and rs10273639 are in high linkage disequilibrium (r² = 0.94) and associated with elevated expression of the PRSS1 gene, which encodes trypsinogen, and are known risk variants for alcohol-associated and sporadic pancreatitis in adults. Intra-pancreatic cleavage of trypsinogen to proteolytic trypsin induces autodigestion and pancreatitis.
In conclusion, this study finds a shared genetic predisposition between asparaginase-associated pancreatitis and non-asparaginase-associated pancreatitis, and targeting the trypsinogen activation pathway may enable identification of effective interventions for asparaginase-associated pancreatitis.
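The case definition stated in this abstract (at least two of three criteria) can be written as a simple rule. The sketch below is only an illustration of that rule; the function name, parameter names and example values are hypothetical and do not come from the study.

```python
def meets_case_definition(abdominal_pain, enzyme_level, upper_normal_limit,
                          imaging_compatible):
    """Apply the 'at least two of three criteria' rule from the abstract.

    Criteria, as stated in the text:
      (i)   abdominal pain
      (ii)  pancreatic enzyme level >= 3 x upper normal limit
      (iii) imaging compatible with pancreatitis
    All parameter names are hypothetical, chosen for illustration.
    """
    criteria = [
        bool(abdominal_pain),
        enzyme_level >= 3 * upper_normal_limit,
        bool(imaging_compatible),
    ]
    return sum(criteria) >= 2


# Made-up example: pain plus enzymes at exactly 3x the limit, no imaging.
print(meets_case_definition(True, 150.0, 50.0, False))
```

Encoding the threshold as a comparison against `3 * upper_normal_limit` keeps the rule independent of the specific enzyme assay and its units.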
Novel open-source software called the Mitotic Analyzing and Recording System (MAARS) provides automatic single-cell analysis of mitotic defects such as spindle mispositioning or chromosome missegregation. This approach made it possible to visualize rare and unexpected events of error correction in wild-type and mutant cells.