Recent research into structural variants (SVs) has established their importance to medicine and molecular biology, elucidating their role in various diseases, regulation of gene expression, ethnic diversity, and large-scale chromosome evolution—giving rise to the differences within populations and among species. Nevertheless, characterizing SVs and determining the optimal approach for a given experimental design remains a computational and scientific challenge. Multiple approaches have emerged to target various SV classes, zygosities, and size ranges. Here, we review these approaches with respect to their ability to infer SVs across the full spectrum of large, complex variations and present computational methods for each approach.
With the aim of uncovering the molecular pathways underlying the regulation of sleep, we recently assembled an extensive and comprehensive systems genetics dataset interrogating a genetic reference population of mice at the levels of the genome, the brain and liver transcriptomes, the plasma metabolome, and the sleep-wake phenome. To facilitate a meaningful and efficient re-use of this public resource by others, we designed, described in detail, and made available a Digital Research Object (DRO), embedding data, documentation, and analytics. We present and discuss both the advantages and limitations of our multi-modal resource and analytic pipeline. The reproducibility of the results was tested by a bioinformatician not involved in the original project, and the robustness of results was assessed by re-annotating genetic and transcriptome data from the mm9 to the mm10 mouse genome assembly.
More and more researchers make use of multi-omics approaches to tackle complex cellular and organismal systems. It has become apparent that the potential to re-use and integrate data generated by different labs can enhance knowledge. However, a meaningful and efficient re-use of data generated by others is difficult to achieve without in-depth understanding of how these datasets were assembled. We therefore designed and describe in detail a digital research object embedding data, documentation and analytics on mouse sleep regulation. The aim of this study was to bring together electrophysiological recordings, sleep-wake behavior, metabolomics, genetics, and gene regulatory data in a systems genetics model to investigate sleep regulation in the BXD panel of recombinant inbred lines. We here showcase both the ...

Experiment 1 and Experiment 2 (Figure 1) were approved by the veterinary authorities of the state of Vaud, Switzerland (SCAV authorization #2534).

Animal, breeding, and housing conditions

Thirty-four BXD lines originating from the University of Tennessee Health Science Center (Memphis, TN, United States of America) were selected for Experiment 1 and Experiment 2. These lines were randomly chosen from the newly generated advanced recombinant inbred line (ARIL) RwwJ panel [4], although lines with documented poor breeding performance were not considered. Four additional BXD RI strains were chosen from the older TyJ panel for reproducibility purposes and were obtained directly from the Jackson Laboratory (JAX, Bar Harbor, Maine). The names used for some of the BXD lines have been modified over time to reflect genetic proximity. Table 1 lists the BXD line names we used in our files alongside the corresponding current JAX names and IDs. In our analyses, we discarded the BXD63/RwwJ line for quality reasons (see Technical Validation) as well as the 4 older BXD strains, which were derived from a different DBA/2 sub-strain, i.e. DBA/2Rj instead of DBA/2J for RwwJ lines [21]. The methods below describe the remaining 33 BXD lines, F1 and parental strains.

Two breeding trios per BXD strain were purchased from a local facility (EPFL-SV, Lausanne, Switzerland) and bred in-house until sufficient offspring were obtained. The parental strains DBA/2J (D2), C57BL/6J (B6) and their reciprocal F1 offspring (B6D2F1 [BD-F1] and D2B6F1 [DB-F1]) were bred and phenotyped alongside. Offspring of suitable age and sex were transferred to our sleep-recording facility, where they were singly housed, with food and water available ad libitum, at a constant temperature of 25°C and under a 12 h light/12 h dark cycle (LD12:12, fluorescent lights, intensity 6.6 cd/m², with ZT0 and ZT12 designating light and dark onset, respectively). Male mice aged 11-14 weeks at the time of experiment were used for phenotyping, with a mean of 12 animals per BXD line among all experiments. Note that 3 BXD lines had a lower replicate number (n), with respect...
Targeted DNA sequencing approaches will improve how the size of short tandem repeats is measured for diagnostic tests and pre-clinical studies. The expansion of these sequences causes dozens of disorders, with longer tracts generally leading to a more severe disease. In addition, interruptions are sometimes present within repeats and can alter disease manifestation. Despite advances in methodologies, determining repeat size and identifying interruptions in targeted sequencing datasets remains a major challenge. This is because standard alignment tools are ill-suited for the repetitive nature of these sequences. To address this, we have developed Repeat Detector (RD), a deterministic profile weighting algorithm for counting repeats in targeted sequencing data. We tested RD using blood-derived DNA samples from Huntington’s disease (HD) and Fuchs endothelial corneal dystrophy patients sequenced using either Illumina MiSeq or Pacific Biosciences single-molecule, real-time sequencing platforms. RD was highly accurate in determining repeat sizes of 609 HD blood-derived samples and did not require prior knowledge of the flanking sequences or their polymorphisms within the patient population. We demonstrate that RD can be used to identify individuals with repeat interruptions and may provide a measure of repeat instability within an individual. RD is therefore highly versatile and may find applications in the diagnosis of expanded repeat disorders and the development of novel therapies.
Targeted DNA sequencing approaches will improve how the size of short tandem repeats is measured for diagnostic tests and preclinical studies. The expansion of these sequences causes dozens of disorders, with longer tracts generally leading to a more severe disease. Interrupted alleles are sometimes present within repeats and can alter disease manifestation. Determining repeat size mosaicism and identifying interruptions in targeted sequencing datasets remains a major challenge. This is in part because standard alignment tools are ill-suited for repetitive and unstable sequences. To address this, we have developed Repeat Detector (RD), a deterministic profile weighting algorithm for counting repeats in targeted sequencing data. We tested RD using blood-derived DNA samples from Huntington’s disease and Fuchs endothelial corneal dystrophy patients sequenced using either Illumina MiSeq or Pacific Biosciences single-molecule, real-time sequencing platforms. RD was highly accurate in determining repeat sizes of 609 blood-derived samples from Huntington’s disease individuals and did not require prior knowledge of the flanking sequences. Furthermore, RD can be used to identify alleles with interruptions and provide a measure of repeat instability within an individual. RD is therefore highly versatile and may find applications in the diagnosis of expanded repeat disorders and in the development of novel therapies.
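To make the repeat-sizing task concrete, the following is a minimal sketch of counting repeat units directly in reads, without aligning them to a reference. It is not RD's profile weighting algorithm, which the abstract does not specify; it is a naive exact-match illustration. The function names, the default `CAG` unit, and the `min_units` threshold are all assumptions made for this example.

```python
import re
from collections import Counter
from typing import Iterable


def longest_repeat_tract(read: str, unit: str = "CAG") -> int:
    """Number of units in the longest uninterrupted tract of `unit` in `read`."""
    pattern = re.compile(f"(?:{re.escape(unit)})+")
    best = 0
    for match in pattern.finditer(read.upper()):
        best = max(best, len(match.group(0)) // len(unit))
    return best


def has_interruption(read: str, unit: str = "CAG", min_units: int = 3) -> bool:
    """True if two tracts of at least `min_units` units are separated by
    exactly one unit-length stretch that differs from `unit`."""
    u = re.escape(unit)
    pattern = re.compile(
        f"(?:{u}){{{min_units},}}"      # upstream tract
        f"(?!{u})[ACGT]{{{len(unit)}}}"  # one non-matching unit
        f"(?:{u}){{{min_units},}}"      # downstream tract
    )
    return pattern.search(read.upper()) is not None


def modal_repeat_size(reads: Iterable[str], unit: str = "CAG") -> int:
    """Most frequent tract length across reads (hypothetical summary statistic)."""
    sizes = Counter(longest_repeat_tract(r, unit) for r in reads)
    return sizes.most_common(1)[0][0]
```

Exact matching of this kind breaks down with sequencing errors, which is one reason a weighted-profile approach may be preferable in practice; the spread of tract lengths across reads from one sample could serve as a rough proxy for instability.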
Genetic variations affect behavior and cause disease, but understanding how these variants drive complex traits is still an open question. A common approach is to link the genetic variants to intermediate molecular phenotypes such as the transcriptome using RNA-sequencing (RNA-seq). Paradoxically, these variants between the samples are usually ignored at the beginning of RNA-seq analyses of many model organisms. This can skew the transcriptome estimates that are used later for downstream analyses, such as expression quantitative trait locus (eQTL) detection. Here, we assessed the impact of reference-based analysis on the transcriptome and eQTLs in a widely used mouse genetic population: the BXD panel of recombinant inbred lines. We highlight existing reference bias in the transcriptome data analysis and propose practical solutions which combine available genetic variants, genotypes, and genome reference sequence. The use of custom BXD line references improved downstream analysis compared with the classical genome reference. These insights would likely benefit genetic studies with a transcriptomic component and demonstrate that genome references need to be reassessed and improved.
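The idea of combining variants, genotypes, and a reference sequence into a line-specific reference can be sketched as follows. This is an illustrative simplification, not the authors' pipeline: it handles only single-nucleotide substitutions on a toy sequence, and the function name, the 0-based position keys, and the `"B"`/`"D"` genotype labels are assumptions introduced for the example.

```python
from typing import Dict, Tuple


def build_line_reference(
    reference: str,
    variants: Dict[int, Tuple[str, str]],
    genotypes: Dict[int, str],
    alt_label: str = "D",
) -> str:
    """Return a copy of `reference` carrying the alternative allele at every
    variant position where this line's genotype equals `alt_label`.

    variants:  0-based position -> (reference allele, alternative allele)
    genotypes: 0-based position -> genotype label of the line at that position
    """
    seq = list(reference)
    for pos, (ref_allele, alt_allele) in variants.items():
        if genotypes.get(pos) != alt_label:
            continue  # the line carries the reference allele here
        if seq[pos] != ref_allele:
            # Guard against coordinate or assembly mismatches
            raise ValueError(f"reference mismatch at position {pos}")
        seq[pos] = alt_allele
    return "".join(seq)


# Toy usage: two known variants, only the first inherited as "D" in this line
toy_reference = "ACGTACGT"
toy_variants = {1: ("C", "T"), 5: ("C", "G")}
toy_genotypes = {1: "D", 5: "B"}
custom = build_line_reference(toy_reference, toy_variants, toy_genotypes)
```

Reads from that line would then be mapped against `custom` rather than the generic reference, so that reads carrying the alternative allele are not penalized at those positions. Insertions and deletions would shift coordinates and need separate handling.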