Background Mammalian X and Y chromosomes share a common evolutionary origin and retain regions of high sequence similarity. Similar sequence content can confound the mapping of short next-generation sequencing reads to a reference genome. It is therefore possible that the presence of both sex chromosomes in a reference genome can cause technical artifacts in genomic data and affect downstream analyses and applications. Understanding this problem is critical for medical genomics and population genomic inference. Results Here, we characterize how sequence homology can affect analyses on the sex chromosomes and present XYalign, a new tool that (1) facilitates the inference of sex chromosome complement from next-generation sequencing data; (2) corrects erroneous read mapping on the sex chromosomes; and (3) tabulates and visualizes important metrics for quality control such as mapping quality, sequencing depth, and allele balance. We find that sequence homology affects read mapping on the sex chromosomes and this has downstream effects on variant calling. However, we show that XYalign can correct mismapping, resulting in more accurate variant calling. We also show how metrics output by XYalign can be used to identify XX and XY individuals across diverse sequencing experiments, including low- and high-coverage whole-genome sequencing, and exome sequencing. Finally, we discuss how the flexibility of the XYalign framework can be leveraged for other uses including the identification of aneuploidy on the autosomes. XYalign is available open source under the GNU General Public License (version 3). Conclusions Sex chromosome sequence homology causes the mismapping of short reads, which in turn affects downstream analyses. XYalign provides a reproducible framework to correct mismapping and improve variant calling on the sex chromosomes.
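The abstract above mentions using sequencing depth to identify XX and XY individuals. The following is a minimal sketch of that general idea, not XYalign's actual implementation: the function name `infer_complement`, its parameters, and both thresholds are hypothetical, chosen only to illustrate how depth on X and Y, normalized to autosomal depth, could separate the two cases.

```python
def infer_complement(x_depth, y_depth, autosome_depth,
                     y_present_min=0.2, x_double_min=0.8):
    """Guess the sex chromosome complement from mean read depths.

    All thresholds are illustrative assumptions, not values from XYalign.
    Depths are normalized to the autosomal mean, so a diploid chromosome
    is expected near 1.0 and a single-copy chromosome near 0.5.
    """
    if autosome_depth <= 0:
        raise ValueError("autosome_depth must be positive")

    x_ratio = x_depth / autosome_depth
    y_ratio = y_depth / autosome_depth

    has_y = y_ratio >= y_present_min   # appreciable Y coverage
    two_x = x_ratio >= x_double_min    # X coverage close to autosomal

    if two_x and not has_y:
        return "XX"
    if has_y and not two_x:
        return "XY"
    # Anything else (e.g. high X and high Y) is left for manual review.
    return "ambiguous"


print(infer_complement(30.0, 0.5, 30.0))   # X near autosomal depth, almost no Y
print(infer_complement(15.0, 14.0, 30.0))  # X and Y both near half autosomal depth
```

A real analysis would likely also weigh mapping quality and allele balance, which the abstract lists among the reported metrics, rather than rely on depth alone.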
Autism spectrum disorder (ASD) is a constellation of neurodevelopmental disorders with high phenotypic and genetic heterogeneity, complicating the discovery of causative genes. Through a forward genetics approach selecting for defective vocalization in mice, we identified Kdm5a as a candidate ASD gene. To validate our discovery, we generated a Kdm5a knockout mouse model (Kdm5a-/-) and confirmed that inactivating Kdm5a disrupts vocalization. In addition, Kdm5a-/- mice displayed repetitive behaviors, sociability deficits, cognitive dysfunction, and abnormal dendritic morphogenesis. Loss of KDM5A also resulted in dysregulation of the hippocampal transcriptome. To determine if KDM5A mutations cause ASD in humans, we screened whole exome sequencing and microarray data from a clinical cohort. We identified pathogenic KDM5A variants in nine patients with ASD and lack of speech. Our findings illustrate the power and efficacy of forward genetics in identifying ASD genes and highlight the importance of KDM5A in normal brain development and function.
Understanding effects of recent climate warming and changes in catchment conditions on nutrient cycling and the biology of shallow subarctic lakes is necessary to predict their evolution. Here, we use multiple analytical methods on sediment cores to identify effects of change in catchment conditions on nutrient availability and biotic assemblages in two subarctic lakes on the Seward Peninsula (Alaska, USA). We compare limnological and biotic responses to flooding and expansion of a thermokarst lake basin (late 1950s), increased shrub growth in the catchment of another lake (since the mid-1980s), and regional warming (since the late 1970s). Among these three environmental drivers, the largest biotic responses occurred because of flooding and expansion of the thermokarst lake. An increase in the nitrogen isotope composition and decline in organic carbon isotope composition in sediments are interpreted to reflect an elevated supply of dissolved inorganic carbon and nitrogen. This was associated with significant shifts in composition of chironomid and diatom assemblages. In contrast, increases in particulate organic carbon and nitrogen from enhanced shrub growth had less influence on the biota. Declines in cold-water biotic indicators typical of warming lakes in Arctic regions occurred several decades after catchment-induced changes to the nutrient supply in both systems. This indicates that initial lake catchment condition may mediate lake-specific changes in nutrient cycling and aquatic productivity within regions undergoing warming.
Background Screening for short tandem repeat (STR) expansions in next-generation sequencing data can enable diagnosis, optimal clinical management/treatment, and accurate genetic counseling of patients with repeat expansion disorders. We aimed to develop an efficient computational workflow for reliable detection of STR expansions in next-generation sequencing data and demonstrate its clinical utility. Methods We characterized the performance of eight STR analysis methods (lobSTR, HipSTR, RepeatSeq, ExpansionHunter, TREDPARSE, GangSTR, STRetch, and exSTRa) on next-generation sequencing datasets of samples with known disease-causing full-mutation STR expansions and genomes simulated to harbor repeat expansions at selected loci and optimized their sensitivity. We then used a machine learning decision tree classifier to identify an optimal combination of methods for full-mutation detection. In Burrows-Wheeler Aligner (BWA)-aligned genomes, the ensemble approach of using ExpansionHunter, STRetch, and exSTRa performed the best (precision = 82%, recall = 100%, F1-score = 90%). We applied this pipeline to screen 301 families of children with suspected genetic disorders. Results We identified 10 individuals with full-mutations in the AR, ATXN1, ATXN8, DMPK, FXN, or HTT disease STR locus in the analyzed families. Additional candidates identified in our analysis include two probands with borderline ATXN2 expansions between the established repeat size range for reduced-penetrance and full-penetrance full-mutation and seven individuals with FMR1 CGG repeats in the intermediate/premutation repeat size range. In 67 probands with a prior negative clinical PCR test for the FMR1, FXN, or DMPK disease STR locus, or the spinocerebellar ataxia disease STR panel, our pipeline did not falsely identify aberrant expansion. 
We performed clinical PCR tests on seven (out of 10) full-mutation samples identified by our pipeline and confirmed the expansion status in all, showing absolute concordance between our bioinformatics and molecular findings. Conclusions We have successfully demonstrated the application of a well-optimized bioinformatics pipeline that promotes the utility of genome-wide sequencing as a first-tier screening test to detect expansions of known disease STRs. Interrogating clinical next-generation sequencing data for pathogenic STR expansions using our ensemble pipeline can improve diagnostic yield and enhance clinical outcomes for patients with repeat expansion disorders.
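The reported ensemble metrics (precision = 82%, recall = 100%, F1-score = 90%) can be checked against the standard definition of the F1-score as the harmonic mean of precision and recall. The short sketch below does only that arithmetic; the helper name `f1_score` is ours and is not taken from the authors' pipeline.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both given as fractions)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported for the ExpansionHunter + STRetch + exSTRa ensemble.
f1 = f1_score(0.82, 1.00)
print(f"{f1:.1%}")  # prints 90.1%
```

The result, about 90.1%, agrees with the rounded 90% given in the abstract.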
The factors driving initiation of pathological expansion of tandem repeats remain largely unknown. Here, we assessed the FGF14-SCA27B (GAA)·(TTC) repeat locus in 2,530 individuals by long-read and Sanger sequencing and identified a 5′-flanking 17-bp deletion-insertion in 70.34% of alleles (3,463/4,923). This common sequence variation was present nearly exclusively on alleles with fewer than 30 GAA-pure repeats and was associated with enhanced meiotic stability of the repeat locus.