Background: Diseases of the nervous system are widely considered to be caused by genetic mutations, and they have been shown to share pathogenic genes. Discovering the shared mechanisms of these diseases is useful for designing common treatments. Method: In this study, by reviewing 518 articles published after 2007 on 20 diseases of the nervous system, we compiled data on 1607 mutations occurring in 365 genes, totals that are 1.9 and 3.2 times larger than those collected in the ClinVar database, respectively. Combining these with the ClinVar data gives 2434 pathogenic mutations and 424 genes. Using this information, we measured the genetic similarity between diseases according to the number of genes that cause both diseases of a pair. We further examined the similarity between diseases in terms of cell types. Disease-related cell types were defined as those whose marker genes are enriched for disease-related genes, as ascertained by analyzing single-cell sequencing data. Enrichment profiles of the disease-related genes over 25 cell types were constructed. The disease similarity in terms of cell types was obtained by calculating the distances between the enrichment profiles of these genes. The same strategy was applied to measure the disease similarity in terms of brain regions by analyzing gene expression data from 10 brain regions. Results: The disease similarity was first measured in terms of genes. The proportions of overlapping genes between diseases were significantly correlated with the DMN scores (phenotypic similarity), with a Pearson correlation coefficient of 0.40 (P-value = 6.0 × 10⁻³). The disease similarity analysis for cell types showed that the distances between enrichment profiles of the disease-related genes were negatively correlated with the DMN scores, with a Spearman correlation coefficient of -0.26 (P-value = 1.5 × 10⁻²). 
However, the brain region enrichment profile distances of the disease-related genes were not significantly correlated with the DMN scores. Besides the similarity of diseases, this study identified novel relationships between diseases and cell types. Conclusion: We manually constructed the most comprehensive dataset to date for genes with mutations related to 20 nervous system diseases. Using this dataset, the similarities between diseases in terms of genes and cell types were found to be significantly correlated with their phenotypic similarity. However, the disease similarities in terms of brain regions were not significantly correlated with the phenotypic similarities. Thus, the phenotypic similarity between the diseases is more likely to be caused by dysfunctions of the same genes or the same cell types than of the same brain regions. The data are collected into the database NeurodisM, which is available at .
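The gene-level comparison described in this abstract can be illustrated with a minimal sketch. The abstract does not say how the "proportion of overlapping genes" was computed, so the Jaccard index used here is an assumption, and the disease names, gene sets, and phenotype scores are made-up toy values rather than data from the study.

```python
from itertools import combinations
from math import sqrt


def overlap_proportion(genes_a, genes_b):
    """Jaccard index (assumed metric): shared genes / genes linked to either disease."""
    union = genes_a | genes_b
    return len(genes_a & genes_b) / len(union) if union else 0.0


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical toy input: disease -> set of causal genes.
disease_genes = {
    "disease_A": {"G1", "G2", "G3"},
    "disease_B": {"G2", "G3", "G4"},
    "disease_C": {"G5", "G6"},
}
# Hypothetical phenotypic similarity score for each disease pair.
phenotype_score = {
    ("disease_A", "disease_B"): 0.8,
    ("disease_A", "disease_C"): 0.1,
    ("disease_B", "disease_C"): 0.2,
}

pairs = list(combinations(sorted(disease_genes), 2))
gene_sim = [overlap_proportion(disease_genes[a], disease_genes[b]) for a, b in pairs]
pheno_sim = [phenotype_score[pair] for pair in pairs]
r = pearson(gene_sim, pheno_sim)
print(f"Pearson r between gene overlap and phenotype score: {r:.3f}")
```

With real data, each of the 20 diseases would contribute one gene set, and the correlation would be taken over all disease pairs that have a phenotypic similarity score.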
Schizophrenia (SCZ) is a polygenic disease with a heritability approaching 80%. Over 100 SCZ-related loci have so far been identified by genome-wide association studies (GWAS). However, the risk genes associated with these loci often remain unknown. We present a new risk gene predictor, rGAT-omics, that integrates multi-omics data under a Bayesian framework by combining the Hotelling and Box–Cox transformations. The Bayesian framework was constructed using gene ontology, tissue-specific protein–protein networks, and multi-omics data including differentially expressed genes in SCZ and controls, distance from genes to the index single-nucleotide polymorphisms (SNPs), and de novo mutations. The application of rGAT-omics to the 108 loci identified by a recent GWAS of SCZ predicted 103 high-risk genes (HRGs) that explain a high proportion of SCZ heritability (Enrichment = 43.44 and $p = 9.30 \times 10^{-9}$). HRGs were shown to be significantly ($p_{\mathrm{adj}} = 5.35 \times 10^{-7}$) enriched in genes associated with neurological activities, and more likely to be expressed in brain tissues and SCZ-associated cell types than background genes. The predicted HRGs included 16 novel genes not present in any existing databases of SCZ-associated genes or previously predicted to be SCZ risk genes by any other method. More importantly, 13 of these 16 genes were not the nearest to the index SNP markers, and they would have been difficult to identify as risk genes by conventional approaches. Ten of the 16 genes are associated with neurological functions that make them prime candidates for pathological involvement in SCZ. Therefore, rGAT-omics has revealed novel insights into the molecular mechanisms underlying SCZ and could provide potential clues to future therapies.
Microdeletions and gross deletions are important causes (~20%) of human inherited disease. Their genomic locations are strongly influenced by the local DNA sequence environment. Yet no systematic study has examined the generative mechanisms. Here, we obtained 42,098 pathogenic microdeletions and gross deletions from the Human Gene Mutation Database (HGMD) that together form a continuum of germline deletions ranging in size from 1 bp to 28,394,429 bp. We analyzed the sequence within 1 kb of the breakpoint junctions and found that the frequencies of non-B DNA-forming repeats, GC content, and the presence of seven of 78 specific sequence motifs in the vicinity of pathogenic deletions correlated with deletion length for deletions of length ≤30 bp. Furthermore, we found that DR, GQ, and STR repeats appear to be important for the formation of longer deletions (>30 bp) but not for the formation of shorter deletions (≤30 bp), and that significantly (Chi-square test P-value < 2E-16) more microhomologies were identified flanking short deletions than long deletions (length >30 bp). We provide evidence to support a functional distinction between microdeletions and gross deletions. A deletion length cut-off of 25-30 bp may serve as an objective means to functionally distinguish microdeletions from gross deletions.
Microdeletions and gross deletions are important causes (~20%) of human inherited disease and their genomic locations are strongly influenced by the local DNA sequence environment. This notwithstanding, no study has systematically examined their underlying generative mechanisms. Here, we obtained 42,098 pathogenic microdeletions and gross deletions from the Human Gene Mutation Database (HGMD) that together form a continuum of germline deletions ranging in size from 1 to 28,394,429 bp. We analyzed the DNA sequence within 1 kb of the breakpoint junctions and found that the frequencies of non-B DNA-forming repeats, GC-content, and the presence of seven of 78 specific sequence motifs in the vicinity of pathogenic deletions correlated with deletion length for deletions of length ≤30 bp. Further, we found that the presence of DR, GQ, and STR repeats is important for the formation of longer deletions (>30 bp) but not for the formation of shorter deletions (≤30 bp) while significantly (χ², p < 2E−16) more microhomologies were identified flanking short deletions than long deletions (length >30 bp). We provide evidence to support a functional distinction between microdeletions and gross deletions. Finally, we propose that a deletion length cut-off of 25-30 bp may serve as an objective means to functionally distinguish microdeletions from gross deletions.
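As a rough illustration of the short-versus-long comparison reported above, the sketch below splits deletions at a 30 bp cut-off and runs a chi-square test on a 2×2 table of microhomology presence. The records are invented toy values, not HGMD data, and the uncorrected Pearson chi-square statistic used here is an assumption; the abstract does not state which variant of the test the authors applied.

```python
from math import erfc, sqrt

CUTOFF_BP = 30  # cut-off taken from the abstract's short/long split


def classify(length_bp, cutoff=CUTOFF_BP):
    """Label a deletion 'short' (<= cutoff) or 'long' (> cutoff)."""
    return "short" if length_bp <= cutoff else "long"


def chi_square_2x2(a, b, c, d):
    """Uncorrected Pearson chi-square for the table [[a, b], [c, d]].

    Returns (statistic, p_value). With 1 degree of freedom the survival
    function of the chi-square distribution is erfc(sqrt(x / 2)).
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return chi2, erfc(sqrt(chi2 / 2))


# Hypothetical toy records: (deletion length in bp, microhomology at breakpoint?)
deletions = [
    (2, True), (5, True), (12, True), (30, False),
    (31, False), (450, False), (9000, True), (120000, False),
]

# Tally the 2x2 contingency table: size class by microhomology presence.
counts = {(size, flag): 0 for size in ("short", "long") for flag in (True, False)}
for length_bp, has_microhomology in deletions:
    counts[(classify(length_bp), has_microhomology)] += 1

chi2, p_value = chi_square_2x2(
    counts[("short", True)], counts[("short", False)],
    counts[("long", True)], counts[("long", False)],
)
print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}")
```

Eight toy records are far too few for a meaningful test; the point is only the shape of the calculation. Repeating it for several candidate cut-offs is one way to probe where the two classes separate.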