Our hypothesis is that machine learning (ML) analysis of whole exome sequencing (WES) data can be used to identify individuals at high risk for schizophrenia (SCZ). This study applies ML to WES data from 2,545 individuals with SCZ and 2,545 unaffected individuals, accessed via the database of genotypes and phenotypes (dbGaP). Single nucleotide variants and small insertions and deletions were annotated by ANNOVAR using the reference genome hg19/GRCh37. Rare (predicted functional) variants with a minor allele frequency ≤1% and genotype quality ≥90, including missense, frameshift, stop gain, stop loss, intronic, and exonic splicing variants, were selected. A file containing all cases and controls, the names of genes with variants meeting our criteria, and the number of variants per gene for each individual was used for ML analysis. The supervised machine-learning algorithm used the patterns of variants observed in the different genes to determine which subset of genes can best predict that an individual is affected. Seventy percent of the data was used to train the algorithm and the remaining 30% of data (n = 1,526) was used to evaluate its performance. The supervised ML algorithm, gradient boosted trees with regularization (eXtreme Gradient Boosting implementation), was the best-performing algorithm, yielding promising results (accuracy: 85.7%, specificity: 86.6%, sensitivity: 84.9%, area under the receiver operating characteristic curve: 0.95). The top 50 features (genes) of the algorithm were analyzed using bioinformatics resources for new insights about the pathophysiology of SCZ. This manuscript presents a novel predictor which could potentially enable studies exploring disease-modifying intervention in the early stages of the disease.
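The accuracy, specificity, and sensitivity quoted in this abstract are standard confusion-matrix quantities computed on the held-out 30% of individuals. The following is a minimal sketch of how such metrics are derived from true and predicted labels; it is not the authors' code, the function name `classification_metrics` is hypothetical, and the labels in the usage lines are made up for illustration (1 = affected, 0 = unaffected).

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, sensitivity and specificity for binary labels.

    Labels are assumed to be 1 (case) or 0 (control); this encoding is an
    assumption made for the sketch, not something stated in the abstract.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Tally the four cells of the confusion matrix.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")

    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),  # fraction of cases correctly flagged
        "specificity": tn / (tn + fp),  # fraction of controls correctly cleared
    }


# Illustrative labels only; not data from the study.
example_true = [1, 1, 1, 0, 0, 0, 0, 1]
example_pred = [1, 1, 0, 0, 0, 1, 0, 1]
example_metrics = classification_metrics(example_true, example_pred)
```

The area under the receiver operating characteristic curve additionally requires the model's predicted scores rather than hard labels, so it is left out of this sketch.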
A literature review was conducted, using the computerized "Online Mendelian Inheritance in Man" (OMIM) and PubMed, to identify inborn errors of metabolism (IEM) in which psychosis may be a predominant feature or the initial presenting symptom. Different combinations of the following keywords were searched using OMIM: "psychosis", "schizophrenia", or "hallucinations" and "metabolic", "inborn error of metabolism", "inborn errors of metabolism", "biochemical genetics", or "metabolic genetics". The OMIM search generated 126 OMIM entries, 40 of which were well known IEM. After removing IEM lacking evidence in PubMed for an association with psychosis, 29 OMIM entries were identified. Several of these IEM are treatable. They involve different small organelles (lysosomes, peroxisomes, mitochondria), iron or copper accumulation, as well as defects in other metabolic pathways (e.g., defects leading to hyperammonemia or homocystinemia). A clinical checklist summarizing the key features of these conditions and a guide to clinical approach are provided. The genes corresponding to each of these conditions were identified. Whole exome data from 2545 adult cases with schizophrenia and 2545 unrelated controls, accessed via the Database of Genotypes and Phenotypes (dbGaP), were analyzed for rare functional variants in these genes. The odds ratio of having a rare functional variant in cases versus controls was calculated for each gene. Eight genes are significantly associated with schizophrenia (p < 0.05, OR >1) using an unselected group of adult patients with schizophrenia. Increased awareness of clinical clues for these IEM will optimize referrals and timely metabolic interventions.
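The per-gene odds ratio mentioned in this abstract compares the odds of carrying a rare functional variant among cases with the odds among controls. A minimal sketch of that calculation from a 2x2 table follows. The abstract does not say which significance test or interval was used, so the Woolf (log-based) 95% confidence interval and the 0.5 continuity correction for empty cells are assumptions of this sketch, and the carrier counts in the usage line are invented; only the group sizes of 2545 come from the text.

```python
import math
from typing import Tuple


def odds_ratio_with_ci(
    case_carriers: int,
    case_total: int,
    control_carriers: int,
    control_total: int,
    z: float = 1.96,
) -> Tuple[float, float, float]:
    """Return (odds ratio, lower CI bound, upper CI bound) for one gene.

    The interval is a Woolf log-based approximation; this choice is an
    assumption, not a method reported in the abstract.
    """
    a = float(case_carriers)                     # cases with a variant
    b = float(case_total - case_carriers)        # cases without a variant
    c = float(control_carriers)                  # controls with a variant
    d = float(control_total - control_carriers)  # controls without a variant

    if min(a, b, c, d) < 0:
        raise ValueError("carrier counts cannot exceed group totals")

    # Haldane-Anscombe correction: avoid division by zero on empty cells.
    if 0.0 in (a, b, c, d):
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5

    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical carrier counts (30 vs 15) for a single gene.
example_or, example_lower, example_upper = odds_ratio_with_ci(30, 2545, 15, 2545)
```

An odds ratio above 1 whose interval excludes 1 would point to enrichment in cases. When many genes are tested, a correction for multiple comparisons is usually worth considering.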
Autism spectrum disorder (ASD) is a neurobehavioral disorder with a heterogeneous genetic etiology. Based on the literature, several single-gene disorders, including Rett syndrome, Smith-Lemli-Opitz syndrome, PTEN hamartoma tumor syndrome and tuberous sclerosis, are associated with a high prevalence of ASD. We estimated the prevalence of these four conditions in a large cohort of patients using whole-exome sequencing data from 2392 families (1800 quads and 592 trios) with ASD from the National Database for Autism Research. Seven patients carried a pathogenic or likely pathogenic variant in either TSC1, TSC2, PTEN, DHCR7 or MECP2, with 6 out of 7 reportable variants occurring in PTEN (1 in 399).