Intellectual disability (ID) is a clinical sign reflecting diverse neurodevelopmental disorders that are genetically and phenotypically heterogeneous. Partial or complete deletion of the methyl-CpG-binding domain 5 (MBD5) gene has recently been implicated as causative of the phenotype associated with 2q23.1 microdeletion syndrome. During systematic whole-genome screening of individuals with unexplained ID by array-based comparative genomic hybridization, we identified de novo intragenic deletions of MBD5 in three patients. As previously documented, these deletions lead to MBD5 haploinsufficiency. In addition, we describe a patient with a previously unreported de novo intragenic duplication of MBD5. Reverse transcriptase-PCR and sequencing analyses showed numerous aberrant transcripts containing a premature termination codon. To further elucidate the involvement of MBD5 in ID, we sequenced the ten coding exons, the five non-coding exons and an evolutionarily conserved region in intron 2 in a selected cohort of 78 subjects with a phenotype reminiscent of 2q23.1 microdeletion syndrome. Besides variants most often inherited from a healthy parent, we identified for the first time a de novo nonsense mutation, associated with a much more severe phenotype. Taken together, these results extend the mutation spectrum of the MBD5 gene and help refine the associated neurodevelopmental phenotype.

Keywords: MBD5; nonsense mutation; intragenic duplication; intellectual disability

INTRODUCTION

Methyl-CpG-binding domain 5 (MBD5) protein (OMIM *611472) is a member of the MBD protein family, which also includes MECP2 (OMIM *300005), the gene involved in Rett syndrome, a prototypical neurodevelopmental disorder. The MBD5 gene contains five non-coding exons at its 5′ end, followed by ten coding exons. Two isoforms have been described:[1] the longer one (1494 amino acids) is encoded by exons 6-15, and the shorter one (851 amino acids) is encoded by exons 6-9.
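The abstract reports aberrant transcripts carrying a premature termination codon and a de novo nonsense mutation. To make these two notions concrete, here is a minimal Python sketch, not the authors' analysis: it reads a coding sequence codon by codon under the standard genetic code (stop codons TAA, TAG and TGA), reports the first in-frame stop codon, and checks whether a single-base substitution turns a sense codon into a stop codon. The function names and the short sequence in the usage lines are hypothetical and are not MBD5 sequence.

```python
from typing import List, Optional

# Stop codons of the standard genetic code, in the DNA alphabet.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons(cds: str) -> List[str]:
    """Split a coding sequence into in-frame triplets; an incomplete trailing codon is dropped."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return [cds[i:i + 3] for i in range(0, usable, 3)]


def first_stop_codon(cds: str) -> Optional[int]:
    """Return the 1-based index of the first in-frame stop codon, or None if there is none."""
    for index, codon in enumerate(codons(cds), start=1):
        if codon in STOP_CODONS:
            return index
    return None


def is_nonsense(cds: str, position: int, alt_base: str) -> bool:
    """True if replacing the base at 0-based `position` with `alt_base`
    turns a sense codon into a stop codon (a nonsense substitution)."""
    start = position - position % 3              # first base of the affected codon
    ref_codon = cds[start:start + 3].upper()
    mutant = cds[:position] + alt_base + cds[position + 1:]
    alt_codon = mutant[start:start + 3].upper()
    return ref_codon not in STOP_CODONS and alt_codon in STOP_CODONS


if __name__ == "__main__":
    # Hypothetical 6-codon sequence (not MBD5): ATG CGA CAG TGG AAA TAA
    cds = "ATGCGACAGTGGAAATAA"
    print(first_stop_codon(cds))                 # 6, the normal stop codon
    print(is_nonsense(cds, 6, "T"))              # True: CAG -> TAG
    mutant = cds[:6] + "T" + cds[7:]
    print(first_stop_codon(mutant))              # 3, a premature stop codon
```

In this toy sequence the normal stop codon is codon 6; the C>T change at the first base of codon 3 is flagged as nonsense and moves the first in-frame stop to codon 3, which is what "premature termination codon" means.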
Functional studies suggested that MBD5 probably contributes to the formation or function of heterochromatin.[1] Isoform 1 of MBD5 is highly expressed in brain and testis, and isoform 2 in oocytes, suggesting possible roles in cerebral function and in epigenetic reprogramming after fertilization. Recently, deletions encompassing MBD5, as well as intragenic MBD5 deletions, have been identified in individuals with intellectual disability (ID), seizures, significant speech impairment and behavioral problems.[2][3][4][5][6][7][8] In this study, we used pangenomic array-based comparative genomic hybridization (array-CGH) and capillary sequencing of the MBD5 gene to investigate DNA from patients with unexplained ID. We further extend the mutational spectrum of MBD5 with a damaging intragenic duplication and a nonsense mutation associated with a clinical spectrum of neurodevelopmental disorder.
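Array-CGH detects copy-number changes such as the deletions and duplication reported here by comparing test and reference DNA at each probe. The Python sketch below illustrates the idea under simplifying assumptions. For an autosomal locus the theoretical log2(test/reference) ratio is log2(1/2) = -1 for a heterozygous deletion and log2(3/2) ≈ 0.585 for a single-copy duplication. Probes are called with fixed, hypothetical thresholds of ±0.3, and runs of consecutive probes with the same call are merged. Real platforms use dedicated segmentation algorithms and quality filters; nothing here reflects the platform or settings used in this study.

```python
import math
from dataclasses import dataclass
from typing import List

# Hypothetical calling thresholds (assumption): observed ratios are usually
# compressed relative to the theoretical values, so real pipelines tune these.
LOSS_THRESHOLD = -0.3
GAIN_THRESHOLD = 0.3


def expected_log2_ratio(copies: int, reference_copies: int = 2) -> float:
    """Theoretical log2(test/reference) ratio for a locus with `copies` copies in the test DNA."""
    return math.log2(copies / reference_copies)


def call_probe(log2_ratio: float) -> str:
    """Classify a single probe as 'loss', 'gain' or 'normal' using the fixed thresholds."""
    if log2_ratio <= LOSS_THRESHOLD:
        return "loss"
    if log2_ratio >= GAIN_THRESHOLD:
        return "gain"
    return "normal"


@dataclass
class Segment:
    call: str          # "loss", "gain" or "normal"
    start_probe: int   # index of the first probe in the run
    end_probe: int     # index of the last probe in the run (inclusive)


def segment_calls(log2_ratios: List[float]) -> List[Segment]:
    """Merge runs of consecutive probes (ordered by genomic position) that share the same call."""
    segments: List[Segment] = []
    for i, ratio in enumerate(log2_ratios):
        call = call_probe(ratio)
        if segments and segments[-1].call == call:
            segments[-1].end_probe = i
        else:
            segments.append(Segment(call, i, i))
    return segments


if __name__ == "__main__":
    print(expected_log2_ratio(1))                # -1.0, heterozygous deletion
    print(round(expected_log2_ratio(3), 3))      # 0.585, single-copy duplication
    # Hypothetical probe ratios along one chromosome region
    ratios = [0.02, -0.05, -0.8, -0.9, -0.7, 0.01, 0.5, 0.6]
    for seg in segment_calls(ratios):
        print(seg)
```

With the made-up ratios, probes 2-4 form one "loss" run and probes 6-7 one "gain" run; an intragenic deletion or duplication would appear as such a run confined to probes within a single gene.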
Choosing the most appropriate unsupervised machine-learning method for “heterogeneous” or “mixed” data, i.e. data with both continuous and categorical variables, can be challenging. Our aim was to examine the performance of various clustering strategies for mixed data using both simulated and real-life data. We conducted a benchmark analysis of “ready-to-use” tools in R comparing 4 model-based clustering methods (Kamila algorithm, Latent Class Analysis, Latent Class Model [LCM] and Clustering by Mixture Modeling) and 5 distance/dissimilarity-based ones (Gower distance or Unsupervised Extra Trees dissimilarity, each followed by hierarchical clustering or Partitioning Around Medoids, plus K-prototypes). Clustering performance was assessed by the Adjusted Rand Index (ARI) on 1000 generated virtual populations of mixed variables, using 7 scenarios that varied population size, number of clusters, number of continuous and categorical variables, proportion of relevant (non-noisy) variables and degree of variable relevance (low, mild, high). The clustering methods were then applied to data from the EPHESUS randomized clinical trial (a heart failure trial evaluating the effect of eplerenone) to illustrate the differences between clustering techniques. In the simulations, K-prototypes, Kamila and LCM outperformed all other methods. Overall, methods that apply classical algorithms such as Partitioning Around Medoids and hierarchical clustering to dissimilarity matrices had a lower ARI than model-based methods in all scenarios. When applied to the real-life clinical dataset, LCM showed promising results with regard to (1) differences in clinical profiles across clusters, (2) prognostic performance (highest C-index) and (3) identification of patient subgroups with substantial treatment benefit.
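The Adjusted Rand Index compares a clustering with the true partition by counting pairs of individuals placed together or apart, corrected for chance: it equals 1 for identical partitions (up to relabeling), is near 0 for chance-level agreement, and can be negative. The benchmark was run in R and the abstract does not say which ARI implementation was used. What follows is an independent Python sketch of the standard Hubert and Arabie formula computed from the contingency table; scikit-learn's `sklearn.metrics.adjusted_rand_score` implements the same index and can serve as a cross-check.

```python
from collections import Counter
from math import comb
from typing import Hashable, Sequence


def adjusted_rand_index(labels_true: Sequence[Hashable], labels_pred: Sequence[Hashable]) -> float:
    """Adjusted Rand Index (Hubert and Arabie) computed from the contingency table of two partitions."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("both labelings must cover the same individuals")
    n = len(labels_true)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # fewer than two individuals: the partitions trivially agree

    # Pairs placed together in both partitions: sum over cells n_ij of C(n_ij, 2).
    same_both = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    # Pairs placed together in each partition separately (row and column margins).
    same_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    same_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())

    expected = same_true * same_pred / total_pairs  # chance-expected value of same_both
    maximum = (same_true + same_pred) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both labelings are a single cluster
    return (same_both - expected) / (maximum - expected)


if __name__ == "__main__":
    print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))            # 1.0: identical up to relabeling
    print(round(adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1]), 6))  # -0.5: worse than chance
```

In a simulation benchmark like the one described, `labels_true` would be the generating cluster of each virtual individual and `labels_pred` the output of a clustering method, with the index averaged over the generated populations.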
The present findings suggest key differences in clustering performance between the tested algorithms (limited to tools readily available in R). In most of the tested scenarios, model-based methods (in particular the Kamila and LCM packages) and K-prototypes performed best on heterogeneous data.
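K-prototypes, one of the best performers in this benchmark, extends k-means to mixed data by combining squared Euclidean distance on continuous variables with a weighted count of categorical mismatches; prototypes are updated with means for continuous variables and modes for categorical ones. The Python sketch below is a simplified batch variant written for illustration: Huang's original formulation updates prototypes as objects are reallocated, and the R tool benchmarked in the study is not reproduced here. The weight `gamma`, the fixed initial prototypes and the toy records are hypothetical; in practice the weight must be chosen or estimated to balance the two variable types.

```python
from collections import Counter
from typing import List, Sequence, Tuple

# A record is (continuous values, categorical values).
Record = Tuple[Sequence[float], Sequence[str]]


def dissimilarity(record: Record, prototype: Record, gamma: float) -> float:
    """Squared Euclidean distance on continuous values plus gamma times the number of categorical mismatches."""
    num, cat = record
    p_num, p_cat = prototype
    numeric_part = sum((x - y) ** 2 for x, y in zip(num, p_num))
    categorical_part = sum(a != b for a, b in zip(cat, p_cat))
    return numeric_part + gamma * categorical_part


def update_prototype(members: List[Record]) -> Record:
    """New prototype: mean of each continuous variable, mode of each categorical variable."""
    n_num, n_cat = len(members[0][0]), len(members[0][1])
    means = [sum(m[0][j] for m in members) / len(members) for j in range(n_num)]
    modes = [Counter(m[1][j] for m in members).most_common(1)[0][0] for j in range(n_cat)]
    return (means, modes)


def k_prototypes(data: List[Record], init: List[Record], gamma: float, max_iter: int = 100) -> List[int]:
    """Batch k-prototypes: alternate assignment and prototype update until labels stop changing."""
    prototypes = list(init)
    labels: List[int] = []
    for _ in range(max_iter):
        # Assign each record to its nearest prototype under the mixed dissimilarity.
        new_labels = [
            min(range(len(prototypes)), key=lambda k: dissimilarity(r, prototypes[k], gamma))
            for r in data
        ]
        if new_labels == labels:
            break  # converged: no record changed cluster
        labels = new_labels
        for k in range(len(prototypes)):
            members = [r for r, lab in zip(data, labels) if lab == k]
            if members:  # an empty cluster keeps its previous prototype
                prototypes[k] = update_prototype(members)
    return labels


if __name__ == "__main__":
    # Hypothetical toy records: two continuous variables, one categorical variable.
    data = [
        ((1.0, 1.0), ("a",)),
        ((1.2, 0.8), ("a",)),
        ((8.0, 8.0), ("b",)),
        ((8.3, 7.9), ("b",)),
    ]
    print(k_prototypes(data, init=[data[0], data[2]], gamma=1.0))  # [0, 0, 1, 1]
```

Unlike the dissimilarity-matrix approaches (Gower or Unsupervised Extra Trees followed by PAM or hierarchical clustering), this algorithm never builds an n × n matrix; it only compares each record with k prototypes per iteration.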