Chronic lymphocytic leukaemia (CLL), the most frequent leukaemia in adults in Western countries, is a heterogeneous disease with variable clinical presentation and evolution 1,2. Two major molecular subtypes can be distinguished, characterized respectively by a high or low number of somatic hypermutations in the variable region of immunoglobulin genes 3,4. The molecular changes leading to the pathogenesis of the disease are still poorly understood. Here we performed whole-genome sequencing of four cases of CLL and identified 46 somatic mutations that potentially affect gene function. Further analysis of these mutations in 363 patients with CLL identified four genes that are recurrently mutated: notch 1 (NOTCH1), exportin 1 (XPO1), myeloid differentiation primary response gene 88 (MYD88) and kelch-like 6 (KLHL6). Mutations in MYD88 and KLHL6 are predominant in cases of CLL with mutated immunoglobulin genes, whereas NOTCH1 and XPO1 mutations are mainly detected in patients with unmutated immunoglobulins. The patterns of somatic mutation, supported by functional and clinical analyses, strongly indicate that the recurrent NOTCH1, MYD88 and XPO1 mutations are oncogenic changes that contribute to the clinical evolution of the disease. To our knowledge, this is the first comprehensive analysis of CLL combining whole-genome sequencing with clinical characteristics and clinical outcomes. It highlights the usefulness of this approach for the identification of clinically relevant mutations in cancer.
To gain insights into the molecular alterations that cause CLL, we performed whole-genome sequencing of four cases representative of different forms of the disease: two cases, CLL1 and CLL2, with no mutations in the immunoglobulin genes (IGHV-unmutated) and two cases, CLL3 and CLL4, with mutations in these genes (IGHV-mutated) (Supplementary Table 1 and Supplementary Information).
We used a combination of whole-genome sequencing and exome sequencing, as well as long-insert paired-end libraries, to detect variants in chromosomal structure (Supplementary Fig. 1 and Supplementary Tables 2-5). We obtained more than 99.7% concordance between whole-genome sequencing calls and genotyping data, indicating that the coverage and parameters used were sufficient to detect most of the sequence variants in these samples (Supplementary Information). We detected about 1,000 somatic mutations per tumour in non-repetitive regions (Fig. 1a, Supplementary Fig. 2 and Supplementary Table 6). These numbers of somatic mutations were lower than the numbers in melanoma and lung carcinoma 5,6, but in agreement with previous estimates of less than one mutation per megabase (Mb) for leukaemias 7. The most common substitution was the transition G>A/C>T, usually occurring in a CpG context (Fig. 1b and Supplementary Fig. 2). We also detected marked differences in the mutation pattern between CLL samples and these differences were associated with tumour subtype (Fig. 1b). Thus, IGHV-mutated cases showed a higher proportion of A>C/T>G mutations tha...
The mission of the Encyclopedia of DNA Elements (ENCODE) Project is to enable the scientific and medical communities to interpret the human genome sequence and apply it to understand human biology and improve health. The ENCODE Consortium is integrating multiple technologies and approaches in a collective effort to discover and define the functional elements encoded in the human genome, including genes, transcripts, and transcriptional regulatory regions, together with their attendant chromatin states and DNA methylation patterns. In the process, standards to ensure high-quality data have been implemented, and novel algorithms have been developed to facilitate analysis. Data and derived results are made available through a freely accessible database. Here we provide an overview of the project and the resources it is generating and illustrate the application of ENCODE data to interpret the human genome.
Human aging cannot be fully understood in terms of the constrained genetic setting. Epigenetic drift is an alternative means of explaining age-associated alterations. To address this issue, we performed whole-genome bisulfite sequencing (WGBS) of newborn and centenarian genomes. The centenarian DNA had a lower DNA methylation content and a reduced correlation in the methylation status of neighboring cytosine-phosphate-guanine (CpGs) throughout the genome in comparison with the more homogeneously methylated newborn DNA. The more hypomethylated CpGs observed in the centenarian DNA compared with the neonate covered all genomic compartments, such as promoters, exonic, intronic, and intergenic regions. For regulatory regions, the most hypomethylated sequences in the centenarian DNA were present mainly at CpG-poor promoters and in tissue-specific genes, whereas a greater level of DNA methylation was observed in CpG island promoters. We extended the study to a larger cohort of newborn and nonagenarian samples using a 450,000 CpG-site DNA methylation microarray that reinforced the observation of more hypomethylated DNA sequences in the advanced age group. WGBS and 450,000 analyses of middle-age individuals demonstrated DNA methylomes in the crossroad between the newborn and the nonagenarian/centenarian groups. Our study constitutes a unique DNA methylation analysis of the extreme points of human life at a single-nucleotide resolution level.
epigenomics | longevity
During human aging, progressive impairment of organ and tissue functionality leads to an increasing probability of death. The molecular culprits behind this decline in physiological activities remain largely unknown. Studies of transcriptional and genomic associations in distinct tissues have identified several gene families and cellular pathways that might contribute to aging and alter lifespan.
These families include the Sirtuins, DNA repair enzymes, insulin-signaling pathway/forkhead transcription factors, apolipoproteins, telomere biology, and oxidative damage/mitochondrial metabolism (1, 2). Aging-associated mechanisms apparently involve many networks within a given cell. Considering that epigenetic regulation has emerged as a critical driver of cell fate and survival that targets many pathways (3, 4), that epigenetic drift can occur even in genetically identical humans (5, 6), and that DNA methylation patterns are disrupted in a wide range of common human diseases (7-11), we wondered whether individuals at the most extreme points of their lifespan had different DNA methylomes. To address this issue, we used whole-genome bisulfite sequencing (WGBS) (12-16) and a 450,000 CpG DNA methylation microarray to examine the DNA methylation profiles of newborn and nonagenarian/centenarian samples.
Results and Discussion
WGBS of Newborn and Centenarian DNA. The initial data were generated from the cord blood of a newborn (male Caucasian; NB) and from a centenarian (103-y-old male Caucasian; Y103) using DNA extracted from CD4+ T cells processed through an Illumina G...
The maintenance of protein function and structure constrains the evolution of amino acid sequences. This fact can be exploited to interpret correlated mutations observed in a sequence family as an indication of probable physical contact in three dimensions. Here we present a simple and general method to analyze correlations in mutational behavior between different positions in a multiple sequence alignment. We then use these correlations to predict contact maps for each of 11 protein families and compare the result with the contacts determined by crystallography. For the most strongly correlated residue pairs predicted to be in contact, the prediction accuracy ranges from 37 to 68% and the improvement ratio relative to a random prediction from 1.4 to 5.1. Predicted contact maps can be used as input for the calculation of protein tertiary structure, either from sequence information alone or in combination with experimental information.
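The idea of scoring correlated mutational behavior between alignment positions can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a simple identity-based similarity (1 if two sequences share the residue at a position, 0 otherwise) in place of a substitution matrix, and the function names `column_similarities` and `correlated_pairs` are hypothetical. For each position, the similarity of every pair of sequences is recorded, and two positions are scored by the Pearson correlation of those similarity profiles.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists; 0.0 if either is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # a fully conserved column carries no covariation signal
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)


def column_similarities(alignment, pos):
    """Similarity of every sequence pair at one column.

    Assumption: identity scoring (1.0 same residue, 0.0 different),
    a stand-in for a real substitution matrix.
    """
    return [1.0 if a[pos] == b[pos] else 0.0
            for a, b in combinations(alignment, 2)]


def correlated_pairs(alignment, min_separation=1):
    """Return (score, i, j) for all column pairs, strongest correlation first."""
    length = len(alignment[0])
    sims = [column_similarities(alignment, p) for p in range(length)]
    scores = []
    for i, j in combinations(range(length), 2):
        if j - i >= min_separation:
            scores.append((pearson(sims[i], sims[j]), i, j))
    return sorted(scores, reverse=True)


# Toy alignment (made-up sequences): columns 0 and 1 change together.
toy_alignment = ["ACDA", "ACDA", "GTDA", "GTDC"]
ranking = correlated_pairs(toy_alignment)
```

In this toy case the top-ranked pair is columns 0 and 1, which always mutate together; in a real application the highest-scoring pairs would be taken as candidate contacts and compared with a structure-derived contact map.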
An ATPase domain common to prokaryotic cell cycle proteins, sugar kinases, actin, and hsp70 heat shock proteins (structural
ABSTRACT The functionally diverse actin, hexokinase, and hsp70 protein families have in common an ATPase domain of known three-dimensional structure. Optimal superposition of the three structures and alignment of many sequences in each of the three families has revealed a set of common conserved residues, distributed in five sequence motifs, which are involved in ATP binding and in a putative interdomain hinge. From the multiple sequence alignment in these motifs a pattern of amino acid properties required at each position is defined. The discriminatory power of the pattern is in part due to the use of several known three-dimensional structures and many sequences and in part to the "property" method of generalizing from observed amino acid frequencies to amino acid fitness at each sequence position. A sequence data base search with the pattern significantly matches sugar kinases, such as fuco-, glucono-, xylulo-, ribulo-, and glycerokinase, as well as the prokaryotic cell cycle proteins MreB, FtsA, and StbA. These are predicted to have subdomains with the same tertiary structure as the ATPase subdomains Ia and IIa of hexokinase, actin, and Hsc70, a very similar ATP binding pocket, and the capacity for interdomain hinge motion accompanying functional state changes. A common evolutionary origin for all of the proteins in this class is proposed.
We have extensively characterized the DNA methylomes of 139 patients with chronic lymphocytic leukemia (CLL) with mutated or unmutated IGHV and of several mature B-cell subpopulations through the use of whole-genome bisulfite sequencing and high-density microarrays. The two molecular subtypes of CLL have differing DNA methylomes that seem to represent epigenetic imprints from distinct normal B-cell subpopulations. DNA hypomethylation in the gene body, targeting mostly enhancer sites, was the most frequent difference between naive and memory B cells and between the two molecular subtypes of CLL and normal B cells. Although DNA methylation and gene expression were poorly correlated, we identified gene-body CpG dinucleotides whose methylation was positively or negatively associated with expression. We have also recognized a DNA methylation signature that distinguishes new clinico-biological subtypes of CLL. We propose an epigenomic scenario in which differential methylation in the gene body may have functional and clinical implications in leukemogenesis.
Co-evolution is a fundamental component of the theory of evolution and is essential for understanding the relationships between species in complex ecological networks. A wide range of co-evolution-inspired computational methods has been designed to predict molecular interactions, but it is only recently that important advances have been made. Breakthroughs in the handling of phylogenetic information and in disentangling indirect relationships have resulted in an improved capacity to predict interactions between proteins and contacts between different protein residues. Here, we review the main co-evolution-based computational approaches, their theoretical basis, potential applications and foreseeable developments.
Deciphering the network of protein interactions that underlies cellular operations has become one of the main tasks of proteomics and computational biology. Recently, a set of bioinformatics approaches has emerged for the prediction of possible interactions by combining sequence and genomic information. Even though the initial results are very promising, the current methods are still far from perfect. We propose here a new way of discovering possible protein-protein interactions based on the comparison of the evolutionary distances between the sequences of the associated protein families, an idea based on previous observations of correspondence between the phylogenetic trees of associated proteins in systems such as ligands and receptors. Here, we extend the approach to different test sets, including the statistical evaluation of their capacity to predict protein interactions. To demonstrate the possibilities of the system to perform large-scale predictions of interactions, we present the application to a collection of more than 67 000 pairs of E. coli proteins, of which 2742 are predicted to correspond to interacting proteins.
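The comparison of evolutionary distances between two protein families can be sketched as follows. This is an illustrative reading rather than the published procedure: it assumes each family is summarized by a symmetric matrix of pairwise distances over the same ordered set of species, and it scores the pair of families by the Pearson correlation of the corresponding off-diagonal entries. The function name `distance_matrix_correlation` and the toy matrices are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists; 0.0 if either is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)


def upper_triangle(matrix):
    """Flatten the entries above the diagonal of a square matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def distance_matrix_correlation(dist_a, dist_b):
    """Correlate two families' pairwise distances.

    Assumption: row/column k refers to the same species in both matrices.
    """
    if len(dist_a) != len(dist_b):
        raise ValueError("matrices must cover the same species")
    return pearson(upper_triangle(dist_a), upper_triangle(dist_b))


# Made-up distances for three species.
family_a = [[0, 1, 4], [1, 0, 3], [4, 3, 0]]
family_b = [[0, 2, 8], [2, 0, 6], [8, 6, 0]]   # same tree shape, different rate
family_c = [[0, 4, 1], [4, 0, 3], [1, 3, 0]]   # different tree shape

similar_score = distance_matrix_correlation(family_a, family_b)
dissimilar_score = distance_matrix_correlation(family_a, family_c)
```

Families that evolve in step score close to 1 even when their overall rates differ, because correlation is insensitive to a uniform rescaling of distances; a threshold on this score would then separate candidate interacting pairs from the rest.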