Brazilians are highly admixed with ancestry from Europe, Africa, America, and Asia and yet still underrepresented in genomic databanks. We present a collection of exomic variants from 609 elderly Brazilians in a census-based cohort (SABE609) with comprehensive phenotyping. Variants were deposited in ABraOM (Online Archive of Brazilian Mutations), a Web-based public database. Population-representative phenotype and genotype repositories are essential for variant interpretation through allele frequency filtering; since elderly individuals are less likely to harbor pathogenic mutations for early- and adult-onset diseases, such variant databases are of great interest. Among the over 2.3 million variants from the present cohort, 1,282,008 were high-confidence calls. Importantly, 207,621 variants were absent from major public databases. We found 9,791 potential loss-of-function variants with about 300 mutations per individual. Pathogenic variants in clinically relevant genes (ACMG) were observed in 1.15% of the individuals and were correlated with clinical phenotype. We conducted incidence estimation for prevalent recessive disorders based upon heterozygous frequency and concluded that such estimation relies on appropriate pathogenicity assertion. These observations illustrate the relevance of collecting demographic data from diverse, poorly characterized populations. Census-based datasets of aged individuals with comprehensive phenotyping are an invaluable resource toward the improved understanding of variant pathogenicity.
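The incidence estimation from heterozygous frequency mentioned above can be illustrated with a minimal sketch. It assumes Hardy-Weinberg equilibrium (random mating, no selection) and that every carrier holds exactly one pathogenic allele of the gene in question; the function name and the carrier count in the usage note are hypothetical and are not taken from the SABE609 data.

```python
def recessive_incidence(carriers: int, cohort_size: int) -> float:
    """Expected incidence of a recessive disorder from a carrier count.

    Assumption (not from the source): Hardy-Weinberg equilibrium, with each
    carrier contributing one pathogenic allele out of 2 * cohort_size alleles.
    """
    if cohort_size <= 0:
        raise ValueError("cohort_size must be positive")
    if not 0 <= carriers <= cohort_size:
        raise ValueError("carriers must be between 0 and cohort_size")
    # Pooled pathogenic allele frequency q.
    q = carriers / (2 * cohort_size)
    # Expected frequency of affected (homozygous or compound heterozygous) births.
    return q * q
```

For example, `recessive_incidence(12, 609)` corresponds to an illustrative count of 12 heterozygous carriers among 609 individuals. Because the estimate scales with the square of the carrier count, reclassifying even a few variants as benign or pathogenic shifts it substantially, which is why the result depends so strongly on pathogenicity assertion.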
This work reports the development of GenSeed-HMM, a program that implements seed-driven progressive assembly, an approach to reconstruct specific sequences from unassembled data, starting from short nucleotide or protein seed sequences or profile Hidden Markov Models (HMM). The program can use any one of a number of sequence assemblers. Assembly is performed in multiple steps and relatively few reads are used in each cycle; consequently, the program demands low computational resources. As a proof-of-concept and to demonstrate the power of HMM-driven progressive assemblies, GenSeed-HMM was applied to metagenomic datasets in the search for diverse ssDNA bacteriophages from the recently described Alpavirinae subfamily. Profile HMMs were built using Alpavirinae-specific regions from multiple sequence alignments (MSA) using either the viral protein 1 (VP1; major capsid protein) or VP4 (genome replication initiation protein). These profile HMMs were used by GenSeed-HMM (running the Newbler assembler) as seeds to reconstruct viral genomes from sequencing datasets of human fecal samples. All contigs obtained were annotated and taxonomically classified using similarity searches and phylogenetic analyses. The most specific profile HMM seed enabled the reconstruction of 45 partial or complete Alpavirinae genomic sequences. A comparison with conventional (global) assembly of the same original dataset, using Newbler in a standalone execution, revealed that GenSeed-HMM outperformed global genomic assembly in several of the metrics employed. This approach is capable of detecting organisms that have not been used in the construction of the profile HMM, which opens up the possibility of diagnosing novel viruses, without previous specific information, constituting a de novo diagnosis. Additional applications include, but are not limited to, the specific assembly of extrachromosomal elements such as plastid and mitochondrial genomes from metagenomic data.
Profile HMM seeds can also be used to reconstruct specific protein coding genes for gene diversity studies, and to determine all possible gene variants present in a metagenomic sample. Such surveys could be useful to detect the emergence of drug-resistance variants in sensitive environments such as hospitals and animal production facilities, where antibiotics are regularly used. Finally, GenSeed-HMM can be used as an adjunct for gap closure on assembly finishing projects, by using multiple contig ends as anchored seeds.
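The recruit-then-extend cycle behind seed-driven progressive assembly can be sketched with a toy example. This is not GenSeed-HMM's implementation: exact k-mer matching stands in for the similarity search, a greedy suffix/prefix overlap stands in for the external assembler, and all function names, parameters, and sequences are hypothetical.

```python
from typing import List


def recruit(contig: str, reads: List[str], k: int) -> List[str]:
    """Keep only reads containing the first or last k bases of the contig."""
    head, tail = contig[:k], contig[-k:]
    return [r for r in reads if head in r or tail in r]


def extend_right(contig: str, reads: List[str], min_overlap: int) -> str:
    """Extend the contig with the read whose prefix best overlaps its suffix."""
    best, best_overlap = contig, 0
    for read in reads:
        # Cap the overlap so the read always contributes at least one new base.
        longest = min(len(contig), len(read) - 1)
        for overlap in range(longest, min_overlap - 1, -1):
            if contig.endswith(read[:overlap]):
                if overlap > best_overlap:
                    best, best_overlap = contig + read[overlap:], overlap
                break
    return best


def extend_left(contig: str, reads: List[str], min_overlap: int) -> str:
    """Left extension is right extension on the reversed strings."""
    reversed_reads = [r[::-1] for r in reads]
    return extend_right(contig[::-1], reversed_reads, min_overlap)[::-1]


def progressive_assembly(seed: str, reads: List[str],
                         min_overlap: int = 3, max_cycles: int = 100) -> str:
    """Grow a contig from a seed, using only a few recruited reads per cycle."""
    contig = seed
    for _ in range(max_cycles):
        subset = recruit(contig, reads, min_overlap)
        grown = extend_right(contig, subset, min_overlap)
        grown = extend_left(grown, subset, min_overlap)
        if grown == contig:  # no further extension possible
            break
        contig = grown
    return contig


# Made-up reads tiling a made-up 16-base sequence.
TOY_READS = ["ACGTTGCA", "TGCAAGGC", "AGGCTTAC"]
TOY_CONTIG = progressive_assembly("GCAAGG", TOY_READS, min_overlap=3)
```

Each cycle touches only the reads that share a terminal k-mer with the current contig, which mirrors, in a very simplified way, why the progressive strategy needs few reads per cycle.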
DNA cytosine methylation is central to many biological processes, including regulation of gene expression, cellular differentiation and development. This DNA modification is conserved across animals, having been found in representatives of sponges, ctenophores, cnidarians and bilaterians, and with very few known instances of secondary loss in animals. Myxozoans are a group of microscopic, obligate endoparasitic cnidarians that have lost many genes over the course of their evolution from free-living ancestors. Here, we investigated the evolution of the key enzymes involved in DNA cytosine methylation in 29 cnidarians, and found that these enzymes were lost in an ancestor of Myxosporea (the most speciose class of Myxozoa). Additionally, using whole genome bisulfite sequencing (WGBS), we confirmed that the genomes of two distant species of myxosporeans, Ceratonova shasta and Henneguya salminicola, completely lack DNA cytosine methylation. Our results add a notable and novel taxonomic group, the Myxosporea, to the very short list of animal taxa lacking DNA cytosine methylation, further illuminating the complex evolutionary history of this epigenetic regulatory mechanism.
Pancreatic β-cells, residents of the islets of Langerhans, are the only insulin producers in the body. Their physiology is a topic of intensive study aiming to understand the biology of insulin production and its role in diabetes pathology. However, investigations of the subset of proteins secreted by these cells, the secretome, are surprisingly scarce, and a list describing the islet/β-cell secretome upon glucose stimulation is not yet available. In silico prediction of secretomes is an interesting approach that can be employed to forecast proteins likely to be secreted. In this context, using the rationale behind classical secretion of proteins through the secretory pathway, a Python tool capable of predicting classically secreted proteins was developed. This tool was applied to different available proteomic data (human and rodent islets, isolated β-cells, β-cell secretory granules, and β-cell supernatant), filtering them in order to selectively list only classically secreted proteins. The method presented here can retrieve, organize, search and filter proteomic lists using UniProtKB as a central database. It provides analysis by overlaying different sets of information, filtering out potential contaminants and clustering the identified proteins into functional groups. The original proteomes analyzed were reduced by 70–92% to generate the predicted secretomes. Islet and β-cell signal peptide-containing proteins and endoplasmic reticulum-resident proteins were identified and quantified. From the predicted secretomes, exemplary conservation patterns were inferred, as well as the signaling pathways enriched within them. Such a technique proves to be an effective approach to narrow the range of plausible targets for drug development or biomarker identification.
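The filtering rationale described above can be sketched as follows. This is not the published tool: the criteria used here (a signal peptide, no transmembrane segments, no C-terminal KDEL/HDEL retention motif) are common assumptions about classical secretion, and the record fields, function names, accessions, and sequences are all hypothetical. In practice the annotations would come from a database such as UniProtKB rather than being typed in by hand.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Assumed C-terminal motifs marking ER-resident proteins.
ER_RETENTION_MOTIFS = ("KDEL", "HDEL")


@dataclass(frozen=True)
class ProteinRecord:
    """Hypothetical, simplified annotation record for one protein."""
    accession: str
    sequence: str
    has_signal_peptide: bool
    transmembrane_segments: int


def is_er_resident(protein: ProteinRecord) -> bool:
    """True if the sequence ends with an assumed ER retention motif."""
    return protein.sequence.upper().endswith(ER_RETENTION_MOTIFS)


def is_classically_secreted(protein: ProteinRecord) -> bool:
    """Signal peptide present, not membrane-anchored, not retained in the ER."""
    return (protein.has_signal_peptide
            and protein.transmembrane_segments == 0
            and not is_er_resident(protein))


def predict_secretome(records: Iterable[ProteinRecord]) -> List[ProteinRecord]:
    """Filter a proteomic list down to predicted classically secreted proteins."""
    return [p for p in records if is_classically_secreted(p)]


# Made-up records for illustration only.
EXAMPLE_RECORDS = [
    ProteinRecord("HYPO_SECRETED", "MKLLAVAGSTQ", True, 0),
    ProteinRecord("HYPO_ER", "MKLLAVAGKDEL", True, 0),
    ProteinRecord("HYPO_MEMBRANE", "MKLLAVAGSTQ", True, 2),
    ProteinRecord("HYPO_CYTOSOLIC", "MSTQAGGSTQ", False, 0),
]
PREDICTED = predict_secretome(EXAMPLE_RECORDS)
```

Keeping each criterion in its own function makes it straightforward to report the intermediate groups separately, for instance signal peptide-containing proteins versus ER-resident proteins.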