“Orangutan” is derived from the Malay term “man of the forest” and aptly describes the Southeast Asian great apes native to Sumatra and Borneo. The orangutan species, Pongo abelii (Sumatran) and Pongo pygmaeus (Bornean), are the most phylogenetically distant great apes from humans, thereby providing an informative perspective on hominid evolution. Here we present a Sumatran orangutan draft genome assembly and short read sequence data from five Sumatran and five Bornean orangutan genomes. Our analyses reveal that, compared to other primates, the orangutan genome has many unique features. Structural evolution of the orangutan genome has proceeded much more slowly than in other great apes, evidenced by fewer rearrangements, less segmental duplication, a lower rate of gene family turnover and surprisingly quiescent Alu repeats, which have played a major role in restructuring other primate genomes. We also describe the first primate polymorphic neocentromere, found in both Pongo species, emphasizing the gradual evolution of orangutan genome structure. Orangutans have extremely low energy usage for a eutherian mammal¹, far lower than their hominid relatives. Adding their genome to the repertoire of sequenced primates illuminates new signals of positive selection in several pathways including glycolipid metabolism. From the population perspective, both Pongo species are deeply diverse; however, Sumatran individuals possess greater diversity than their Bornean counterparts, and more species-specific variation. Our estimate of Bornean/Sumatran speciation time, 400k years ago (ya), is more recent than most previous studies and underscores the complexity of the orangutan speciation process. Despite a smaller modern census population size, the Sumatran effective population size (Ne) expanded exponentially relative to the ancestral Ne after the split, while Bornean Ne declined over the same period.
Overall, the resources and analyses presented here offer new opportunities in evolutionary genomics, insights into hominid biology, and an extensive database of variation for conservation efforts.
The dog tapeworm E. granulosus is one of a group of medically important parasitic helminths of the family Taeniidae (Platyhelminthes; Cestoda; Cyclophyllidea) that infect at least 50 million people globally¹. Its life cycle involves two mammals, including an intermediate host, usually a domestic or wild ungulate (humans are accidental intermediate hosts) and a canine definitive host, such as the domestic dog. The larval (metacestode) stage causes hydatidosis (cystic hydatid disease; cystic echinococcosis), a chronic cyst-forming disease in the intermediate (human) host. Currently, up to 3 million people are infected with E. granulosus²,³, and, in some areas, 10% of the population has detectable hydatid cysts by abdominal ultrasound and chest X-ray⁴,⁵.
E. granulosus has no gut, circulatory or respiratory organs. It is monoecious, producing diploid eggs that give rise to ovoid embryos, the oncospheres. Strobilization is a notable feature of cestode biology, whereby proglottids bud distally from the anterior scolex, resulting in the production of tandem reproductive units exhibiting increasing degrees of development. A unique characteristic of the larvae (protoscoleces, PSCs) within the hydatid cyst is an ability to develop bidirectionally into an adult worm in the dog gastrointestinal tract or into a secondary hydatid cyst in the intermediate (human) host, a process triggered by bile acids⁶. Another distinct feature of E. granulosus is its capacity to infect and adapt to a large number of mammalian species as intermediate hosts, which has contributed to its cosmopolitan global distribution.
Here we report the sequence and analysis of the E. granulosus genome. Comprising nine pairs of chromosomes⁷, it is one of the first cestode genomes to be sequenced and complements the recent publication by Tsai et al.⁸ of a high-quality genome for Echinococcus multilocularis (the cause of alveolar echinococcosis), together with draft genomes of three other tapeworm species including E. granulosus. Our study provides insights into the biology, development, differentiation, evolution and host interaction of E. granulosus and has identified a range of drug and vaccine targets that can facilitate the development of new intervention tools for hydatid treatment and control.
Cystic echinococcosis (hydatid disease), caused by the tapeworm E. granulosus, is responsible for considerable human morbidity and mortality. This cosmopolitan disease is difficult to diagnose, treat and control. We present a draft genomic sequence for the worm comprising 151.6 Mb encoding 11,325 genes. Comparisons with the genome sequences from other taxa show that E. granulosus has acquired a spectrum of genes, including the EgAgB family, whose products are secreted by the parasite to interact with and redirect host immune responses. We also find that genes in bile salt pathways may control the bidirectional development of E. granulosus, and sequence differences in the calcium channel subunit EgCavβ1 may be associated with praziquantel sens...
The MinION device by Oxford Nanopore produces very long reads (reads over 100 kbp have been reported); however, it suffers from a high sequencing error rate. We present an open-source DNA base caller based on deep recurrent neural networks and show that the accuracy of base calling depends strongly on the underlying software and can be improved by considering modern machine learning methods. By employing carefully crafted recurrent neural networks, our tool significantly improves base calling accuracy on data from the R7.3 version of the platform compared to the default base caller supplied by the manufacturer. On the R9 version, we achieve results comparable to the Nanonet base caller provided by Oxford Nanopore. The availability of an open-source tool with high base calling accuracy will be useful for the development of new applications of the MinION device, including infectious disease detection and custom target enrichment during sequencing.
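As a rough illustration of what "base calling accuracy" can mean in practice, the sketch below scores a called read against a known reference by edit distance. This is an assumption made for illustration only: the abstract does not state which accuracy metric the authors used, and the function names `edit_distance` and `read_identity` are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two sequences (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, 1):
        cur = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # match or substitute
            ))
        prev = cur
    return prev[-1]


def read_identity(called: str, reference: str) -> float:
    """Hypothetical accuracy score: 1 - edit distance / longer sequence length."""
    longest = max(len(called), len(reference))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(called, reference) / longest


# One substitution in a four-base read gives an identity of 0.75.
print(read_identity("ACGT", "ACTT"))
```

Real evaluations usually align reads to a reference genome with a dedicated aligner; the quadratic-time routine here is only practical for short sequences.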
A first analysis of the genome sequence of the common marmoset (Callithrix jacchus), assembled using traditional Sanger methods and Ensembl annotation, has permitted genomic comparison with apes and Old World monkeys and the identification of specific molecular features that may contribute to the unique biology of this diminutive primate. The common marmoset has a rapid reproductive capacity partly due to prevalence of dizygotic twins. Remarkably, these twins share placental circulation and exchange hematopoietic stem cells in utero, resulting in adults that are hematopoietic chimeras. We observed positive selection or non-synonymous substitutions for genes encoding growth hormone/insulin-like growth factor (growth pathways), respiratory complex I (metabolic pathways), immunobiology, and proteases (reproductive and immunity pathways). In addition, both protein-coding and microRNA genes related to reproduction exhibit rapid sequence evolution. This New World monkey genome sequence enables significantly increased power for comparative analyses among available primate genomes and facilitates biomedical research application.
The MGC Project Team
Since its start, the Mammalian Gene Collection (MGC) has sought to provide at least one full-protein-coding sequence cDNA clone for every human and mouse gene with a RefSeq transcript, and at least 6200 rat genes. The MGC cloning effort initially relied on random expressed sequence tag screening of cDNA libraries. Here, we summarize our recent progress using directed RT-PCR cloning and DNA synthesis. The MGC now contains clones with the entire protein-coding sequence for 92% of human and 89% of mouse genes with curated RefSeq (NM-accession) transcripts, and for 97% of human and 96% of mouse genes with curated RefSeq transcripts that have one or more PubMed publications, in addition to clones for more than 6300 rat genes. These high-quality MGC clones and their sequences are accessible without restriction to researchers worldwide.
Mitochondrial genome diversity in closely related species provides an excellent platform for investigation of chromosome architecture and its evolution by means of comparative genomics. In this study, we determined the complete mitochondrial DNA sequences of eight Candida species and analyzed their molecular architectures. Our survey revealed a puzzling variability of genome architecture, including circular- and linear-mapping and multipartite linear forms. We propose that the arrangement of large inverted repeats identified in these genomes plays a crucial role in alterations of their molecular architectures. In specific arrangements, the inverted repeats appear to function as resolution elements, allowing genome conversion among different topologies, eventually leading to genome fragmentation into multiple linear DNA molecules. We suggest that molecular transactions generating linear mitochondrial DNA molecules with defined telomeric structures may parallel the evolutionary emergence of linear chromosomes and multipartite genomes in general and may provide clues for the origin of telomeres and pathways implicated in their maintenance.
Optimal spaced seeds were developed as a method to increase sensitivity of local alignment programs similar to BLASTN. Such seeds have been used before in the program PatternHunter, and have given improved sensitivity and running time relative to BLASTN in genome-genome comparison. We study the problem of computing optimal spaced seeds for detecting homologous coding regions in unannotated genomic sequences. By using well-chosen seeds, we are able to improve the sensitivity of coding sequence alignment over that of TBLASTX, while keeping runtime comparable to BLASTN. We identify good seeds by first giving effective hidden Markov models of conservation in alignments of homologous coding regions. We give an efficient algorithm to compute the optimal spaced seed when conservation patterns are generated by these models. Our results offer the hope of improved gene finding due to fewer missed exons in DNA/DNA comparison, and more effective homology search in general, and may have applications outside of bioinformatics.
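To make the idea of a spaced seed concrete, here is a minimal sketch of seed-hit detection, assuming a seed is written as a string of `1` (positions that must match) and `0` (positions that are ignored). The function names `spaced_key` and `seed_hits` are hypothetical, and the code illustrates only the matching step, not the hidden Markov models or the seed-optimization algorithm described in the abstract.

```python
from collections import defaultdict


def spaced_key(seq: str, start: int, seed: str) -> str:
    """Characters of seq under the '1' positions of seed, starting at start."""
    return "".join(seq[start + i] for i, c in enumerate(seed) if c == "1")


def seed_hits(query: str, target: str, seed: str) -> list[tuple[int, int]]:
    """Return (query_pos, target_pos) pairs that agree at every '1' position."""
    span = len(seed)
    # Index every window of the target by its masked key.
    index = defaultdict(list)
    for t in range(len(target) - span + 1):
        index[spaced_key(target, t, seed)].append(t)
    # Look up every window of the query in that index.
    hits = []
    for q in range(len(query) - span + 1):
        for t in index.get(spaced_key(query, q, seed), []):
            hits.append((q, t))
    return hits


# The sequences differ at the third base, which the spaced seed ignores.
print(seed_hits("ACGT", "ACTT", "1101"))  # [(0, 0)]
# A contiguous seed of the same weight finds no hit.
print(seed_hits("ACGT", "ACTT", "111"))   # []
```

The toy example shows why spacing can help: both seeds require three matching bases, but only the spaced one tolerates the mismatch, which is the kind of effect that matters in coding regions where third codon positions vary more freely.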