Anup Madan scite author profile

We describe the third generation of the CAP sequence assembly program. The CAP3 program includes a number of improvements and new features. The program has a capability to clip 5′ and 3′ low-quality regions of reads. It uses base quality values in computation of overlaps between reads, construction of multiple sequence alignments of reads, and generation of consensus sequences. The program also uses forward-reverse constraints to correct assembly errors and link contigs. Results of CAP3 on four BAC data sets are presented. The performance of CAP3 was compared with that of PHRAP on a number of BAC data sets. PHRAP often produces longer contigs than CAP3 whereas CAP3 often produces fewer errors in consensus sequences than PHRAP. It is easier to construct scaffolds with CAP3 than with PHRAP on low-pass data with forward-reverse constraints. The shotgun sequencing strategy has been used widely in genome sequencing projects. A major phase in this strategy is to assemble short reads into long sequences. A number of DNA sequence assembly programs have been developed (Staden 1980; Peltola et al. 1984; Huang 1992; Smith et al. 1993; Gleizes and Henaut 1994; Lawrence et al. 1994; Kececioglu and Myers 1995; Sutton et al. 1995; Green 1996). The FAKII program provides a library of routines for each phase of the assembly process (Larson et al. 1996). The GAP4 program has a number of useful interactive features (Bonfield et al. 1995). The PHRAP program clips 5′ and 3′ low-quality regions of reads and uses base quality values in evaluation of overlaps and generation of contig sequences (Green 1996). TIGR Assembler has been used in a number of megabase microbial genome projects (Sutton et al. 1995). Continued development and improvement of sequence assembly programs are required to meet the challenges of the human, mouse, and maize genome projects. We have developed the third generation of the CAP sequence assembly program (Huang 1992).
The CAP3 program includes a number of improvements and new features. A capability to clip 5′ and 3′ low-quality regions of reads is included in the CAP3 program. Base quality values produced by PHRED are used in computation of overlaps between reads, construction of multiple sequence alignments of reads, and generation of consensus sequences. Efficient algorithms are employed to identify and compute overlaps between reads. Forward-reverse constraints are used to correct assembly errors and link contigs. Results of CAP3 on four BAC data sets are presented. The performance of CAP3 was compared with that of PHRAP on a number of BAC data sets. PHRAP often produces longer contigs than CAP3 whereas CAP3 often produces fewer errors in consensus sequences than PHRAP. It is easier to construct scaffolds with CAP3 than with PHRAP on low-pass data with forward-reverse constraints. An unusual feature of CAP3 is the use of forward-reverse constraints in the construction of contigs. A forward-reverse constraint is often produced by sequencing of both ends of a subclone. A forward-reverse constraint specifies that the two...

Generation and initial analysis of more than 15,000 full-length human and mouse cDNA sequences

Strausberg

Feingold

Grouse

et al. 2002

Proc. Natl. Acad. Sci. U.S.A.

1,526

327

The National Institutes of Health Mammalian Gene Collection (MGC) Program is a multiinstitutional effort to identify and sequence a cDNA clone containing a complete ORF for each human and mouse gene. ESTs were generated from libraries enriched for full-length cDNAs and analyzed to identify candidate full-ORF clones, which then were sequenced to high accuracy. The MGC has currently sequenced and verified the full ORF for a nonredundant set of >9,000 human and >6,000 mouse genes. Candidate full-ORF clones for an additional 7,800 human and 3,500 mouse genes also have been identified. All MGC sequences and clones are available without restriction through public databases and clone distribution networks (see http://mgc.nci.nih.gov). The gene content of the mammalian genome is a topic of great interest. While draft sequences are now available for the human (1, 2), mouse (www.ensembl.org/Mus_musculus), and rat (http://hgsc.bcm.tmc.edu/projects/rat) genomes, the challenge remains to correctly identify all of the encoded genes. Difficulty in deciphering the anatomy of mammalian genes is due to several factors, including large amounts of intervening (noncoding) sequence, the imperfection of gene-prediction algorithms (3), and the incompleteness of cDNA-sequence resources, many of which consist of gene tags of variable length and quality. Full-length cDNA sequences are extremely useful for determining the genomic structure of genes, especially when analyzed within the context of genomic sequence. To facilitate gene-identification efforts and to catalyze experimental investigation, the National Institutes of Health (NIH) launched the Mammalian Gene Collection (MGC) program (4) with the aim of providing freely accessible, high-quality sequences for validated, complete ORF cDNA clones.
In this article, we describe our progress toward the goal of identifying and accurately sequencing at least one full ORF-containing cDNA clone for each human and mouse gene, as well as making these fully sequenced clones available without restriction. Materials and Methods. cDNA Library Production. MGC cDNA libraries were prepared from a diverse set of tissues and cell lines, in several different vector systems, by using a variety of methods. Vector maps and details of library construction are available at http://mgc.nci.nih.gov/Info/VectorMaps. The complete sequences for each of the MGC vectors can be found at http://image.llnl.gov/image/html/vectors.shtml. The catalog of MGC cDNA libraries can be accessed at http://mgc.nci.nih.gov.

The relationship of 5HTT (SLC6A4) methylation and genotype on mRNA expression and liability to major depression and alcohol dependence in subjects from the Iowa Adoption Studies

Philibert

Sandhu

Hollenbeck

et al. 2007

American J of Med Genetics Pt B

264

217

Serotonin Transporter (5HTT or SLC6A4) mRNA transcription is regulated by both genetic and epigenetic mechanisms. Unfortunately, despite intense scrutiny, the exact identity and contribution of each of these regulatory mechanisms, and their relationship to behavioral illness remain unknown. This lack of knowledge is critical because alterations in SLC6A4 function are posited to be central to a wide variety of CNS disorders. In order to address this shortcoming, we quantified 5HTTLPR genotype, SLC6A4 mRNA production and CpG methylation using biomaterial from 192 lymphoblast cell lines derived from subjects who participated in the latest wave of the Iowa Adoption Studies. We then analyzed the resulting data with respect to clinical characteristics. We confirmed prior findings that the short (s) 5HTTLPR allele is associated with lower amounts of mRNA transcription, but there was no significant effect of the “Long G” allele on mRNA transcription. We also found that CpG methylation was higher (P < 0.0008) and mRNA production (P < 0.0001) was lower in females as compared to males. Those subjects with a lifetime history of Alcohol Dependence had higher levels of SLC6A4 mRNA. There was a trend for an association of increased overall methylation with lifetime history of major depression. Finally, we confirm our prior findings that the exact levels of 5HTT mRNA expression are dependent on how it is measured. We conclude that both genetic variation and epigenetic modifications contribute to the regulation of SLC6A4 function and that more in-depth studies of the molecular mechanisms controlling gene activity and the relationship of these mechanisms to behavioral illness are indicated.

Let-7 family of microRNA is required for maturation and adult-like metabolism in stem cell-derived cardiomyocytes

Kuppusamy

Jones

Sperber

et al. 2015

Proc. Natl. Acad. Sci. U.S.A.

226

217

In metazoans, transition from fetal to adult heart is accompanied by a switch in energy metabolism: glycolysis to fatty acid oxidation. The molecular factors regulating this metabolic switch remain largely unexplored. We first demonstrate that the molecular signatures in 1-year (y) matured human embryonic stem cell-derived cardiomyocytes (hESC-CMs) are similar to those seen in in vivo-derived mature cardiac tissues, thus making them an excellent model to study human cardiac maturation. We further show that let-7 is the most highly up-regulated microRNA (miRNA) family during in vitro human cardiac maturation. Gain- and loss-of-function analyses of let-7g in hESC-CMs demonstrate it is both required and sufficient for maturation, but not for early differentiation of CMs. Overexpression of let-7 family members in hESC-CMs enhances cell size, sarcomere length, force of contraction, and respiratory capacity. Interestingly, large-scale expression data, target analysis, and metabolic flux assays suggest this let-7-driven CM maturation could be a result of down-regulation of the phosphoinositide 3 kinase (PI3K)/AKT protein kinase/insulin pathway and an up-regulation of fatty acid metabolism. These results indicate let-7 is an important mediator in augmenting metabolic energetics in maturing CMs. Promoting maturation of hESC-CMs with let-7 overexpression will be highly significant for basic and applied research. Several coronary heart diseases (CHDs) are characterized by cardiac dysfunctions predominantly manifested during cardiac maturation (1, 2). Dramatic changes in energy metabolism occur during this postnatal cardiac maturation (3). At early embryonic development, glycolysis is a major source of energy for cardiomyocytes (CMs) (4, 5). However, as the cardiomyocytes mature, mitochondrial oxidative metabolism increases with fatty acid oxidation, providing 90% of the heart's energy demands (6-8).
This switch in cardiac metabolism has been shown to have important implications during in vivo cardiac maturation (9). In contrast to the relatively advanced knowledge of the genetic network that contributes to heart development during embryogenesis (10, 11), molecular factors that regulate peri- and postnatal cardiac maturation, particularly in relation to the metabolic switch, remain largely unclear. So far, studies to understand the transition of the glycolysis-dependent fetal heart to oxidative metabolism in the adult heart have been mostly related to the peroxisome proliferator-activated receptor (PPAR)/estrogen-related receptor/PPARγ coactivator-1α circuit (7, 8, 12). However, it is currently unknown what other factors act upstream or in synergy with this pathway in controlling cardiac energetics. miRNAs have emerged as key factors in controlling the complex regulatory network in a developing heart (13). Genetic studies that enrich or deplete miRNAs in specific cardiac tissue types and large-scale gene expression studies have demonstrated that they achieve such complex control at the level of cardiac gene expression (14-16). We sou...

Complete sequence and gene map of a human major histocompatibility complex

Beck

Geraghty

Inoko

et al. 1999

Nature

939

207

Here we report the first complete sequence and gene map of a human major histocompatibility complex (MHC), a region on chromosome 6 which is essential to the immune system. When it was discovered over 50 years ago the region was thought to specify histocompatibility genes, but their nature has been resolved only in the last two decades. Although many of the 224 identified gene loci (128 predicted to be expressed) are still of unknown function, we estimate that about 40% of the expressed genes have immune system function. Over 50% of the MHC has been sequenced twice, in different haplotypes, giving insight into the extraordinary polymorphism and evolution of this region. Several genes, particularly of the MHC class II and III regions, can be traced by sequence similarity and synteny to over 700 million years ago, clearly predating the emergence of the adaptive immune system some 400 million years ago. The sequence is expected to be invaluable for the identification of many common disease loci. In the past, the search for these loci has been hampered by the complexity of high gene density and linkage disequilibrium.

A multigene mutation classification of 468 colorectal cancers reveals a prognostic role for APC

Yang

et al. 2016

Colorectal cancer (CRC) is a highly heterogeneous disease, for which prognosis has been relegated to clinicopathologic staging for decades. There is a need to stratify subpopulations of CRC on a molecular basis to better predict outcome and assign therapies. Here we report targeted exome-sequencing of 1,321 cancer-related genes on 468 tumour specimens, which identified a subset of 17 genes that best classify CRC, with APC playing a central role in predicting overall survival. APC may assume 0, 1 or 2 truncating mutations, each with a striking differential impact on survival. Tumours lacking any APC mutation carry a worse prognosis than single APC mutation tumours; however, two APC mutation tumours with mutant KRAS and TP53 confer the poorest survival among all the subgroups examined. Our study demonstrates a prognostic role for APC and suggests that sequencing of APC may have clinical utility in the routine staging and potential therapeutic assignment for CRC.

The complete genome sequence of the avian pathogen Mycoplasma gallisepticum strain Rlow

et al. 2003

The complete genome of Mycoplasma gallisepticum strain R low has been sequenced. The genome is composed of 996 422 bp with an overall G+C content of 31 mol%. It contains 742 putative coding DNA sequences (CDSs), representing a 91 % coding density. Function has been assigned to 469 of the CDSs, while 150 encode conserved hypothetical proteins and 123 remain as unique hypothetical proteins. The genome contains two copies of the rRNA genes and 33 tRNA genes. The origin of replication has been localized based on sequence analysis in the region of the dnaA gene. The vlhA family (previously termed pMGA) contains 43 genes distributed among five loci containing 8, 2, 9, 12 and 12 genes. This family of genes constitutes 10.4 % (103 kb) of the total genome. Two CDSs were identified immediately downstream of gapA and crmA encoding proteins that share homology to cytadhesins GapA and CrmA. Based on motif analysis it is predicted that 80 genes encode lipoproteins and 149 proteins contain multiple transmembrane domains. The authors have identified 75 proteins putatively involved in transport of biomolecules, 12 transposases, and a number of potential virulence factors. The completion of this sequence has spawned multiple projects directed at defining the biological basis of M. gallisepticum. INTRODUCTION. Phylogenetic analyses indicate that mycoplasmas (class Mollicutes) have undergone a degenerative evolution from related, low G+C content, Gram-positive eubacteria (Rogers et al., 1985; Woese et al., 1980). The reduction of the mycoplasma genome has resulted in the loss of the cell wall and has limited the biosynthetic capabilities of these organisms. As a consequence of this loss of biosynthetic machinery, mycoplasmas are obligate parasites and rely on the uptake of many essential molecules from their environment. Mycoplasmas have long been considered model systems for defining the minimal set of genes required for a living cell (Morowitz, 1984).
For this reason, it was not surprising when Mycoplasma genitalium (580 kb) was selected as one of the first targets for complete genome sequencing (Fraser et al., 1995). Since this initial report, the genomes of four additional mycoplasmas have been sequenced, Mycoplasma pneumoniae (816 kb; Dandekar et al., 2000; Himmelreich et al., 1996), Ureaplasma urealyticum (752 kb; Glass et al., 2000), Mycoplasma pulmonis (964 kb; Chambaud et al., 2001) and Mycoplasma penetrans (1358 kb; Sasaki et al., 2000). Theoretical and experimental approaches have estimated the minimum number of essential mycoplasma genes to be between 265 and 350 (Hutchison et al., 1999; Mushegian & Koonin, 1996). Abbreviations: CDS, coding DNA sequence; COGs, conserved orthologous groups. The GenBank accession number for the sequence reported in this paper is AE015450. Mycoplasma gallisepticum is an avian pathogen involved in chronic respiratory disease in chickens resulting in considerable economic losses in poultry production. Infection with this bacterium is spread by aerosol exposure and egg transmission. Outbreaks spread...

Anup Madan

Initial sequencing and analysis of the human genome

CAP3: A DNA Sequence Assembly Program

Generation and initial analysis of more than 15,000 full-length human and mouse cDNA sequences

The relationship of 5HTT (SLC6A4) methylation and genotype on mRNA expression and liability to major depression and alcohol dependence in subjects from the Iowa Adoption Studies

Let-7 family of microRNA is required for maturation and adult-like metabolism in stem cell-derived cardiomyocytes

Complete sequence and gene map of a human major histocompatibility complex

A multigene mutation classification of 468 colorectal cancers reveals a prognostic role for APC

The complete genome sequence of the avian pathogen Mycoplasma gallisepticum strain Rlow
