Julio Fernandez-Banet scite author profile

The Ensembl gene annotation system has been used to annotate over 70 different vertebrate species across a wide range of genome projects. Furthermore, it generates the automatic alignment-based annotation for the human and mouse GENCODE gene sets. The system is based on the alignment of biological sequences, including cDNAs, proteins and RNA-seq reads, to the target genome in order to construct candidate transcript models. Careful assessment and filtering of these candidate transcripts ultimately leads to the final gene set, which is made available on the Ensembl website. Here, we describe the annotation process in detail. Database URL: http://www.ensembl.org/index.html


The BioMart community portal: an innovative alternative to large, centralized data repositories

Smedley, Haider, Durinck et al. 2015, Nucleic Acids Res

684

550


The BioMart Community Portal (www.biomart.org) is a community-driven effort to provide a unified interface to biomedical databases that are distributed worldwide. The portal provides access to numerous database projects supported by 30 scientific organizations. It includes over 800 different biological datasets spanning genomics, proteomics, model organisms, cancer data, ontology information and more. All resources available through the portal are independently administered and funded by their host organizations. The BioMart data federation technology provides a unified interface to all the available data. The latest version of the portal comes with many new databases that have been created by our ever-growing community. It also comes with better support and extensibility for data analysis and visualization tools. A new addition to our toolbox, the enrichment analysis tool, is now accessible through graphical and web service interfaces. The BioMart community portal averages over one million requests per day. Building on this level of service and the wealth of information that has become available, the BioMart Community Portal has introduced a new, more scalable and cheaper alternative to the large data stores maintained by specialized organizations.


The consensus coding sequence (CCDS) project: Identifying a common protein-coding gene set for the human and mouse genomes

Pruitt, Harrow, Harte et al. 2009, Genome Res.

501

464


Effective use of the human and mouse genomes requires reliable identification of genes and their products. Although multiple public resources provide annotation, different methods are used that can result in similar but not identical representation of genes, transcripts, and proteins. The collaborative consensus coding sequence (CCDS) project tracks identical protein annotations on the reference mouse and human genomes with a stable identifier (CCDS ID), and ensures that they are consistently represented on the NCBI, Ensembl, and UCSC Genome Browsers. Importantly, the project coordinates on manually reviewing inconsistent protein annotations between sites, as well as annotations for which new evidence suggests a revision is needed, to progressively converge on a complete protein-coding set for the human and mouse reference genomes, while maintaining a high standard of reliability and biological accuracy. To date, the project has identified 20,159 human and 17,707 mouse consensus coding regions from 17,052 human and 16,893 mouse genes. Three evaluation methods indicate that the entries in the CCDS set are highly likely to represent real proteins, more so than annotations from contributing groups not included in CCDS. The CCDS database thus centralizes the function of identifying well-supported, identically-annotated, protein-coding regions.

[Supplemental material is available online at www.genome.org. Data sets and documentation are available in the CCDS database at http://www.ncbi.nlm.nih.gov/CCDS.]

One key goal of genome projects is to identify and accurately annotate all protein-coding genes. The resulting annotations add functional context to the sequence data and make it easier to traverse to other rich sources of gene and protein information. Accurately annotating known genes, identifying novel genes, and tracking annotations over time are complex processes that are best achieved through a combination of large-scale computational analyses and expert curation. These methods must (1) process repetitive sequences in multiple categories including retrotransposons, segmental duplications, and paralogs; (2) process variation including copy number variation (CNV) (Feuk et al. 2006) and microsatellites; (3) distinguish functional genes and alleles from pseudogenes; (4) define alternate splice products; and (5) avoid erroneous interpretation based on experimental error.


Common variants at 12q15 and 12q24 are associated with infant head circumference

Taal, Pourcain, Thiering et al. 2012, Nat Genet

130

117


To identify genetic variants associated with head circumference in infancy, we performed a meta-analysis of seven genome-wide association (GWA) studies (N=10,768 individuals of European ancestry enrolled in pregnancy/birth cohorts) and followed up three lead signals in six replication studies (combined N=19,089). The SNPs rs7980687 on chromosome 12q24 (P=8.1×10⁻⁹) and rs1042725 on chromosome 12q15 (P=2.8×10⁻¹⁰) were robustly associated with head circumference in infancy. Although these loci have previously been associated with adult height [1], their effects on infant head circumference were largely independent of height (P=3.8×10⁻⁷ for rs7980687, P=1.3×10⁻⁷ for rs1042725 after adjustment for infant height). A third signal, rs11655470 on chromosome 17q21, showed suggestive evidence of association with head circumference (P=3.9×10⁻⁶). SNPs correlated to the 17q21 signal show genome-wide association with adult intracranial volume [2], Parkinson’s disease and other neurodegenerative diseases [3-5], indicating that a common genetic variant in this region might link early brain growth with neurological disease in later life.
