2017
DOI: 10.1093/nar/gkx1031

Consensus coding sequence (CCDS) database: a standardized set of human and mouse protein-coding regions supported by expert curation

Abstract: The Consensus Coding Sequence (CCDS) project provides a dataset of protein-coding regions that are identically annotated on the human and mouse reference genome assembly in genome annotations produced independently by NCBI and the Ensembl group at EMBL-EBI. This dataset is the product of an international collaboration that includes NCBI, Ensembl, HUGO Gene Nomenclature Committee, Mouse Genome Informatics and University of California, Santa Cruz. Identically annotated coding regions, which are generated using a…

Cited by 107 publications (88 citation statements)
References 27 publications
“…principal), deleteriousness of annotation (i.e. prefer transcripts with higher impact annotations), CCDS status of transcript (i.e. a high-quality transcript set), canonical status of transcript, and transcript length (i.e.…”
Section: Methods (mentioning)
confidence: 99%
“…All transcripts with the 3′ most stop codon were initially selected, ensuring that all 3′UTRs did not overlap with the CDS of alternatively spliced transcripts when quantifying SCR. Genes with multiple transcripts sharing this stop codon were then filtered, first selecting genes with primary APPRIS transcripts (Rodriguez et al, 2013), and then alternative APPRIS transcripts, followed by inclusion in the consensus CDS gene set (CCDS) (Pujar et al, 2018), and lastly selecting transcripts with the longest coding sequence for genes without APPRIS or CCDS transcripts for this 3′ most stop codon. For genes still containing multiple isoforms, transcripts were finally filtered by choosing transcripts with the shortest 3′UTRs, and lastly selecting those with the shortest 5′UTRs.…”
Section: Selection Of Single Transcripts For Each Protein Coding Gene (mentioning)
confidence: 99%
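
The selection order described in the statement above can be read as a tiered filter. The following is a minimal sketch of one such reading, not the citing authors' code: the `Transcript` fields (`stop_offset`, `appris`, `in_ccds`, and the length fields) are hypothetical names, and it assumes the stop codon position has already been expressed in gene orientation so that a larger value means further 3′.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Transcript:
    # All field names are hypothetical, chosen for illustration only.
    tx_id: str
    stop_offset: int        # stop codon position in gene orientation (larger = more 3')
    appris: Optional[str]   # "principal", "alternative", or None
    in_ccds: bool           # True if the transcript is in the CCDS set
    cds_len: int
    utr3_len: int
    utr5_len: int


def select_transcript(transcripts: List[Transcript]) -> Transcript:
    """Pick one transcript for a gene using the tiered order in the quoted text."""
    # Step 1: keep only transcripts that share the 3'-most stop codon.
    max_stop = max(t.stop_offset for t in transcripts)
    pool = [t for t in transcripts if t.stop_offset == max_stop]

    # Step 2: first matching tier wins: principal APPRIS, alternative APPRIS, CCDS.
    tiers = [
        lambda t: t.appris == "principal",
        lambda t: t.appris == "alternative",
        lambda t: t.in_ccds,
    ]
    for keep in tiers:
        hits = [t for t in pool if keep(t)]
        if hits:
            pool = hits
            break
    else:
        # No APPRIS or CCDS transcript: fall back to the longest coding sequence.
        longest = max(t.cds_len for t in pool)
        pool = [t for t in pool if t.cds_len == longest]

    # Step 3: break remaining ties by shortest 3'UTR, then shortest 5'UTR.
    return min(pool, key=lambda t: (t.utr3_len, t.utr5_len))
```

Expressing the tiers as an ordered list keeps the priority explicit and makes it easy to reorder or extend if a different interpretation of the quoted procedure is preferred.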
“…Counts-per-million (CPM) values were calculated for all genes to normalize read counts resulting from per-replicate differences of sequencing depths. We focused on transcript abundance values of consensus coding sequence genes (CCDS) database V22 [72] and filtered other classes of genes from our dataset. We also filtered lowly expressed genes and only included genes with above-baseline transcript abundance (CPM > 0.5) in at least two replicates.…”
Section: Methods (mentioning)
confidence: 99%
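
The normalization and filtering in the statement above can be illustrated with a short sketch. It is an assumption-laden example rather than the citing authors' pipeline: the function names, the dictionary layout of the counts, and the `ccds_genes` set are hypothetical, and it assumes every replicate has a non-zero total read count.

```python
def counts_per_million(counts):
    """Convert raw counts to CPM.

    counts: dict mapping gene name -> list of raw read counts, one per replicate
    (hypothetical layout). CPM is computed over all genes, before any filtering.
    """
    n_reps = len(next(iter(counts.values())))
    # Library size = total reads per replicate across all genes.
    lib_sizes = [sum(row[i] for row in counts.values()) for i in range(n_reps)]
    return {
        gene: [c / lib_sizes[i] * 1e6 for i, c in enumerate(row)]
        for gene, row in counts.items()
    }


def filter_genes(counts, ccds_genes, min_cpm=0.5, min_reps=2):
    """Keep genes in the CCDS set with CPM > min_cpm in at least min_reps replicates."""
    cpm = counts_per_million(counts)
    kept = []
    for gene, values in cpm.items():
        n_expressed = sum(1 for v in values if v > min_cpm)
        if gene in ccds_genes and n_expressed >= min_reps:
            kept.append(gene)
    return kept
```

The thresholds default to the values quoted in the statement (CPM > 0.5 in at least two replicates); the CCDS gene list itself would need to be obtained separately from the corresponding database release.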