2020
DOI: 10.1093/database/baaa062

NCBI Taxonomy: a comprehensive update on curation, resources and tools

Abstract: The National Center for Biotechnology Information (NCBI) Taxonomy includes organism names and classifications for every sequence in the nucleotide and protein sequence databases of the International Nucleotide Sequence Database Collaboration. Since the last review of this resource in 2012, it has undergone several improvements. Most notable is the shift from a single SQL database to a series of linked databases tied to a framework of data called NameBank. This means that relations among data elements can be ad…

Cited by 1,239 publications (1,050 citation statements)
References 66 publications
“…FamDB files contain family consensi/HMMs and the NCBI Taxonomy data related to these families in a format that allows for fast offline access from the command line. The current release of FamDB includes all Dfam consensus sequences, HMMs, metadata, and 61,003 taxa from NCBI's taxonomy database [44] related to these families. Lookups for information on a single taxon or family complete in about a second; extraction of consensus sequences (FASTA, EMBL) or HMMs for all TE families found in Human (including ancestral repeats) complete in about 3 to 4 seconds.…”
Section: Software/tool Distribution Improvements (mentioning)
confidence: 99%
“…We have already implemented functions in RESCRIPt to format the popular SILVA rRNA gene and NCBI GenBank databases, and are planning future support for parsing and editing other taxonomy formats, as well as mapping between these formats [71]. There are 4 codes of nomenclature as reviewed in [88]. In recent years, the explosion of high throughput sequencing technologies has allowed researchers to generate genomic data on many as yet uncultured microbial taxa. In fact, the rate at which novel genomic data can be acquired [94], and rapidly placed within a phylogenetic context [23], has surpassed our ability to appropriately resolve any conflicts with traditional Linnaean taxonomy.…”
Section: The Curation Problem (mentioning)
confidence: 99%
“…We conclude that the size and taxonomic comprehensiveness of SILVA are major assets, though GTDB and NCBI-RefSeq may be more suitable for various applications that respectively require greater taxonomic and phylogenetic rigor. The use of genomes sequenced from type material provides these two databases with a robust taxonomic and phylogenetic backbone that enables users to link natural history and experimental science [88,99]. NCBI-RefSeq's species records are extracted from data submissions to the International Nucleotide Sequence Database Collaboration (INSDC), i.e., NCBI-GenBank, the European Nucleotide Archive (ENA), and the DNA Data Bank of Japan (DDBJ).…”
Section: The Evaluation Problem (mentioning)
confidence: 99%
“…It is, therefore, not surprising that no unified, joint classification underpins the many online resources that house and curate mycological data. Indeed, a user is likely to find many differences when comparing the classifications used in, e.g., GenBank [6], MycoBank [7], UNITE [8], CoL/GBIF [9], and BOLD [10]. The classifications of each of these many resources evolve more or less independently over time; some resources seek to offer the latest developments and thus incorporate the results of all recent studies in systematics, whereas others prefer to adopt only the most well-vetted aspects of the new classifications.…”
Section: Introduction (mentioning)
confidence: 99%