2006
DOI: 10.1002/0471250953.bia01bs16

Common File Formats

Abstract: This appendix discusses a few of the file formats frequently encountered in bioinformatics. Specifically, it reviews the rules for generating FASTA files and provides guidance for interpreting NCBI descriptor lines, commonly found in FASTA files. In addition, it reviews the construction of GenBank, Phylip, MSF and Nexus files.
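As a rough illustration of the FASTA layout the abstract refers to, the sketch below reads records in which each entry begins with a ">" descriptor line and continues with one or more sequence lines. It is a minimal example under stated assumptions, not the appendix's own procedure: the function name `parse_fasta` and the sample records are hypothetical, and NCBI-specific descriptor fields are not interpreted.

```python
from typing import Iterable, Iterator, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (descriptor, sequence) pairs from FASTA-formatted lines.

    Hypothetical helper for illustration: the descriptor is returned
    without its leading ">", and wrapped sequence lines are joined.
    """
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new descriptor line closes the previous record, if any.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Made-up records, used only to exercise the parser.
example = """>seq1 hypothetical example record
ACGTACGT
ACGT
>seq2 another hypothetical record
TTGACA
"""

for descriptor, sequence in parse_fasta(example.splitlines()):
    print(descriptor, len(sequence))
```

Yielding records one at a time keeps memory use low for large files; for real work, an established parser such as the one in Biopython is likely a safer choice.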

Cited by 14 publications (9 citation statements)
References: 0 publications
“…The 16S-rRNA gene sequence fasta files (the standard text based format for representing nucleotide sequence [37, 38]) and quality data were extracted from the SFF files generated by the 454 Titanium sequencer (MR DNA, Shallowater, TX USA). Average read length before trimming and quality control was 405 bp.…”
Section: S-rdna Bacterial Community Analysis
Citation type: mentioning
confidence: 99%
“…For example, in meta-analysis of sequences an established and unified file standard is crucial (Ten Hoopen et al, 2017). FASTA (Pearson & Lipman, 1988), FASTQ (Cock et al, 2010), and SAM/BAM (Li et al, 2009) are famous examples of file standards that have allowed the effective exchange of information between numerous groups involved in the earliest sequencing projects (Leonard & Littlejohn, 2004; Ondřej & Dvořák, 2012; Zhang, 2016). Any disparities in the sampling method also have to be taken into account when biological material is concerned, so it is essential they are recorded appropriately (Ten Hoopen et al, 2017).…”
Section: Challenges Limitations and Risks Of Data Reuse And Possible
Citation type: mentioning
confidence: 99%
“…For any valid comparison between datasets from different databases or for integration of databases themselves, an established and unified file standard is crucial. FASTA (36), FASTQ (37), and SAM/BAM (38) are famous examples of file standards that allowed effective exchange of information between numerous groups involved in the earliest sequencing projects (39)(40)(41).…”
Section: Comparison and Integration Of Different Databases
Citation type: mentioning
confidence: 99%