1995
DOI: 10.1093/bioinformatics/11.2.147

A local alignment tool for very long DNA sequences

Abstract: This paper presents a practical program, called sim2, for building local alignments of two sequences, each of which may be hundreds of kilobases long. Sim2 first constructs n best non-intersecting chains of "fragments," such as all occurrences of identical 5-tuples in each of two DNA sequences, for any specified n ≥ 1. Each chain is then refined by delivering an optimal alignment in a region delimited by the chain. Sim2 requires only space proportional to the size of the input sequences and the out…
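
To make the "chain of fragments" idea in the abstract concrete, here is a minimal sketch in Python. It is not sim2's implementation: the function names `find_fragments` and `best_chain`, the scoring rule (each fragment scores k), and the simple quadratic chaining loop are all assumptions chosen for illustration. It finds only a single best chain, and it omits both the n-best non-intersecting chains and the refinement step that computes an optimal alignment within the region delimited by a chain.

```python
from collections import defaultdict


def find_fragments(a, b, k=5):
    """Return sorted (i, j) pairs such that a[i:i+k] == b[j:j+k].

    Each pair is one "fragment": an identical k-tuple in both sequences.
    """
    # Index every k-tuple of b by its start positions.
    index = defaultdict(list)
    for j in range(len(b) - k + 1):
        index[b[j:j + k]].append(j)

    frags = []
    for i in range(len(a) - k + 1):
        # .get avoids inserting empty lists for k-tuples absent from b.
        for j in index.get(a[i:i + k], ()):
            frags.append((i, j))
    return frags


def best_chain(frags, k=5):
    """Return one highest-scoring chain of non-overlapping fragments.

    Hypothetical scoring: every fragment contributes k. A fragment may
    follow another only if it starts at or after the other's end in
    both sequences. Quadratic in the number of fragments.
    """
    frags = sorted(frags)
    n = len(frags)
    if n == 0:
        return []

    score = [k] * n   # best chain score ending at fragment t
    prev = [-1] * n   # predecessor of fragment t in that chain
    for t in range(n):
        i, j = frags[t]
        for s in range(t):
            pi, pj = frags[s]
            if pi + k <= i and pj + k <= j and score[s] + k > score[t]:
                score[t] = score[s] + k
                prev[t] = s

    # Trace back from the best-scoring chain end.
    end = max(range(n), key=score.__getitem__)
    chain = []
    while end != -1:
        chain.append(frags[end])
        end = prev[end]
    return chain[::-1]


if __name__ == "__main__":
    a = "AAAAACCCCC"
    b = "AAAAATTCCCCC"
    print(best_chain(find_fragments(a, b)))
```

In this toy input the two shared 5-tuples are separated by an insertion in the second sequence, so the chain brackets the region in which a full alignment would then be computed.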

Cited by 35 publications (30 citation statements)
References 19 publications (19 reference statements)
“…Human and mouse mRNA and protein sequences were aligned using the sim2 program (Chao et al 1995) that, by sequence accession number, extracts sequence data directly from GenBank (Benson et al 1996) using the Entrez application programming interface (ftp://ncbi.nlm.nih.gov/toolbox/ncbi_tools). This assures that the most recent data always was used.…”
Section: Methods (mentioning)
confidence: 99%
“…Both the BLAST search and Entrez access require connections to the servers at NCBI; all of the other processes are computed on the client machine. Prior to the BLAST search, SIM2 (Chao et al 1994) computes repeat regions in the query sequence, and the results are automatically annotated as repeat features in the query sequence. For a DNA query sequence, the low complexity regions are identified by the "dust" program (J. Kuzio, R. Tatusov, and D.J.…”
Section: Figure (mentioning)
confidence: 99%
“…Organism-specific results can be obtained at any level of the NCBI taxonomy by filtering the HSP alignments inclusively or exclusively with Entrez Taxonomy Server. A suite of SIM algorithms, which include SIM (Huang et al 1990), SIM2 (Chao et al 1994), and SIM3 (Chao et al 1997) may be selected to compute more refined gapped alignments. The details of repeat filtering, processing of large sequences, restricting the search by organism, and gapped alignments are described below.…”
Section: Figure (mentioning)
confidence: 99%
“…However, since their complexities are quadratic with respect to the length of the two sequences this approach leads to a high computing time. One frequently used approach to speed up this time consuming operation is to introduce heuristics to the alignment algorithm [14,15]. The main drawback of this solution is that the more time efficient the heuristics, the worse is the quality of the result [16].…”
Section: Introduction (mentioning)
confidence: 99%