Abstract: Interpreting the functional content of a given genomic sequence is one of the central challenges of biology today. Perhaps the most promising approach to this problem is based on the comparative method of classic biology in the modern guise of sequence comparison. For instance, protein-coding regions tend to be conserved between species. Hence, a simple method for distinguishing a functional exon from the chance absence of stop codons is to investigate its homologue from closely related species. Predicting reg…
“…There is currently a lot of interest in comparative genomics [8]. In many of these projects detection of regions unique to a genome is one of the first steps towards functional annotation (e. g. [7]).…”
Background: Sequence comparison by alignment is a fundamental tool of molecular biology. In this paper we show how a number of sequence comparison tasks, including the detection of unique genomic regions, can be accomplished efficiently without an alignment step. Our procedure for nucleotide sequence comparison is based on shortest unique substrings. These are substrings which occur only once within the sequence or set of sequences analysed and which cannot be further reduced in length without losing the property of uniqueness. Such substrings can be detected using generalized suffix trees.
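The definition of a shortest unique substring given above can be made concrete with a small sketch. The code below is not the generalized suffix tree method the abstract refers to; it is a brute-force illustration for a single short sequence, and the function names `count_occurrences` and `shortest_unique_substrings` are hypothetical. It reports a substring when it occurs exactly once and when removing either its first or its last character yields a substring that occurs more than once (or is empty).

```python
def count_occurrences(text, pattern):
    """Count occurrences of pattern in text, allowing overlaps."""
    count = 0
    start = text.find(pattern)
    while start != -1:
        count += 1
        start = text.find(pattern, start + 1)
    return count


def shortest_unique_substrings(seq):
    """Return (start, substring) pairs for all shortest unique substrings.

    Brute-force illustration only; suitable for short sequences.
    A suffix tree would be needed for genome-scale input.
    """
    results = []
    n = len(seq)
    for i in range(n):
        for j in range(i + 1, n + 1):
            sub = seq[i:j]
            if count_occurrences(seq, sub) == 1:
                # sub is the shortest unique substring starting at i, so
                # dropping its last character already gives a repeat.
                # It is minimal only if dropping its first character
                # also gives a repeat (or leaves nothing).
                if len(sub) == 1 or count_occurrences(seq, sub[1:]) > 1:
                    results.append((i, sub))
                break  # longer substrings from i are unique but not shortest
    return results


print(shortest_unique_substrings("ACGTACGA"))
```

For the toy sequence `ACGTACGA`, the only shortest unique substrings are `T` at position 3 and `GA` at position 6; every other unique substring contains one of them.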
“…However, for most evolutionary biologists, the sizes of mammalian genomes, including those of human and mouse, are too large to analyze easily. Furthermore, dynamic and complicated rearrangements of the genome sequences have taken place during the evolutionary process from common ancestral organisms (Haubold and Wiehe, 2004). In order to address the issues inherent in the evolution of mammalian genomes, we have to overcome these issues pertaining to difficulties concerning genome size and complexity.…”
“…Alignments of homologous DNA sequences are crucial for comparative genomics and phylogenetic analysis [1]. The most accurate multiple alignment tool arguably remains the human eye.…”
Background: Alignments of homologous DNA sequences are crucial for comparative genomics and phylogenetic analysis. However, multiple alignment represents a computationally difficult problem. For protein-coding DNA sequences, it is more advantageous in terms of both speed and accuracy to align the amino-acid sequences specified by the DNA sequences rather than the DNA sequences themselves. Many implementations making use of this concept of "translated alignments" are incomplete in the sense that they require the user to manually translate the DNA sequences and to perform the amino-acid alignment. As such, they are not well suited to large-scale automated alignments of large and/or numerous DNA data sets.
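The idea of a translated alignment described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation from the cited paper: the function names `translate` and `thread_codons` are hypothetical, the standard genetic code is assumed, sequences are assumed to be in frame, and the amino-acid alignment itself is assumed to come from an external aligner (here it is supplied by hand).

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate an in-frame DNA sequence; unknown codons become 'X'."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3))


def thread_codons(dna, aligned_protein, gap="-"):
    """Map an aligned protein back onto its DNA, one codon per residue."""
    usable = len(dna) - len(dna) % 3
    codons = [dna[i:i + 3] for i in range(0, usable, 3)]
    residues = sum(1 for aa in aligned_protein if aa != gap)
    if residues != len(codons):
        raise ValueError("aligned protein does not match the DNA length")
    out = []
    k = 0
    for aa in aligned_protein:
        if aa == gap:
            out.append(gap * 3)  # a protein gap spans three nucleotides
        else:
            out.append(codons[k])
            k += 1
    return "".join(out)


dna_a, dna_b = "ATGGCTAAA", "ATGAAA"
print(translate(dna_a), translate(dna_b))  # proteins to hand to an aligner
# Hand-made protein alignment standing in for an aligner's output.
print(thread_codons(dna_a, "MAK"))
print(thread_codons(dna_b, "M-K"))
```

Threading the codons back through the protein alignment keeps gaps in multiples of three, so the reading frame is preserved in the resulting DNA alignment.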