Effective representation of DNA sequences is an important task in the study of genome sequences. In this paper, we propose a graphical representation of DNA sequences based on nucleotide ring structure. The proposed representation maps a DNA sequence onto the 16 dinucleotides placed on the surface of a hexagon, preserving both the chemical properties of the nucleotides and their positional information. Our approach enables efficient and accurate similarity comparison between DNA sequences. Furthermore, it represents each DNA sequence uniquely and without degeneracy. In the experimental study, we use phylogenetic analysis to examine the evolutionary relationships among different species. An extensive performance study shows that the proposed method outperforms existing methods in comparing the degree of similarity between sequences.
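As a rough illustration of the first step described above, the following sketch decomposes a sequence into its overlapping dinucleotides and records where each of the 16 possible dinucleotides occurs. The abstract does not specify how dinucleotides are placed on the hexagon, so that mapping is not reproduced here; the function name `dinucleotide_positions` is hypothetical and chosen only for this example.

```python
from itertools import product

# All 16 possible dinucleotides over the alphabet A, C, G, T.
DINUCLEOTIDES = ["".join(pair) for pair in product("ACGT", repeat=2)]

def dinucleotide_positions(seq):
    """Map each dinucleotide to the start positions where it occurs.

    Hypothetical helper for illustration only; it keeps positional
    information but does not implement the hexagon placement.
    """
    seq = seq.upper()
    positions = {d: [] for d in DINUCLEOTIDES}
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in positions:  # skip pairs containing ambiguous bases such as N
            positions[pair].append(i)
    return positions

pos = dinucleotide_positions("ATGGTG")
print(pos["TG"])  # start positions of every TG in the sequence
```

Two sequences could then be compared through the position lists of each dinucleotide, though the similarity measure used by the paper is not given in the abstract.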
Splice site prediction in DNA sequences is a basic search problem: finding the exon/intron and intron/exon boundaries. Removing the introns and joining the exons together forms the mRNA sequence, which is the input to the translation process. Splicing is therefore a necessary step in the central dogma of molecular biology. The main task of splice site prediction is to locate the candidate GT- and AG-ended sequences and then to distinguish the true splice sites from the false ones among those candidates. In this paper, we survey research on splice site prediction based on support vector machines (SVMs). These works differ mainly in their nucleotide encoding technique and their choice of SVM kernel. Some methods encode the DNA sequence sparsely, whereas others encode it probabilistically. The encoded sequences serve as input to the SVM, which classifies them using its learned model. Classification accuracy depends largely on selecting a kernel suited to sequence data and on selecting its parameters. We examine each encoding technique and group the techniques by similarity. We then discuss kernels and their parameter selection. This survey provides a basic understanding of encoding approaches and kernel selection for SVM-based splice site prediction.
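The candidate search and the sparse encoding described above can be sketched as follows. This is a minimal illustration, assuming a plain scan for GT and AG dinucleotides and a 4-bit one-hot code per nucleotide; the helper names `find_candidates` and `sparse_encode` are hypothetical and not taken from any surveyed paper. The resulting vectors are the kind of fixed-length input an SVM classifier would be trained on.

```python
NUCLEOTIDES = "ACGT"

def find_candidates(seq, motif):
    """Return the start index of every occurrence of a 2-letter motif (GT or AG)."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == motif]

def sparse_encode(window):
    """Sparse (one-hot) encoding: each nucleotide becomes 4 bits.

    A, C, G and T each set exactly one bit; any other symbol (e.g. N)
    is encoded as four zeros. This choice is an assumption of the sketch.
    """
    vector = []
    for base in window.upper():
        vector.extend(1 if base == n else 0 for n in NUCLEOTIDES)
    return vector

seq = "CAGGTAAGTCAG"
donors = find_candidates(seq, "GT")     # candidate sites containing GT
acceptors = find_candidates(seq, "AG")  # candidate sites containing AG
encoded = sparse_encode("GTA")          # 3 nucleotides -> 12-dimensional vector

print(donors, acceptors)
print(encoded)
```

In practice a fixed-size window around each candidate would be encoded rather than a 3-letter string, and a probabilistic encoding would replace the 0/1 entries with position-specific frequencies estimated from training data.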