“…DNA sequence is a string containing the characters A, C, G and T, which means that Σ = {A, C, G, T} for DNA, Σ = {A, C, G, U} for RNA, and, for protein sequences, Σ = {A, C, D, E, F, G, H, I, K, L, M, N, P, Q, R, S, T, V, W, Y}. As stated in [19], the development of fast methods for sequencing genes and proteins is one of the most significant technological achievements of recent times. This has enabled the creation of large databases, which can be processed by abstracting sequences of nucleic acids (DNA, RNA) and amino acids (proteins) into strings of characters.…”
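
As an illustration of the alphabets defined above, the following minimal Python sketch treats each Σ as a set of characters and checks whether a string is a sequence over it. The names `ALPHABETS`, `is_over_alphabet` and `candidate_kinds` are hypothetical and introduced here only for illustration; they do not come from the quoted work or from [19].

```python
# Alphabets as listed in the text; the dictionary name and keys are illustrative.
ALPHABETS = {
    "dna": frozenset("ACGT"),
    "rna": frozenset("ACGU"),
    "protein": frozenset("ACDEFGHIKLMNPQRSTVWY"),
}


def is_over_alphabet(sequence: str, kind: str) -> bool:
    """Return True if every character of `sequence` belongs to the alphabet for `kind`."""
    return set(sequence) <= ALPHABETS[kind]


def candidate_kinds(sequence: str) -> list[str]:
    """List every alphabet the string could have been drawn from."""
    return [kind for kind in ALPHABETS if is_over_alphabet(sequence, kind)]


if __name__ == "__main__":
    print(is_over_alphabet("GATTACA", "dna"))
    print(is_over_alphabet("GAUUACA", "dna"))
    print(candidate_kinds("GATTACA"))
```

Because the nucleotide letters also occur in the amino-acid alphabet, a string such as "ACG" is valid over all three alphabets, so membership alone cannot tell which kind of sequence a string represents.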