This paper builds on previous research on an evolutionary algorithm (EA) used to predict the secondary structure of RNA molecules. The EA predicts which specific canonical base pairs will form hydrogen bonds and helices. Three new thermodynamic models were integrated into our EA. The first is a modification of our original base pair model. The other two, INN and INN-HB, add stacking energies using base pair adjacencies. We tested RNA sequences of 122, 543, and 1494 nucleotides with a wide variety of operators and parameter settings. The accuracy of the predicted structures is assessed against the known structures, demonstrating the benefits of using stacking energies in structure prediction. Some other improvements to our EA are also discussed.
I. INTRODUCTION

RNA is an important biological molecule. It plays key roles in the synthesis of protein from DNA and is also known for its structural and catalytic roles in the cell [1]. For the purpose of structure prediction, it can be described simply as a flexible single-stranded biopolymer made from a sequence of four different nucleotides: adenine (A), cytosine (C), guanine (G), and uracil (U). Intra-molecular base pairs can form between different nucleotides, folding the sequence onto itself. The most stable and common of these base pairs are GC, AU, and GU, and their mirrors, CG, UA, and UG. These pairs are called canonical base pairs. A more elaborate description of RNA can be found in our previous research ([2], [3], [4]).

In our model, a base pair does not form in isolation. We consider a set of stacked pairs, also called a helix, only when three or more adjacent pairs form. Also, the loop connecting a set of stacked pairs must be no shorter than 3 nucleotides in length. By using these simple rules, it is possible to enumerate all the possible helices that can form in a structure. The challenge is in predicting which ones will actually form in nature. The list of base pairs in a single structure is called the secondary structure of that RNA molecule. Determining the secondary structure of RNA through laboratory methods, such as NMR [5] and X-ray crystallography [6], is challenging and expensive. Hence, computational prediction methods have been proposed as an alternative approach to this problem.

Genetic algorithms (GAs) [7] have been applied previously to this problem domain. A serial GA [8] was used to find optimal and sub-optimal structures. A massively parallel GA [9] was introduced in this domain, followed by a new mutation operator [10].
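The helix rules stated in the introduction (canonical pairs only, at least three adjacent stacked pairs, and a connecting loop of at least 3 nucleotides) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `CANONICAL`, `can_pair`, and `enumerate_helices` are hypothetical, and reporting only maximal helices (rather than every sub-helix) is a simplifying assumption made here.

```python
# Minimal sketch of helix enumeration under the rules stated in the text.
# All names are hypothetical; only maximal helices are reported (assumption).

CANONICAL = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}


def can_pair(a, b):
    """Return True if nucleotides a and b form a canonical base pair."""
    return (a, b) in CANONICAL


def enumerate_helices(seq, min_stack=3, min_loop=3):
    """Return maximal helices as (i, j, length) tuples, 0-indexed.

    A helix (i, j, length) consists of the pairs (i + k, j - k)
    for k in range(length), with i the 5' end and j the 3' end.
    """
    n = len(seq)
    helices = []
    for i in range(n):
        # j starts far enough away to leave at least min_loop unpaired bases.
        for j in range(i + min_loop + 1, n):
            if not can_pair(seq[i], seq[j]):
                continue
            # Skip starts that extend outward: they belong to a longer helix.
            if i > 0 and j < n - 1 and can_pair(seq[i - 1], seq[j + 1]):
                continue
            # Grow the stack inward while the enclosed loop stays long enough.
            length = 0
            while ((j - length) - (i + length) - 1 >= min_loop
                   and can_pair(seq[i + length], seq[j - length])):
                length += 1
            if length >= min_stack:
                helices.append((i, j, length))
    return helices


print(enumerate_helices("GGGAAAACCC"))  # [(0, 9, 3)]
```

In this toy sequence the only helix pairs positions 0-9, 1-8, and 2-7 and encloses a loop of four adenines. Shortening the loop to two nucleotides ("GGGAACCC") yields no helix, because the third stacked pair would violate the minimum loop length.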
A binary-coded GA was also used to show that it could approximate the folding pathway of an RNA molecule by adding and deleting helices ([11], [12]).

These improvements increased the number of correctly predicted true-positive stem loops and base pairs, making the GA's prediction even more accurate than that of a dynamic programming algorithm (DPA). DPAs are known for finding the single optimal solution within a given thermodynamic model. However, in RNA secondary structure prediction, the optimal fold is often a the...