We propose Locally Geometric Crossover (LGX) for genetic programming. For a pair of homologous loci in the parent solutions, LGX finds a semantically intermediate procedure in a previously prepared library and uses it as replacement code. Experiments on six symbolic regression problems show a significant increase in search performance compared to standard subtree-swapping crossover and other control methods. This suggests that semantically geometric manipulations on subprograms propagate to entire programs and improve their fitness.
Keywords: genetic programming, program semantics, semantic crossover
Copyright is held by the author/owner(s). GECCO'12 Companion, July 7-11, 2012, Philadelphia, PA, USA. ACM 978-1-4503-1178-6/12/07.

MOTIVATION AND THE METHOD

Standard search operators used in genetic programming (GP) are blind to program semantics. For them, programs are purely symbolic structures in which the opcodes associated with particular instructions have no particular meaning. Given that knowledge of semantics is one of the factors that make human programming so effective, equipping GP algorithms with semantic-aware extensions could have a positive impact on their performance.

We devise a method that makes GP aware of certain semantic aspects of programs. To this end, we assume that the fitness function is a metric $\|\cdot\|$ that captures the divergence (error) between the program output and some known desired output, which is typically the case in GP. Consequently, the fitness landscape is a convex surface spanned over the space of vectors that hold program outputs.

Convexity allows designing recombination operators that are likely to yield offspring of good quality [6]. We exploit this property locally, on the level of program parts rather than entire programs. The method operates in two phases. First, a library $L$ of short procedures of height at most $h$ is generated. Next, we calculate the semantics $s(p)$ of every procedure $p \in L$, meant here as the vector of outcomes it produces for the fitness cases. For any two procedures $p_1$, $p_2$ that have the same semantics ($\|s(p_1) - s(p_2)\| = 0$), we discard the longer one to avoid redundancy.

In the second phase, the library is used by the proposed locally geometric crossover (LGX) during a GP run. Given two parent programs $p_1$ and $p_2$, LGX first identifies the structurally common region [7], i.e., the set of tree node locations that occur in both parents. Then LGX appoints a single random locus in the common region as a crossover point.
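The first phase (building the library and discarding semantically redundant procedures) can be illustrated with a minimal Python sketch. Everything concrete in it is an assumption made for illustration and is not taken from the paper: the tuple encoding of trees, the instruction set `OPS`, the leaves `"x"` and `1.0`, exhaustive enumeration as the generation method, and node count as the measure of procedure length. Semantics are compared by exact equality, which corresponds to a distance of zero.

```python
import itertools

# Hypothetical instruction set: binary operators over a single input "x".
OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
}

def evaluate(tree, x):
    """Evaluate a tree; a leaf is "x" or a number, a node is (op, left, right)."""
    if isinstance(tree, tuple):
        op, left, right = tree
        return OPS[op](evaluate(left, x), evaluate(right, x))
    return x if tree == "x" else tree

def size(tree):
    """Number of nodes, used here as the (assumed) measure of length."""
    if isinstance(tree, tuple):
        return 1 + size(tree[1]) + size(tree[2])
    return 1

def semantics(tree, cases):
    """Vector of outcomes the procedure produces for the fitness cases."""
    return tuple(evaluate(tree, x) for x in cases)

def enumerate_trees(height, leaves=("x", 1.0)):
    """All trees of height at most `height` (may repeat some trees)."""
    if height == 0:
        return list(leaves)
    smaller = enumerate_trees(height - 1, leaves)
    trees = list(smaller)
    for op in OPS:
        for left, right in itertools.product(smaller, repeat=2):
            trees.append((op, left, right))
    return trees

def build_library(height, cases):
    """Map each distinct semantics to the shortest procedure that has it."""
    library = {}
    for tree in enumerate_trees(height):
        s = semantics(tree, cases)
        kept = library.get(s)
        if kept is None or size(tree) < size(kept):
            library[s] = tree  # discard the longer of two equivalent procedures
    return library

cases = [-1.0, 0.0, 1.0, 2.0]   # integer-valued cases keep float arithmetic exact
library = build_library(2, cases)
```

Keying the dictionary by the semantics vector makes the redundancy check a constant-time lookup; for example, `("*", "x", 1.0)` and `"x"` share one entry, and only the shorter `"x"` is retained.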
To reduce bloat, this choice follows the same rule as in standard GP [2]: with probability 0.1 the locus is a leaf of the common region, and otherwise it is an internal node.

Let $p'_1$ and $p'_2$ denote the subtrees rooted at the drawn locus in $p_1$ and $p_2$, respectively. We calculate their semantics, $s(p'_1)$ and $s(p'_2)$, and determine the midpoint between them in the semantic space: $s_m = (s(p'_1) + s(p'_2))/2$. This point represents the semantics of a hypothetical procedure $p$: $s_m = s(p)$ which, if inserted into parents at the appointed locus, would make the resulting offs...
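The steps described so far for the second phase (common region, locus choice, semantic midpoint) can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: trees are encoded as tuples `(op, left, right)`, a locus is a path of child indices, a leaf of the common region is taken to be a locus with no children inside the region, the metric is assumed to be Euclidean, and `closest_procedure` assumes that the library procedure whose semantics lies nearest to the midpoint is the one selected. The library contents and names in the usage part are hypothetical.

```python
import math
import random

def common_region(t1, t2, path=()):
    """Loci (paths of child indices) that exist in both trees."""
    region = [path]
    if isinstance(t1, tuple) and isinstance(t2, tuple):
        for i in (1, 2):  # descend only where both parents have children
            region += common_region(t1[i], t2[i], path + (i,))
    return region

def choose_locus(region, rng, leaf_prob=0.1):
    """Pick a leaf of the region with probability leaf_prob, else an internal locus."""
    loci = set(region)
    internal = [p for p in region if p + (1,) in loci]
    leaves = [p for p in region if p + (1,) not in loci]
    # Fall back to leaves when the region has no internal locus.
    pool = leaves if (rng.random() < leaf_prob or not internal) else internal
    return rng.choice(pool)

def midpoint(s1, s2):
    """Component-wise midpoint of two semantics vectors."""
    return [(a + b) / 2 for a, b in zip(s1, s2)]

def closest_procedure(library, target):
    """Name of the library procedure whose semantics is nearest to target."""
    return min(library, key=lambda name: math.dist(library[name], target))

# Usage with hypothetical parents and fitness cases x = 0, 1, 2, 3.
parent1 = ("+", ("*", "x", "x"), "x")
parent2 = ("-", "x", ("+", "x", 1.0))
region = common_region(parent1, parent2)   # [(), (1,), (2,)]
locus = choose_locus(region, random.Random(0))

toy_library = {                            # procedure name -> semantics
    "x": [0, 1, 2, 3],
    "x + x": [0, 2, 4, 6],
    "x * x": [0, 1, 4, 9],
    "1": [1, 1, 1, 1],
}
s_m = midpoint([0, 1, 2, 3], [0, 3, 6, 9])  # [0.0, 2.0, 4.0, 6.0]
replacement = closest_procedure(toy_library, s_m)
```

In this toy case the midpoint coincides exactly with the semantics of `x + x`, so that entry is returned; in general only an approximate match can be expected from a finite library.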