Microbes are being engineered for an increasingly large and diverse
set of applications. However, designing microbial genomes remains
challenging due to the complexity of biological systems. Adaptive
Laboratory Evolution (ALE) leverages nature’s problem-solving
processes to generate optimized genotypes currently inaccessible to
rational methods. The large amount of public ALE data now represents
a new opportunity for data-driven strain design. This study describes
how novel strain designs, or genome sequences not yet observed in
ALE experiments or published designs, can be extracted from aggregated
ALE data and demonstrates this by designing, building, and testing
three novel Escherichia coli strains with fitnesses
comparable to those of ALE mutants. These designs were achieved through a meta-analysis
of aggregated ALE mutation data (63 Escherichia coli
K-12 MG1655-based ALE experiments, described by 93 unique environmental
conditions, 357 independent evolutions, and 13 957 observed
mutations), which additionally revealed global ALE mutation trends
that inform ALE-derived strain design principles. These trends
suggest that ALE-derived strain designs will be largely gene-centric,
as opposed to noncoding, and composed of a relatively small number
of beneficial variants (approximately 6). These results demonstrate
how strain design efforts can be enhanced by the meta-analysis of
aggregated ALE data.