Combinatorial pathway optimization is an important tool in metabolic
flux optimization. Simultaneous optimization of a large number of
pathway genes often leads to combinatorial explosions. Strain optimization
is therefore often performed using iterative design–build–test–learn
(DBTL) cycles. The aim of these cycles is to develop a product strain
iteratively, every time incorporating learning from the previous cycle.
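The combinatorial explosion mentioned above follows from simple counting: if each of n pathway genes can be set to one of L expression levels, there are L^n possible designs. The sketch below uses hypothetical numbers (4 levels per gene, for example 4 promoter strengths) that are illustrative assumptions and are not taken from this work.

```python
import itertools

def design_space_size(n_genes, n_levels):
    # Every gene independently takes one of n_levels expression levels,
    # so the number of combinatorial designs is n_levels ** n_genes.
    return n_levels ** n_genes

# Hypothetical pathway sizes, 4 expression levels per gene.
for n_genes in (3, 5, 10):
    print(n_genes, design_space_size(n_genes, 4))

# Cross-check the formula by explicitly enumerating a small design space.
print(len(list(itertools.product(range(4), repeat=5))))
```

With 4 levels, going from 5 to 10 genes takes the space from 1024 to 1048576 designs, which is why exhaustively building and testing every strain is rarely feasible.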
Machine learning methods provide a potentially powerful tool to learn
from data and propose new designs for the next DBTL cycle. However,
due to the lack of a framework for consistently testing the performance
of machine learning methods over multiple DBTL cycles, evaluating
the effectiveness of these methods remains a challenge. In this work,
we propose a mechanistic kinetic model-based framework to test and
optimize machine learning for iterative combinatorial pathway optimization.
Using this framework, we show that gradient boosting and random forest
models outperform the other tested methods in the low-data regime.
We demonstrate that these methods are robust to training-set biases
and experimental noise. Finally, we introduce an algorithm for recommending
new designs using machine learning model predictions. We show that
when the number of strains that can be built is limited, starting with a
large initial DBTL cycle is preferable to building the same number
of strains in every cycle.
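As an illustration of how simulated DBTL cycles allow different strain-allocation schedules to be compared, the following is a minimal sketch under strong assumptions. The function `simulate_flux` is a hypothetical toy objective standing in for the mechanistic kinetic model; `knn_predict` is a k-nearest-neighbour regressor swapped in for gradient boosting or random forest only to keep the example dependency-free; and the recommendation step (build the untested designs with the highest predicted flux) is a simple greedy rule, not the recommendation algorithm introduced in this work.

```python
import itertools
import random

def simulate_flux(design, optimum=(2, 1, 3, 0, 2)):
    # Hypothetical toy stand-in for a kinetic model: flux is highest (0)
    # when every gene sits at its (assumed) optimal expression level.
    return -sum((d - o) ** 2 for d, o in zip(design, optimum))

def knn_predict(train, design, k=3):
    # k-nearest-neighbour regression (a swapped-in surrogate, not the
    # gradient boosting / random forest models evaluated in the paper).
    def dist(item):
        return sum((a - b) ** 2 for a, b in zip(item[0], design))
    nearest = sorted(train, key=dist)[:k]
    return sum(y for _, y in nearest) / len(nearest)

def run_dbtl(cycle_sizes, n_genes=5, n_levels=4, seed=0):
    """Simulate DBTL cycles; cycle_sizes[i] strains are built in cycle i.

    Returns (best flux observed, total number of strains built).
    """
    rng = random.Random(seed)
    space = list(itertools.product(range(n_levels), repeat=n_genes))
    tested = {}
    # Cycle 1: no data yet, so designs are sampled at random (design).
    for design in rng.sample(space, cycle_sizes[0]):
        tested[design] = simulate_flux(design)  # build + test
    for size in cycle_sizes[1:]:
        train = list(tested.items())  # learn from all strains so far
        untested = [d for d in space if d not in tested]
        # Design: recommend the untested designs with the best predictions.
        ranked = sorted(untested, key=lambda d: knn_predict(train, d),
                        reverse=True)
        for design in ranked[:size]:
            tested[design] = simulate_flux(design)  # build + test
    return max(tested.values()), len(tested)

# Two schedules with the same total budget of 60 strains.
print(run_dbtl([40, 10, 10]))  # large initial cycle
print(run_dbtl([20, 20, 20]))  # equal cycles
```

Both schedules build 60 strains in total. Which one reaches the higher flux in this toy setting depends on the assumed objective and random seed, so the sketch only shows how such a comparison can be set up; it does not reproduce the result reported above.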