Most evolutionary optimization algorithms assume that the evaluation of the objective and constraint functions is straightforward. In solving many real-world optimization problems, however, such objective functions may not exist. Instead, computationally expensive numerical simulations or costly physical experiments must be performed for fitness evaluations. In more extreme cases, only historical data are available for performing optimization and no new data can be generated during optimization. Solving evolutionary optimization problems driven by data collected in simulations, physical experiments, production processes, or daily life is termed data-driven evolutionary optimization. In this paper, we provide a taxonomy of different data-driven evolutionary optimization problems and discuss the main challenges in data-driven evolutionary optimization with respect to the nature and amount of data and the availability of new data during optimization. Real-world application examples are given to illustrate different model management strategies for different categories of data-driven optimization problems.
Although over 50 complete Escherichia coli/Shigella genome sequences are available, it is only for closely related strains, for example the O55:H7 and O157:H7 clones of E. coli, that we can assign differences to individual evolutionary events along specific lineages. Here we sequence the genomes of 14 isolates of a uropathogenic E. coli clone that persisted for 3 years within a household, including a dog, causing a urinary tract infection (UTI) in the dog after 2 years. The 20 mutations observed fit a single tree that allows us to estimate the mutation rate to be about 1.1 per genome per year, with minimal evidence for adaptive change, including in relation to the UTI episode. The host data also imply at least 6 host transfer events over the 3 years, with 2 lineages present over much of that period. To our knowledge, these are the first direct measurements for a clone in a well-defined host community that includes rates of mutation and host transmission. There is a concentration of non-synonymous mutations associated with 2 transfers to the dog, suggesting some selection pressure from the change of host. However, there are no changes to which we can attribute the UTI event in the dog, which suggests that this occurrence after 2 years of the clone being in the household may have been due to chance, or some unknown change in the host or environment. The ability of a UTI strain to persist for 2 years and also to transfer readily within a household has implications for epidemiology, diagnosis, and clinical intervention.
Scheme 1. Schematic illustration showing the synthetic procedure of A@UiO-66-H-P NPs and the mechanism of photodynamic therapy and hypoxia-activated cascade chemotherapy.
Gaussian processes (GPs) are the most popular model used in surrogate-assisted evolutionary optimization of computationally expensive problems, mainly because GPs are able to measure the uncertainty of the estimated fitness values, based on which certain infill sampling criteria can be used to guide the search and update the surrogate model. However, the computation time for constructing GPs may become excessively long when the number of training samples increases, which makes it inappropriate to use them as surrogates in evolutionary optimization. To address this issue, this paper proposes to use ensembles as surrogates and infill criteria for model management in evolutionary optimization. A heterogeneous ensemble consisting of a least squares support vector machine and two radial basis function networks is constructed to enhance the reliability of ensembles for uncertainty estimation. In addition to the original decision variables, a selected subset of the decision variables and a set of transformed variables are used as inputs of the heterogeneous ensemble to further promote the diversity of the ensemble. The proposed heterogeneous ensemble is compared with a GP and a homogeneous ensemble for infill sampling criteria in evolutionary multiobjective optimization. Experimental results demonstrate that the heterogeneous ensemble is competitive in performance compared with GPs and scales much better in computational cost as the search dimension increases.
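The idea of using disagreement among ensemble members as an uncertainty estimate for an infill criterion can be sketched as follows. This is a minimal illustration, not the ensemble described in the abstract: it assumes three kernel ridge regressors with different kernel widths (one of them restricted to a subset of the decision variables) as stand-ins for the least squares support vector machine and radial basis function networks, and a lower confidence bound as the infill criterion. The names `KernelRidge`, `ensemble_predict`, and `lcb`, the widths, the toy objective, and `kappa` are all hypothetical choices made for the example.

```python
import numpy as np


def rbf_kernel(A, B, width):
    # Gaussian kernel between every row of A and every row of B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * width ** 2))


class KernelRidge:
    """Kernel ridge regressor (hypothetical stand-in for one ensemble member)."""

    def __init__(self, width, cols=None, reg=1e-6):
        self.width = width
        self.cols = cols  # optional subset of decision variables used as inputs
        self.reg = reg

    def _select(self, X):
        return X if self.cols is None else X[:, self.cols]

    def fit(self, X, y):
        self.X = self._select(X)
        K = rbf_kernel(self.X, self.X, self.width)
        self.alpha = np.linalg.solve(K + self.reg * np.eye(len(self.X)), y)
        return self

    def predict(self, Xq):
        return rbf_kernel(self._select(Xq), self.X, self.width) @ self.alpha


def ensemble_predict(members, Xq):
    # Mean of member predictions is the fitness estimate;
    # their standard deviation serves as the uncertainty estimate.
    preds = np.stack([m.predict(Xq) for m in members])
    return preds.mean(axis=0), preds.std(axis=0)


def lcb(mean, std, kappa=2.0):
    # Lower confidence bound for minimization: favors low predicted
    # fitness and high disagreement among members.
    return mean - kappa * std


rng = np.random.default_rng(0)
X = rng.uniform(-2.0, 2.0, size=(20, 2))   # evaluated solutions
y = (X ** 2).sum(axis=1)                   # toy "expensive" objective

members = [
    KernelRidge(width=0.5).fit(X, y),
    KernelRidge(width=1.0).fit(X, y),
    KernelRidge(width=2.0, cols=[0]).fit(X, y),  # sees only the first variable
]

candidates = rng.uniform(-2.0, 2.0, size=(50, 2))
mean, std = ensemble_predict(members, candidates)
best = candidates[np.argmin(lcb(mean, std))]  # next point to evaluate exactly
```

In a surrogate-assisted loop, `best` would be evaluated with the real objective, appended to the training data, and the members refitted. Each member is trained independently, so the cost grows with the number of members rather than with a single joint model.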
Gaussian processes are widely used in surrogate-assisted evolutionary optimization of expensive problems mainly due to the ability to provide a confidence level of their outputs, making it possible to adopt principled surrogate management methods such as the acquisition function used in Bayesian optimization. Unfortunately, Gaussian processes become less practical for high-dimensional multi- and many-objective optimization as their computational complexity is cubic in the number of training samples. In this paper, we propose a computationally efficient dropout neural network (EDN) to replace the Gaussian process and a new model management strategy to achieve a good balance between convergence and diversity for assisting evolutionary algorithms to solve high-dimensional multi- and many-objective expensive optimization problems. While the conventional dropout neural network needs to save a large number of network models during the training for calculating the confidence level, only one single network model is needed in the EDN to estimate the fitness and its confidence level by randomly ignoring neurons in both training and testing the neural network. Extensive experimental studies on benchmark problems with up to 100 decision variables and 20 objectives demonstrate that, compared to the state of the art, the proposed algorithm is not only highly competitive in performance but also computationally more scalable to high-dimensional many-objective optimization problems. Finally, the proposed algorithm is validated on an operational optimization problem of crude oil distillation units, further confirming its capability of handling expensive problems given a limited computational budget.
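The general mechanism of obtaining both a fitness estimate and a confidence level from one network by randomly ignoring neurons at training and test time can be sketched as below. This is generic Monte Carlo dropout on a one-hidden-layer network, offered only as an illustration; the abstract does not specify the EDN architecture or training procedure, so the layer size, dropout rate, learning rate, toy objective, and the function names `forward`, `train`, and `mc_dropout_predict` are all assumptions.

```python
import numpy as np


def init_params(n_in, n_hidden, rng):
    return {
        "W1": rng.normal(0.0, 1.0 / np.sqrt(n_in), (n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 1.0 / np.sqrt(n_hidden), (n_hidden, 1)),
        "b2": np.zeros(1),
    }


def forward(p, X, drop, rng):
    h = np.tanh(X @ p["W1"] + p["b1"])
    # Inverted dropout: zero each hidden unit with probability `drop`
    # and rescale the survivors so the expected activation is unchanged.
    mask = (rng.random(h.shape) >= drop) / (1.0 - drop)
    out = (h * mask) @ p["W2"] + p["b2"]
    return out[:, 0], h, mask


def train(p, X, y, rng, drop=0.1, lr=0.01, epochs=500):
    n = len(X)
    for _ in range(epochs):
        pred, h, mask = forward(p, X, drop, rng)
        err = (pred - y)[:, None] / n               # d(0.5 * MSE) / d(output)
        grads = {
            "W2": (h * mask).T @ err,
            "b2": err.sum(axis=0),
        }
        dz = (err @ p["W2"].T) * mask * (1.0 - h ** 2)  # back through tanh
        grads["W1"] = X.T @ dz
        grads["b1"] = dz.sum(axis=0)
        for name, g in grads.items():
            p[name] -= lr * g
    return p


def mc_dropout_predict(p, Xq, rng, drop=0.1, n_samples=50):
    # Keep dropout active at test time: each pass uses a different random
    # sub-network of the same single trained model.
    samples = np.stack([forward(p, Xq, drop, rng)[0] for _ in range(n_samples)])
    return samples.mean(axis=0), samples.std(axis=0)


rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(40, 3))   # evaluated solutions
y = (X ** 2).sum(axis=1)                   # toy "expensive" objective

params = train(init_params(3, 16, rng), X, y, rng)
Xq = rng.uniform(-1.0, 1.0, size=(10, 3))
mean, std = mc_dropout_predict(params, Xq, rng)  # fitness estimate, confidence
```

Only one set of weights is stored; the spread across stochastic forward passes plays the role that the predictive variance plays for a Gaussian process, while training cost does not grow cubically with the number of samples.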