“…The package then creates and trains neural networks with hyperparameters drawn from these choices, and returns the best-performing hyperparameters according to a specified metric, which we set to the validation loss at the end of training. The choices for the optimizable hyperparameters are as follows: activation function (ReLU [63,64], ELU [65], tanh, softmax, softplus, softsign [66]), optimizer (Adam [67], Nadam [68], Adamax [67], Adadelta [69]), number of training epochs (1000, 2000, 10,000), and minibatch size (32, 64, 128). We refer the reader to [70] for a systematic overview of activation functions and to [71] for an overview of gradient descent optimization algorithms.…”
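
The excerpt does not name the tuning package, so the following is only a minimal sketch of the described search, assuming a KerasTuner-style API; the network architecture, input width, and synthetic data are illustrative assumptions, while the hyperparameter choices and the validation-loss objective follow the text above.

```python
# Sketch of a random search over the stated hyperparameter choices.
# Assumes KerasTuner; the architecture and data below are placeholders.
import numpy as np
import keras
import keras_tuner as kt

ACTIVATIONS = ["relu", "elu", "tanh", "softmax", "softplus", "softsign"]
OPTIMIZERS = ["adam", "nadam", "adamax", "adadelta"]
EPOCH_CHOICES = [1000, 2000, 10000]
BATCH_CHOICES = [32, 64, 128]

class TunableNet(kt.HyperModel):
    def build(self, hp):
        # Fixed (assumed) architecture; only the listed hyperparameters are searched.
        model = keras.Sequential([
            keras.layers.Input(shape=(8,)),  # assumed input width
            keras.layers.Dense(64, activation=hp.Choice("activation", ACTIVATIONS)),
            keras.layers.Dense(64, activation=hp.Choice("activation", ACTIVATIONS)),
            keras.layers.Dense(1),
        ])
        model.compile(optimizer=hp.Choice("optimizer", OPTIMIZERS), loss="mse")
        return model

    def fit(self, hp, model, *args, **kwargs):
        # Number of epochs and minibatch size are drawn from the stated choices.
        return model.fit(
            *args,
            epochs=hp.Choice("epochs", EPOCH_CHOICES),
            batch_size=hp.Choice("batch_size", BATCH_CHOICES),
            **kwargs,
        )

# Synthetic stand-in data, purely to make the sketch runnable.
x = np.random.rand(1000, 8).astype("float32")
y = np.sum(x, axis=1, keepdims=True)

tuner = kt.RandomSearch(
    TunableNet(),
    objective="val_loss",   # trials ranked by validation loss
    max_trials=10,
    overwrite=True,
    directory="tuning",
)
tuner.search(x, y, validation_split=0.2, verbose=0)
best_hp = tuner.get_best_hyperparameters(1)[0]
print(best_hp.values)
```

In this sketch each trial samples one value per hyperparameter, trains the resulting network, and the tuner returns the configuration with the lowest recorded validation loss; whether that loss is taken strictly at the final epoch or at the best epoch depends on the package actually used, which the excerpt does not specify.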