“…Although the ReLU function is one of the most commonly used activation functions in deep learning models, various other activation functions have been considered in the literature for constructing neural networks that approximate functions of given smoothness. In particular, networks with piecewise linear, RePU and hyperbolic tangent activation functions, as well as networks with activations belonging to the families {sin, arcsin} and {⌊·⌋, 2^x, 𝟙_{x≥0}}, have been studied in the works [4], [6], [7], [10] and [16]. A particular choice of activation function (or family of activation functions) may be motivated, for example, by its computational simplicity, representational sparsity, smoothness, (super-)expressiveness, etc.…”