“…Traditionally in probability theory, when we only have partial knowledge about the probability distribution, we select a distribution for which the entropy − ∫ ρ(x) · ln(ρ(x)) dx attains the largest possible value (see, e.g., [3]) or, equivalently, for which the integral ∫ ρ(x) · ln(ρ(x)) dx attains the smallest possible value. It is worth mentioning that, in general, if we assume that the criterion for selecting a probability distribution is scale-invariant (in some reasonable sense), then this criterion is equivalent to optimizing either the entropy or a generalized entropy ∫ ln(ρ(x)) dx or ∫ ρ^α(x) dx, for some α > 0; see, e.g., [5]. Our analysis shows that the generalized entropies corresponding to α = 2 and α = 3 describe mean-squared robustness.…”
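The relationship between the two criteria can be illustrated numerically. The sketch below (an illustration, not from the quoted text) compares two densities on [0, 1]: the uniform density, which maximizes the entropy − ∫ ρ(x) · ln(ρ(x)) dx on that interval, and a non-uniform "tent" density ρ(x) = 2x. As expected, the uniform density also attains the smaller value of the generalized entropy ∫ ρ^α(x) dx for α = 2.

```python
import numpy as np

# Discretize [0, 1]; endpoints are trimmed slightly so that ln(ρ) is finite.
x = np.linspace(1e-6, 1.0 - 1e-6, 100_000)
dx = x[1] - x[0]

def entropy(rho):
    """Differential entropy -∫ ρ(x) ln ρ(x) dx via a Riemann sum."""
    return -np.sum(rho * np.log(rho)) * dx

def gen_entropy(rho, alpha):
    """Generalized entropy ∫ ρ(x)^α dx via a Riemann sum."""
    return np.sum(rho ** alpha) * dx

uniform = np.ones_like(x)   # ρ(x) = 1 on [0, 1]
tent = 2.0 * x              # ρ(x) = 2x, also integrates to 1

# Uniform has the larger entropy: 0 vs. 1/2 - ln 2 ≈ -0.19 ...
print(entropy(uniform), entropy(tent))
# ... and the smaller generalized entropy for α = 2: 1 vs. 4/3.
print(gen_entropy(uniform, 2), gen_entropy(tent, 2))
```

This matches the quoted remark: maximizing the entropy and minimizing ∫ ρ^α(x) dx (for α > 1) single out the same "least informative" density.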