Rate distortion theory is concerned with optimally encoding signals from a given signal class $$\mathcal {S}$$ using a budget of R bits, as $$R \rightarrow \infty $$. We say that $$\mathcal {S}$$ can be compressed at rate $$s$$ if we can achieve an error of at most $$\mathcal {O}(R^{-s})$$ for encoding the given signal class; the supremal compression rate is denoted by $$s^*(\mathcal {S})$$.
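For orientation, one standard way to make these notions precise reads as follows (a sketch; the ambient space $$\mathcal {H}$$ with norm $$\Vert \cdot \Vert $$ and the exact technical conditions are assumptions for illustration, not quoted from the paper): a coding scheme of length R is a pair of maps
$$E_R :\mathcal {S} \rightarrow \{0,1\}^R, \qquad D_R :\{0,1\}^R \rightarrow \mathcal {H},$$
and $$\mathcal {S} \subseteq \mathcal {H}$$ is compressed at rate $$s$$ if, for a suitable sequence of such pairs,
$$\sup _{f \in \mathcal {S}} \Vert f - D_R(E_R(f)) \Vert = \mathcal {O}(R^{-s}) \quad \text {as } R \rightarrow \infty ;$$
the supremal compression rate $$s^*(\mathcal {S})$$ is then the supremum of all such $$s$$.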
Given a fixed coding scheme, there usually are some elements of $$\mathcal {S}$$ that are compressed at a higher rate than $$s^*(\mathcal {S})$$ by the given coding scheme; in this paper, we study the size of this set of signals. We show that for certain “nice” signal classes $$\mathcal {S}$$, a phase transition occurs: We construct a probability measure $$\mathbb {P}$$ on $$\mathcal {S}$$ such that for every coding scheme $$\mathcal {C}$$ and any $$s > s^*(\mathcal {S})$$, the set of signals encoded with error $$\mathcal {O}(R^{-s})$$ by $$\mathcal {C}$$ forms a $$\mathbb {P}$$-null-set. In particular, our results apply to all unit balls in Besov and Sobolev spaces that embed compactly into $$L^2 (\varOmega )$$ for a bounded Lipschitz domain $$\varOmega $$.
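To give one classical data point (added here for illustration, not a quotation from the paper): for the unit ball of the Sobolev space $$H^k(\varOmega )$$ with $$\varOmega \subseteq \mathbb {R}^d$$ a bounded Lipschitz domain, entropy-number estimates yield
$$s^*\big (\{f \in H^k(\varOmega ) : \Vert f\Vert _{H^k} \le 1\}\big ) = k/d,$$
so the phase transition described above occurs exactly at the smoothness-to-dimension ratio k/d.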
As an application, we show that several existing sharpness results concerning function approximation using deep neural networks are in fact generically sharp. In addition, we provide quantitative and non-asymptotic bounds on the probability that a random $$f\in \mathcal {S}$$ can be encoded to within accuracy $$\varepsilon $$ using R bits. This result is subsequently applied to the problem of approximately representing $$f\in \mathcal {S}$$ to within accuracy $$\varepsilon $$ by a (quantized) neural network with at most W nonzero weights. We show that for any $$s > s^*(\mathcal {S})$$ there are constants c, C such that, no matter what kind of “learning” procedure is used to produce such a network, the probability of success is bounded from above by $$\min \big \{1, 2^{C\cdot W \lceil \log _2 (1+W) \rceil ^2 - c\cdot \varepsilon ^{-1/s}} \big \}$$.
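Read contrapositively, this bound quantifies how large such a network must be (a direct rearrangement of the stated bound): for the success probability to reach, say, 1/2, the exponent must satisfy $$C\cdot W \lceil \log _2 (1+W) \rceil ^2 \ge c\cdot \varepsilon ^{-1/s} - 1$$, i.e.
$$W \gtrsim \frac {\varepsilon ^{-1/s}}{\lceil \log _2 (1+W) \rceil ^2},$$
so that, up to logarithmic factors, at least $$\varepsilon ^{-1/s}$$ nonzero weights are needed before any “learning” procedure has an appreciable chance of success.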