Diffractive optics has attractive applications in the development of optical data storage systems. The performance of such systems is measured by their storage capacity, that is, the amount of information that can be stored. In this paper, information theory is applied to characterize the capacity of optical storage systems that use pixelated paraxial diffractive elements. It is argued that the storage capacity of diffractive elements is directly related to the amount of information that can be transmitted correctly over a noisy communication channel consisting of the storage medium, free space, and a detector. The analysis of the storage capacity is restricted to noise effects arising from the modulation characteristics of the storage medium. In this context, redundant encoding schemes are shown to be useful for optimizing the information capacity. Finally, the storage capacity of a storage system with phase-modulating diffractive elements is determined experimentally using the proposed methods.