Exploiting spatial redundancy in images yields a large gain in the performance of image and video compression. The main tool for this is intra-frame prediction, which most state-of-the-art video coders apply in a block-wise fashion. Until now, angular prediction has been dominant, providing a low-complexity method that covers a large variety of content. With deep learning, however, it is possible to create prediction methods that cover a wider range of content and can predict structures that traditional modes cannot predict accurately. Using the conditional autoencoder structure, we train a single artificial neural network that performs multi-mode prediction. In this paper, we derive the approach from the general formulation of the intra-prediction problem and introduce two extensions, one for spatial mode prediction and one for chroma prediction support. Moreover, we propose a novel latent-space-based cross-component prediction. We demonstrate the capability of our prediction scheme with visual examples and report average Bjøntegaard delta rate gains of 1.13% in the luma component and 1.21% in the chroma component compared to VTM using only traditional modes.