Convolutional neural networks can be trained to perform histology slide classification using weak annotations with multiple instance learning (MIL). However, given the paucity of labeled histology data, direct application of MIL can easily suffer from overfitting, and the network is unable to learn rich feature representations due to the weak supervisory signal. We propose to overcome such limitations with a two-stage semi-supervised approach that combines the power of data-efficient self-supervised feature learning via contrastive predictive coding (CPC) with the interpretability and flexibility of regularized attention-based MIL. We apply our two-stage CPC + MIL semi-supervised pipeline to the binary classification of breast cancer histology images. Across five random splits, we report state-of-the-art performance with a mean validation accuracy of 95% and an area under the ROC curve of 0.968. We further evaluate the quality of features learned via CPC relative to simple transfer learning and show that strong classification performance using CPC features can be efficiently leveraged under the MIL framework even with the feature encoder frozen.

However, direct application of deep MIL to histopathological image analysis carries many challenges. Notably, it is common to have only a limited number of slides available for training, especially for rare conditions. This makes it difficult for a MIL network to adequately learn useful feature representations, and as a result we found that MIL tends to drastically overfit. Another challenge is the need to process a bag of many instances at a time, usually in a single batch. Given the large size of tissue microarrays and whole slides and the memory constraints of modern GPUs, backpropagating through an entire bag is often infeasible. As a result, either patches need to be sampled, resulting in noisy bag labels [12], or the feature network needs to remain fixed during training to save memory.
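
To make the idea of attention-based MIL over a fixed feature network more concrete, the following is a minimal sketch of attention pooling on a bag of precomputed instance features. It is not the implementation used in this work: the function name `attention_mil_pool`, the parameters `V` and `w`, the tanh attention form, and all dimensions are assumptions chosen for illustration, and the random vectors merely stand in for patch features that a frozen encoder would produce.

```python
import math
import random


def attention_mil_pool(bag, V, w):
    """Pool a bag of instance features into a single bag embedding.

    bag: list of n feature vectors of length d (assumed to come from a
         frozen encoder, so no gradients flow through the patches)
    V:   hidden x d projection matrix, given as a list of rows (hypothetical)
    w:   attention vector of length hidden (hypothetical)
    Returns (weights, embedding), where weights sum to 1 over the bag.
    """
    # Unnormalized attention score per instance: w^T tanh(V h)
    scores = []
    for h in bag:
        hidden = [math.tanh(sum(v * x for v, x in zip(row, h))) for row in V]
        scores.append(sum(wi * ui for wi, ui in zip(w, hidden)))

    # Numerically stable softmax over the instances in the bag
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Bag embedding: attention-weighted sum of instance features
    d = len(bag[0])
    embedding = [sum(a * h[j] for a, h in zip(weights, bag)) for j in range(d)]
    return weights, embedding


# Toy usage with random stand-in features (5 patches, 8-dim features)
rng = random.Random(0)
n, d, hidden_dim = 5, 8, 4
bag = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
V = [[rng.gauss(0, 0.5) for _ in range(d)] for _ in range(hidden_dim)]
w = [rng.gauss(0, 0.5) for _ in range(hidden_dim)]
weights, embedding = attention_mil_pool(bag, V, w)
```

Because the per-instance weights form a distribution over the bag, they can be inspected to see which patches contribute most to the slide-level prediction. In this sketch only `V`, `w`, and a downstream classifier would need training, which keeps memory use low when the encoder is fixed.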