Despite previous efforts to relate concrete mixture proportioning to strength, a robust knowledge-based model for accurate concrete strength prediction is still lacking. As an alternative to physics- or chemistry-based models, machine learning (ML) methods offer a new route to this problem. Although ML can handle the complex, non-linear, non-additive relationship between concrete mixture proportions and strength, it requires large datasets, which is a concern since reliable strength data are rather limited, especially for industrial concretes. Here, based on a large dataset (>10,000 observations) of measured compressive strengths from industrially produced concretes, we compare the ability of select ML algorithms to "learn" to reliably predict concrete strength as a function of the size of the training dataset. Based on these results, we discuss the trade-off between how accurate a given model can eventually become (when trained on a large dataset) and how much data is actually required to train that model.
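To illustrate the kind of learning-curve comparison described above, the following is a minimal, self-contained sketch. It is not the pipeline used in this work: the synthetic mixture generator, the Abrams-like strength law, the noise level, and the closed-form ridge model are all illustrative assumptions, chosen only to show how test error can be tracked against training-set size.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_mixtures(n):
    # Hypothetical synthetic stand-in for mixture proportions:
    # columns are [cement, water, aggregate] dosages (arbitrary units).
    X = rng.uniform(0.1, 1.0, size=(n, 3))
    w_c = X[:, 1] / X[:, 0]  # water-to-cement ratio
    # Abrams-like exponential strength law plus measurement noise (assumed).
    y = 60.0 * np.exp(-1.5 * w_c) + rng.normal(0.0, 2.0, n)
    return X, y

def ridge_fit_predict(X_tr, y_tr, X_te, lam=1e-3):
    # Closed-form ridge regression as a minimal stand-in for an ML model.
    A = np.c_[X_tr, np.ones(len(X_tr))]  # append intercept column
    w = np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ y_tr)
    return np.c_[X_te, np.ones(len(X_te))] @ w

# Fixed held-out test set; training sets of increasing size.
X_test, y_test = make_mixtures(2000)
errors = {}
for n_train in (50, 500, 5000):
    X_tr, y_tr = make_mixtures(n_train)
    pred = ridge_fit_predict(X_tr, y_tr, X_test)
    errors[n_train] = float(np.sqrt(np.mean((pred - y_test) ** 2)))

for n, rmse in errors.items():
    print(f"n_train={n:5d}  test RMSE={rmse:.2f} MPa")
```

In this toy setting, a simple linear model quickly saturates: beyond some training-set size, its error is dominated by model bias rather than data scarcity. Comparing where different algorithms saturate, and how fast they get there, is the essence of the accuracy-versus-data-requirement trade-off discussed above.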