The activities of most enzymes and drugs rely on interactions between proteins and small molecules. Accurate predictions of these interactions could massively accelerate pharmaceutical and biotechnological research. Machine learning models designed for this task are currently limited by the lack of information exchange between the protein and the small molecule during the creation of the required numerical representations. Here, we introduce ProSmith, a machine learning framework that employs a multimodal Transformer Network to simultaneously process protein amino acid sequences and small molecule strings in the same input. This approach facilitates the exchange of all relevant information between the two molecule types during the calculation of their numerical representations, allowing the model to account for their structural and functional interactions. Our final model combines gradient boosting predictions based on the resulting multimodal Transformer Network with independent predictions based on separate deep learning representations of the proteins and small molecules. The corresponding predictions outperform all previous models for predicting protein-small molecule interactions across three diverse tasks: predicting Michaelis constants KM; inferring potential substrates for enzymes; and predicting protein-drug affinities. The Python code provided can be used to easily implement and improve machine learning predictions involving arbitrary protein-small molecule interactions.
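To make the core idea concrete, the following minimal sketch illustrates how a protein sequence and a small molecule string can be concatenated into a single joint input, so that a Transformer's attention layers can exchange information between the two molecule types. The tokenization scheme and special symbols ([CLS], [SEP]) are illustrative assumptions, not the exact implementation used by ProSmith.

```python
def build_multimodal_input(protein_seq: str, smiles: str) -> list[str]:
    """Concatenate per-residue protein tokens and per-character small
    molecule (SMILES) tokens into one joint Transformer input sequence.

    Processing both modalities in the same input lets self-attention relate
    any amino acid position to any atom/bond symbol of the small molecule.
    """
    protein_tokens = list(protein_seq)   # one token per amino acid
    smiles_tokens = list(smiles)         # one token per SMILES character
    return ["[CLS]"] + protein_tokens + ["[SEP]"] + smiles_tokens + ["[SEP]"]


# Example: a short (hypothetical) protein fragment paired with ethanol ("CCO")
tokens = build_multimodal_input("MKTAYIAK", "CCO")
```

A real implementation would map these tokens to embeddings and feed them through Transformer layers; the sketch only shows the joint-input construction that enables cross-modal information exchange.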