Deep-learning-based image quality assessment (IQA) models usually learn to predict image quality from a single dataset, which leads them to overfit the specific scenes in that dataset. Mixed-dataset training can be an effective way to improve the generalization capability of such models. However, combining different IQA datasets is nontrivial, because their quality evaluation criteria, score ranges, viewing conditions, and subjects usually differ from one dataset to another. Instead of aligning the annotations, this paper proposes a monotonic neural network for learning an IQA model from several datasets combined. Specifically, the model consists of a dataset-shared quality regressor and several dataset-specific quality transformers. The quality regressor predicts the perceptual quality of each image in every dataset, and each quality transformer monotonically maps that perceptual quality to the annotation scale of its corresponding dataset. Experimental results verify the effectiveness of the proposed learning strategy, and the code is available at https://github.com/fzp0424/MonotonicIQA.
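The division of labor between the shared regressor and the dataset-specific transformers can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes one common way to enforce monotonicity, a sum of sigmoids whose slopes and heights are kept positive with a softplus, and it uses a linear function as a stand-in for the deep quality regressor. All class names, dataset names (`dataset_a`, `dataset_b`), and parameter values are hypothetical.

```python
import math


def softplus(x):
    # Numerically stable softplus; positive for moderate inputs.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


class SharedRegressor:
    """Dataset-shared part: maps image features to one scalar quality.

    Assumption: a linear map stands in for the deep network.
    """

    def __init__(self, weights, bias=0.0):
        self.weights = list(weights)
        self.bias = bias

    def __call__(self, features):
        return sum(w * f for w, f in zip(self.weights, features)) + self.bias


class MonotonicTransformer:
    """Dataset-specific part: increasing map from quality to annotation scale.

    Assumption: a sum of sigmoids with softplus-constrained slopes and
    heights, so every term (and hence the sum) increases with q.
    """

    def __init__(self, raw_slopes, raw_heights, centers, offset=0.0):
        self.raw_slopes = list(raw_slopes)
        self.raw_heights = list(raw_heights)
        self.centers = list(centers)
        self.offset = offset

    def __call__(self, q):
        total = self.offset
        for a, h, c in zip(self.raw_slopes, self.raw_heights, self.centers):
            total += softplus(h) * sigmoid(softplus(a) * (q - c))
        return total


# One shared regressor, one transformer per (hypothetical) dataset.
regressor = SharedRegressor(weights=[0.8, -0.5], bias=0.1)
heads = {
    "dataset_a": MonotonicTransformer([0.3, -0.2], [3.0, 2.5], [-1.0, 1.0]),
    "dataset_b": MonotonicTransformer([1.0], [1.2], [0.0], offset=1.0),
}


def predict(features, dataset_name):
    # Shared perceptual quality first, then the dataset's own score scale.
    return heads[dataset_name](regressor(features))
```

Because each transformer is increasing, the ranking of images by the shared quality is preserved on every dataset's scale, even though the numeric scores differ. In training, the raw slopes, heights, centers, and offsets would be learned jointly with the regressor.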