Comparative evaluation is a prerequisite for reproducible science and for the objective assessment of new algorithms. Reproducible research on pansharpening of very-high-resolution images is difficult because openly available reference datasets and evaluation protocols are lacking. This work defines a benchmarking framework for evaluating pansharpening algorithms, and its contribution is threefold. First, it establishes a reference dataset, named PAirMax, composed of 14 panchromatic and multispectral image pairs acquired by different satellites over heterogeneous landscapes. Second, it standardizes several image pre-processing steps, such as filtering, upsampling, and band co-registration, by providing a reference implementation. Third, it details quality assessment protocols for reproducible algorithm evaluation.