Data-driven machine learning force fields (MLFs) are increasingly popular in atomistic simulations. They use machine learning methods to predict energies and forces for unseen structures from knowledge learned on an existing reference database, which usually comes from density functional theory (DFT) calculations. One main drawback of MLFs is that physical laws are not built into the machine learning models; instead, MLFs are designed to be flexible enough to reproduce a complex quantum-chemical potential energy surface (PES). As a result, MLFs generally have poor transferability, and a very large training set is required to span the whole target feature space and obtain a reliable MLF. This procedure becomes more troublesome when the PES is complicated and has a large number of degrees of freedom: building a large database is then inevitable and very expensive, especially when accurate but costly exchange-correlation (xc) functionals have to be used. In this manuscript, we apply a high-dimensional neural network potential (HDNNP) to Pt clusters of 6 to 20 atoms as an example. Our standard level of energy calculation is DFT at the GGA (PBE) level with a plane-wave basis set. We introduce an approximate but fast level that uses the PBE functional with a minimal atomic-orbital basis set; a more accurate but expensive level, using a hybrid functional or a non-local vdW functional with a plane-wave basis set, is then reliably predicted by learning the difference from the fast level with an HDNNP. The results show that this differential approach (named ΔHDNNP) delivers very accurate predictions (error < 10 2 meV/atom) with respect to converged-basis-set energies as well as to more accurate but expensive xc functionals. The overall speedup can be as large as a factor of 900 for a 20-atom Pt cluster.
More importantly, ΔHDNNP shows much better transferability owing to the intrinsic smoothness of the delta potential energy surface, so a much smaller training set suffices to reach better accuracy than the conventional HDNNP. A multi-layer ΔHDNNP is therefore proposed to obtain very accurate predictions relative to expensive non-local vdW functional calculations, further reducing the required training set. The approach can be easily generalized to other machine learning methods and opens a path to studying the structure and dynamics of Pt clusters and nanoparticles.
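The differential idea can be illustrated with a minimal toy sketch. It is not the HDNNP used in this work: the one-dimensional functions `e_low` and `e_high`, the quadratic regressor that stands in for the neural network, and the five-point training set are all hypothetical choices made only for illustration. The sketch compares fitting the expensive level directly with fitting only the smooth difference between the two levels and adding it to the cheap level.

```python
import numpy as np


def e_low(x):
    # Hypothetical cheap level: captures the rugged part of the surface.
    return np.sin(5.0 * x)


def e_high(x):
    # Hypothetical expensive level: cheap level plus a smooth correction.
    return np.sin(5.0 * x) + 0.1 * x**2


# Small training set and a dense test grid (both assumed for illustration).
x_train = np.linspace(0.0, 2.0, 5)
x_test = np.linspace(0.0, 2.0, 201)

# Conventional approach: fit the expensive level directly.
direct_coefs = np.polyfit(x_train, e_high(x_train), 2)
direct_pred = np.polyval(direct_coefs, x_test)

# Delta approach: fit only the difference between the two levels,
# then add the cheap level back at prediction time.
delta_coefs = np.polyfit(x_train, e_high(x_train) - e_low(x_train), 2)
delta_pred = e_low(x_test) + np.polyval(delta_coefs, x_test)


def rmse(pred, ref):
    # Root-mean-square error between prediction and reference.
    return float(np.sqrt(np.mean((pred - ref) ** 2)))


direct_rmse = rmse(direct_pred, e_high(x_test))
delta_rmse = rmse(delta_pred, e_high(x_test))
print(f"direct fit RMSE: {direct_rmse:.3e}")
print(f"delta fit RMSE:  {delta_rmse:.3e}")
```

In this contrived case the difference is exactly quadratic, so the delta model recovers it from five points while the direct fit cannot follow the oscillating surface. Real delta surfaces are only approximately smooth, so the gain is expected to be less clear-cut. A multi-layer variant would, under the same assumptions, stack a second fitted difference on top of the first prediction.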