In the digital era, the data, for a given analytical task, can be collected in different formats, such as text, images and audio etc. The data with multiple formats are called multimedia data. Integrating and fusing multimedia datasets has become a challenging task in machine learning and data mining. In this paper, we present heterogeneous ensemble method that combines multi-media datasets at the decision level. Our method consists of several components, including extracting the features from multimedia datasets that are not represented by features, modelling independently on each of multimedia datasets, selecting models based on their accuracy and diversity and building the ensemble at the decision level. Hence our method is called decision level ensemble method (DLEM). The method is tested on multimedia data and compared with other heterogeneous ensemble based methods. The results show that the DLEM outperformed these methods significantly.