Abstract: Given a collection of web images with corresponding textual descriptions, in this paper we propose a novel cross-domain learning method that classifies these web multimedia objects by transferring correlation knowledge among different information sources. The knowledge is extracted from unlabeled objects through unsupervised learning and then applied to supervised classification tasks. To mine more meaningful correlation knowledge, instead of the visual words used in the traditional bag-of-visual-words (BoW) model, we discover higher-level visual components (words and phrases) that incorporate spatial and semantic information into our image representation model, i.e., the bag-of-visual-phrases (BoP). By combining these enriched visual components with textual words, we compute the frequently co-occurring pairs among them to construct a cross-domain correlated graph from which the correlation knowledge is mined. We then investigate two strategies for applying this knowledge to enrich the feature space in which supervised classification is performed. By transferring such knowledge, our cross-domain transfer learning method can not only handle large-scale web multimedia objects but also cope with the case in which the textual descriptions of a small portion of the web images are missing. Empirical experiments on two datasets of web multimedia objects demonstrate the effectiveness and efficiency of the proposed method.
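The co-occurrence counting step mentioned above can be illustrated with a minimal sketch. The data layout (each object as a pair of visual-phrase and textual-word lists), the function name, and the frequency threshold are assumptions for illustration, not details from the paper:

```python
from collections import Counter
from itertools import product

def build_correlation_graph(objects, min_count=2):
    """Count co-occurring (visual phrase, textual word) pairs across
    multimedia objects; frequent pairs become weighted graph edges.

    `objects`: list of (visual_phrases, textual_words) tuples -- an
    assumed layout for this sketch, not the paper's actual format.
    """
    counts = Counter()
    for visual_phrases, textual_words in objects:
        # Count each cross-domain pair at most once per object.
        for pair in product(set(visual_phrases), set(textual_words)):
            counts[pair] += 1
    # Keep only frequently co-occurring pairs as edges of the
    # cross-domain correlated graph.
    return {pair: c for pair, c in counts.items() if c >= min_count}

# Toy example: two web objects with hypothetical visual phrases and words.
objects = [
    (["sky-blue", "grass-green"], ["park", "outdoor"]),
    (["sky-blue"], ["park", "sunny"]),
]
graph = build_correlation_graph(objects, min_count=2)
# Only ("sky-blue", "park") co-occurs in both objects, so only that
# edge survives the threshold.
```

In this sketch an edge weight is a raw co-occurrence count; a real system would likely use a normalized association measure before transferring the knowledge to the classification feature space.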