In recent years, with the development of deep learning, researchers have tried to bring artificial intelligence methods into software defect prediction (SDP) to improve prediction performance. To be fed into a neural network, sample code is represented as an abstract syntax tree (AST), and the AST is encoded as real numbers. However, in most cross-project defect prediction (CPDP) tasks, the method used to convert the AST into real numbers cannot effectively estimate the semantic distance between ASTs, which significantly degrades training. To solve this problem, we present a new encoding framework, tree-based-embedding (TBE), which converts ASTs into real-valued vectors and makes the semantic gap between ASTs measurable. To evaluate this encoding method, we propose a tree-based-embedding convolutional neural network with transferable hybrid feature learning (TBCNN-THFL) to perform CPDP tasks. TBCNN-THFL is fed data encoded with the TBE method to learn transferable joint features between different projects; it also incorporates a transfer component analysis algorithm. Furthermore, the model combines handcrafted and deep-learning-generated features and feeds them into a classifier to train a defect prediction model. Extensive experiments demonstrate that TBCNN-THFL outperforms the reference models on 72 CPDP task pairs formed from 9 open-source projects.

INDEX TERMS Software engineering, software defect prediction, cross-project defect prediction, deep learning, continuous bag-of-words, transfer component algorithm.