Significant progress has recently been made in applying deep learning to natural language processing tasks. However, deep learning models typically require large amounts of annotated training data, while only small labeled datasets are available for many natural language processing tasks in the biomedical literature. Building large datasets for deep learning is expensive, since it involves considerable human effort and usually requires domain expertise in specialized fields. In this work, we consider augmenting manually annotated data with large amounts of data obtained using distant supervision. Since data obtained by distant supervision is often noisy, we first apply heuristics to remove some of the incorrect annotations. Then, using methods inspired by transfer learning, we show that the resulting models outperform models trained on the original manually annotated sets.

annotated as positive or negative depending on whether that sentence expresses a relation of interest among the marked entities. Many traditional (non-deep-learning) machine learning methods have been applied to these problems (see, e.g., [4] [5] [6] [7]), most of them feature-based or kernel-based. However, features and kernels have to be designed manually, and their performance is not on par with deep learning models when sufficient data is available.

April 26, 2019 1/14

Recently, deep learning methods have shown great advances on various NLP tasks. Convolutional neural networks (CNNs) and recurrent neural networks (RNNs) are two well-studied deep learning architectures in the NLP field. Promising results have been achieved by CNN models [8] [9], and current state-of-the-art CNN systems for relation extraction usually use refined architectures to incorporate more lexical and syntactic information. In [2], the authors applied a piecewise max pooling step after the convolutional layer to extract structural features between the entities.
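The piecewise pooling step can be sketched briefly (a minimal NumPy illustration, not the implementation from [2]; the function name and the zero-vector handling of empty segments are our own assumptions): the convolutional feature map is split into three segments by the two entity positions, and each segment is max-pooled separately.

```python
import numpy as np

def piecewise_max_pool(conv_out, e1_pos, e2_pos):
    """Piecewise max pooling over a convolutional feature map.

    conv_out: array of shape (seq_len, n_filters), one row per token position.
    e1_pos, e2_pos: token indices of the two marked entities (e1_pos < e2_pos).
    Returns a vector of length 3 * n_filters: one max per filter per segment.
    """
    n_filters = conv_out.shape[1]
    segments = [
        conv_out[:e1_pos + 1],            # up to and including entity 1
        conv_out[e1_pos + 1:e2_pos + 1],  # between the entities
        conv_out[e2_pos + 1:],            # after entity 2
    ]
    # Empty segments (e.g. entity 2 at the end) pool to a zero vector.
    pooled = [seg.max(axis=0) if seg.size else np.zeros(n_filters)
              for seg in segments]
    return np.concatenate(pooled)
```

Concatenating the three pooled vectors retains coarse positional structure (before, between, and after the entity pair) that a single global max pool over the whole sentence would discard.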
The proposed method (piecewise CNN) exhibits superior performance compared with a plain CNN. Peng et al. [10] proposed multiple channels in a CNN to incorporate syntactic dependency information and better capture longer-distance dependencies. RNN models have also shown advantages on relation extraction: the model in [11] achieves state-of-the-art results on the protein-protein interaction (PPI) task using only word embeddings as input to an LSTM model.

However, each new task requires its own annotated data for training a deep learning model. The annotation process requires considerable human effort to label each data instance and often demands domain expertise, especially in specialized fields like biomedicine. This issue is particularly onerous for deep learning, since the models have a large number of parameters to fit and hence typically require large datasets. Currently, only small datasets are available for a number of tasks, and this situation can keep us from achieving the full potential of deep learning models. In...