(215) 898-8007
Department of Statistics
The Wharton School
University of Pennsylvania

While single cell RNA sequencing (scRNA-seq) is invaluable for studying cell populations, cell-surface proteins are often integral markers of cellular function and serve as primary targets for therapeutic intervention. Here we propose a transfer learning framework, single cell Transcriptome to Protein prediction with deep neural network (cTP-net), to impute surface protein abundances from scRNA-seq data by learning from existing single-cell multi-omic resources.

[...] multi-omics [...] through the REAP-seq and CITE-seq protocols [2, 3]. Cell surface proteins can serve as integral markers of specific cellular functions and primary targets for therapeutic intervention. Immunophenotyping by cell surface proteins has been an indispensable tool in hematopoiesis, immunology and cancer research during the past 30 years. Yet, due to technological barriers and cost considerations, most single cell studies, including the Human Cell Atlas project [6], quantify the transcriptome only and do not have cell-matched measurements of relevant surface proteins [7, 8]. Sometimes, which cell types and corresponding surface proteins are essential becomes apparent only after exploration by scRNA-seq. This motivates our inquiry into whether protein abundances in individual cells can be accurately imputed from the cell's transcriptome.

We propose cTP-net (single cell Transcriptome to Protein prediction with deep neural network), a transfer learning approach based on deep neural networks that imputes surface protein abundances for scRNA-seq data.
Through comprehensive benchmark evaluations and applications to Human Cell Atlas and acute myeloid leukemia data sets, we show that cTP-net outperforms existing methods and can transfer information from training data to accurately impute 24 immunophenotype markers, which achieve a more detailed characterization of cellular state and cellular phenotypes than transcriptome measurements alone. For model training, cTP-net relies on the accumulating public data of cells with paired transcriptome and surface protein measurements.

Results

Method overview

An overview of cTP-net is shown in Figure 1a. Studies based on both CITE-seq and REAP-seq have shown that the relative abundance of most surface proteins, at the level of individual cells, is only weakly correlated with the relative abundance of the RNA of their corresponding genes [2, 3, 9]. This is due to technical factors such as RNA and protein measurement error [10], as well as inherent stochasticity in RNA processing, translation and protein transport [11-15]. To accurately impute surface protein abundance from scRNA-seq data, cTP-net employs two steps: (1) denoising of the scRNA-seq count matrix and (2) imputation based on the denoised data through a transcriptome-protein mapping (Figure 1a). The initial denoising, by SAVER-X [16], produces more accura...