Tumors are mixtures of different compartments. While global gene expression analysis profiles the average expression of all compartments in a sample, identifying the specific contribution of each compartment remains a challenge. With the increasing recognition of the importance of non-neoplastic components, the ability to break down the gene expression contribution of each is critical. To this end, we developed DECODER, an integrated framework that performs de novo deconvolution and compartment weight estimation for a single sample. We use DECODER to deconvolve 33 TCGA tumor RNA-seq datasets and show that it may be applied to other data types, including ATAC-seq. We demonstrate that it can be used to reproducibly estimate cellular compartment weights in pancreatic cancer that are clinically meaningful. Application of DECODER across cancer types advances the capability of identifying cellular compartments in an unknown sample and may have implications for identifying the tumor of origin for cancers of unknown primary.

Tumor samples are mixtures of distinct cell populations that contribute to intra-tumor heterogeneity, including immune, stroma and normal cells 1,2. Therefore, in bulk tumor samples, the analysis of tumor gene expression can be significantly confounded by the presence of non-neoplastic cell types, while the contribution of the tumor microenvironment is difficult to separate. Although laser-capture microdissection (LCM) and single-cell sequencing techniques strive to tackle these problems, both have limitations. LCM is labor-intensive and may affect the quality of the microdissected tissue for further analysis 3,4. Single-cell sequencing is still expensive, computationally demanding, and currently limited by the lack of comprehensive cell-sorting biomarkers 5,6.
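The confounding described above can be made concrete with a small numerical sketch. It assumes a simple linear mixing model, in which a bulk profile is the weighted average of pure compartment profiles. This is an illustration only, not the DECODER method; the compartment names, expression values, and weights below are hypothetical.

```python
import numpy as np

# Hypothetical pure expression profiles (genes x compartments).
# Columns: tumor, stroma, immune. Values are illustrative only.
profiles = np.array([
    [10.0, 0.0, 1.0],  # gene expressed mainly in tumor cells
    [0.0, 8.0, 1.0],   # gene expressed mainly in stroma
    [1.0, 1.0, 6.0],   # gene expressed mainly in immune cells
])

# Hypothetical compartment weights (compartments x samples).
# Each column is one bulk sample and sums to 1.
weights = np.array([
    [0.7, 0.2],
    [0.2, 0.5],
    [0.1, 0.3],
])


def mix(profiles, weights):
    """Return bulk expression (genes x samples) as the weighted
    average of the compartment profiles (assumed linear mixing)."""
    return profiles @ weights


bulk = mix(profiles, weights)
print(bulk)
```

Under this assumption, the tumor-specific gene appears roughly three times lower in the second sample than in the first purely because that sample contains less tumor, although the tumor cells themselves are identical in both.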
To avoid relying on LCM or single-cell-based techniques, a plethora of computational strategies have been developed to deconvolve the mixed signal present in a bulk tumor sample using RNA gene expression, DNA copy number data or DNA methylation data. Algorithms based on DNA copy-number alterations, e.g. ABSOLUTE 7, and DNA methylation profiles, e.g. MethylPurify 8 and InfiniumPurify 9, focus on inferring tumor purity, while expression-based deconvolution methods mainly handle estimation of compartment fractions, as well as extraction of compartment-specific expression profiles 2. However, current expression-based deconvolution methods still have a number of limitations. Some methods presuppose a fixed combination of compartments, such as DeMix (tumor and normal) 10, UNDO (tumor and stroma) 11 and ESTIMATE (tumor, stroma and immune) 12. Other methods, such as DeconRNAseq 13 and CIBERSORT 14, provide the flexibility to measure any number of specific compartments. However, they require knowledge of the pure expression of compartments as the refe...