Reliable identification of brain cell types is necessary for studying brain cell biology. Many brain cell marker genes have been proposed, but their reliability has not been fully validated. We evaluated 540 commonly used marker genes of astrocytes, microglia, neurons, and oligodendrocytes with six transcriptome and proteome datasets from purified human and mouse brain cells (n=125). By setting new criteria of cell-specific fold change, we identified 22 gold standard marker genes (GSM) with stable cell-specific expression. Our results call into question the specificity of many proposed marker genes. We used two single-cell transcriptome datasets from human and mouse brains to explore the co-expression of marker genes (n=3337). The mouse co-expression modules were perfectly preserved in the human transcriptome, but the reverse was not true. We also proposed new criteria for identifying marker genes based on both differential expression and co-expression data. We identified 16 novel candidate marker genes (NCM) for mouse and 18 for human independently, which have the potential for use in cell sorting or other tagging techniques. We validated the specificity of GSM and NCM by in-silico deconvolution analysis. Our systematic evaluation provides a list of credible marker genes to facilitate correct cell identification, cell labeling, and cell function studies.

development of marker genes, which are sets of genes that are expressed specifically in a cell type. Thousands of genes have been proposed as marker genes 2. One well-known marker gene, RBFOX3 (the gene encoding NeuN), is expressed only in the nuclei of most neuronal cell types 3. Marker genes can be used in several applications.
Protein products of marker genes can be used to label different cell types, which may be used in fluorescence-activated cell sorting (FACS). Marker genes can also be used to determine cell composition in bulk tissue samples. A computational method known as supervised deconvolution was developed to infer cell proportions in bulk tissue samples based on the expression of marker genes 4-6. This method has been applied to studying the composition of bulk brain samples 7,8. High specificity of marker genes is critical for generating reliable results in all of these applications.
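To make the idea of supervised deconvolution concrete, the following is a minimal sketch, not the method of the cited studies. It assumes the simplest linear mixing model, in which the bulk expression of each marker gene is the sum of cell-type-specific expression weighted by cell proportions, and it estimates non-negative proportions by projected gradient descent. The function name `deconvolve`, the signature matrix, and the proportions are all hypothetical values chosen for illustration.

```python
def deconvolve(signature, bulk, n_iter=2000):
    """Estimate cell-type proportions p >= 0 that minimise ||S p - b||^2.

    signature: list of rows, one per marker gene; each row holds the
               expression of that gene in every cell type (hypothetical).
    bulk:      list with the bulk-tissue expression of each marker gene.
    Returns proportions rescaled to sum to 1.
    """
    n_genes = len(signature)
    n_types = len(signature[0])
    # Safe step size: 1 / ||S||_F^2 never exceeds 1 / (largest eigenvalue of S^T S).
    step = 1.0 / sum(v * v for row in signature for v in row)
    p = [1.0 / n_types] * n_types  # start from equal proportions
    for _ in range(n_iter):
        # Residual of the linear mixing model for every gene.
        resid = [
            sum(signature[g][k] * p[k] for k in range(n_types)) - bulk[g]
            for g in range(n_genes)
        ]
        # Gradient of 0.5 * ||S p - b||^2 with respect to p.
        grad = [
            sum(signature[g][k] * resid[g] for g in range(n_genes))
            for k in range(n_types)
        ]
        # Gradient step followed by projection onto p >= 0.
        p = [max(0.0, p[k] - step * grad[k]) for k in range(n_types)]
    total = sum(p)
    return [x / total for x in p] if total > 0 else p


# Hypothetical signature: four marker genes (rows), each highly expressed in
# one of four cell types (columns: astrocyte, microglia, neuron, oligodendrocyte).
SIGNATURE = [
    [10.0, 1.0, 1.0, 1.0],
    [1.0, 10.0, 1.0, 1.0],
    [1.0, 1.0, 10.0, 1.0],
    [1.0, 1.0, 1.0, 10.0],
]
TRUE_PROPORTIONS = [0.4, 0.3, 0.2, 0.1]  # hypothetical ground truth
# Simulate a noise-free bulk sample from the mixing model.
BULK = [sum(s * t for s, t in zip(row, TRUE_PROPORTIONS)) for row in SIGNATURE]

estimated = deconvolve(SIGNATURE, BULK)
print([round(x, 3) for x in estimated])
```

In this toy setting each marker gene is ten times more abundant in its own cell type than elsewhere, so the proportions are well determined. As the off-diagonal values approach the diagonal ones (that is, as marker specificity drops), the problem becomes ill-conditioned and the estimates degrade, which is why marker specificity matters for this application. In practice, an established solver such as `scipy.optimize.nnls` would replace the hand-written loop.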
Differential gene expression (DGE) analysis of transcriptome or proteome data is the most straightforward way to define the specificity of marker genes 9-15. One drawback of DGE is that its outcomes are study-dependent: they are affected by many factors, such as species, cell or tissue source, and the data generation platform. Human and mouse genomes are 80% orthologous 16, but differences in gene expression between species are often greater than those between tissues within one species 17. Within a species, cells isolated from primary culture and cells acutely isolated from tissue showed different gene expression patterns 18. Also, the expression estimates of the mar...