Motivation: We participated in the DREAM Single Cell Transcriptomics Challenge. The challenge's focus was twofold: a) to identify the top 60, 40, and 20 genes that contain the most spatial information, and b) to reconstruct the 3-D arrangement of cells in the D. melanogaster embryo using information from those genes.
Results: We developed two independent approaches, leveraging Lasso and deep neural network models, which we successfully applied to high-dimensional single-cell sequencing data. Our methods achieved top performance when compared to the ground truth. Among ~40 participating teams, the resulting solutions placed 10th, 6th, and 4th in DREAM sub-challenges #1, #2, and #3, respectively. Notably, for the Lasso approach we introduced a feature selection technique, Lasso-TopX, that lets the user specify the exact number of features to select, and the neural network approach uses weak supervision for linear regression to accommodate uncertain or probabilistic training labels. Furthermore, we identified novel D. melanogaster genes that carry important positional information and were not previously suspected. Lastly, we show how the indirect use of the full datasets' information can lead to data leakage and introduce a bias that overestimates the model's performance.
Availability: https://github.com/TJU-CMC-Org/SingleCell-DREAM/.
Contact: Nestoras.Karathanasis@jefferson.edu

melanogaster embryo as a model system and seek to determine whether one can reconstruct the spatial arrangement of cells from a stage 6 embryo using only a limited number of genes.
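The abstract describes Lasso-TopX only at a high level, as a way to ask the Lasso for a user-chosen number of features. The Python sketch below shows one plausible way to obtain that behavior; it is an assumption, not the code in the project repository. It sweeps the regularization strength alpha down a geometric grid, starting from the value at which every coefficient is zero, stops at the first alpha that leaves at least X nonzero coefficients, and keeps the X features with the largest absolute coefficients. The names `lasso_cd` and `lasso_top_x` are hypothetical, the Lasso is a plain coordinate-descent solver written with NumPy, and a single target is used for brevity, whereas the challenge involves X, Y, and Z coordinates.

```python
import numpy as np


def lasso_cd(X, y, alpha, n_iter=200):
    """Coordinate-descent Lasso for (1/2n)||y - Xw||^2 + alpha * ||w||_1.

    Assumes the columns of X and the vector y are already centered,
    so no intercept is fitted.
    """
    n, p = X.shape
    w = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.astype(float)  # copy of y, i.e. the residual y - Xw for w = 0
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of feature j with the partial residual (feature j added back)
            rho = X[:, j] @ resid / n + col_sq[j] * w[j]
            # Soft-thresholding update for coordinate j
            new_wj = np.sign(rho) * max(abs(rho) - alpha, 0.0) / col_sq[j]
            resid += X[:, j] * (w[j] - new_wj)
            w[j] = new_wj
    return w


def lasso_top_x(X, y, top_x, n_alphas=50):
    """Hypothetical Lasso-TopX sketch: return sorted indices of exactly `top_x` features."""
    std = X.std(axis=0)
    std[std == 0] = 1.0
    Xs = (X - X.mean(axis=0)) / std  # standardize so the penalty treats all features alike
    yc = y - y.mean()
    # Smallest alpha at which all coefficients are zero
    alpha_max = np.max(np.abs(Xs.T @ yc)) / len(yc)
    for alpha in np.geomspace(alpha_max, alpha_max * 1e-3, n_alphas):
        w = lasso_cd(Xs, yc, alpha)
        if np.count_nonzero(w) >= top_x:
            break
    # Rank by |coefficient| and keep exactly top_x features
    return np.sort(np.argsort(-np.abs(w))[:top_x])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 10))
    y = 3 * X[:, 0] - 2 * X[:, 1] + 1.5 * X[:, 2] + 0.1 * rng.normal(size=200)
    print(lasso_top_x(X, y, 3).tolist())
```

Applied to the challenge, the columns of X would be the 84 inSitu genes and `top_x` would be 60, 40, or 20. Truncating by coefficient magnitude guarantees the requested count even when the path skips past it between two grid points; fitting the three coordinates jointly would call for a multi-task penalty, which this sketch does not implement.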
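The abstract states that the neural network approach uses weak supervision for linear regression so that training labels can be uncertain or probabilistic. A minimal sketch of that idea, assuming each cell comes with a probability vector over candidate embryo locations (for example, mapping scores of the kind DistMap produces) and assuming a purely linear model, minimizes the expected squared distance between the predicted coordinates and a location drawn from that distribution. Because E||f(x) - c||^2 = ||f(x) - E[c]||^2 + Var(c), the gradient involves only the probability-weighted mean location, so gradient descent on the expected loss converges to ordinary least squares on those soft targets. The function names are hypothetical, and this is not the authors' network.

```python
import numpy as np


def expected_sq_loss(W, b, X, P, coords):
    """Mean over cells of E_{l ~ P[i]} ||X[i] @ W + b - coords[l]||^2."""
    pred = X @ W + b                               # (cells, 3)
    diff = pred[:, None, :] - coords[None, :, :]   # (cells, locations, 3)
    d2 = (diff ** 2).sum(axis=2)                   # squared distance to every location
    return float((P * d2).sum(axis=1).mean())


def fit_weak_linear(X, P, coords, lr=0.05, n_steps=2000):
    """Gradient descent on the expected loss for a linear model pred = X @ W + b.

    X: (cells, genes) expression; P: (cells, locations), rows summing to 1;
    coords: (locations, 3) candidate X, Y, Z positions.
    """
    n, g = X.shape
    W = np.zeros((g, coords.shape[1]))
    b = np.zeros(coords.shape[1])
    # Probability-weighted mean location per cell; since
    # E||pred - c||^2 = ||pred - E[c]||^2 + Var(c), only this term enters the gradient.
    soft_target = P @ coords
    for _ in range(n_steps):
        pred = X @ W + b
        grad_pred = 2.0 * (pred - soft_target) / n  # d(loss)/d(pred)
        W -= lr * X.T @ grad_pred
        b -= lr * grad_pred.sum(axis=0)
    return W, b


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 4))                  # 50 cells, 4 genes (toy sizes)
    coords = rng.normal(size=(30, 3))             # 30 candidate locations
    P = rng.dirichlet(np.ones(30), size=50)       # probabilistic labels per cell
    W, b = fit_weak_linear(X, P, coords)
    print(expected_sq_loss(W, b, X, P, coords))
```

In a neural network, X @ W + b would be replaced by a nonlinear model while the expected loss stays the same; the variance term Var(c) is a constant that no model can reduce, which is why the fitted loss never reaches zero when labels are genuinely uncertain.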
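The closing point of the abstract, that indirect use of the full dataset's information can inflate performance estimates, can be illustrated with a generic and deliberately simple example, which is not the authors' experiment. On pure noise, choosing features by their correlation with the target on all samples before cross-validation yields a clearly positive out-of-fold correlation, whereas repeating the selection inside each training fold does not. The helper names, the ridge model, and all sizes below are illustrative assumptions.

```python
import numpy as np


def top_k_by_corr(X, y, k):
    """Indices of the k columns of X with the largest |Pearson correlation| with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    corr = (Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    return np.argsort(-np.abs(corr))[:k]


def ridge_predict(X_tr, y_tr, X_te, lam=1.0):
    """Fit ridge regression with an (unpenalized) intercept and predict X_te."""
    mu, y_mean = X_tr.mean(axis=0), y_tr.mean()
    A = X_tr - mu
    w = np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ (y_tr - y_mean))
    return (X_te - mu) @ w + y_mean


def cv_corr(X, y, k, leaky, n_folds=5, seed=0):
    """Out-of-fold correlation between predictions and y.

    leaky=True selects features once on ALL samples (including test folds);
    leaky=False repeats the selection on each training fold only.
    """
    idx = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(idx, n_folds)
    preds = np.empty(len(y))
    leaky_feats = top_k_by_corr(X, y, k) if leaky else None
    for test in folds:
        train = np.setdiff1d(idx, test)
        feats = leaky_feats if leaky else top_k_by_corr(X[train], y[train], k)
        preds[test] = ridge_predict(X[train][:, feats], y[train], X[test][:, feats])
    return float(np.corrcoef(preds, y)[0, 1])


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(40, 5000))   # 40 samples, 5000 pure-noise features
    y = rng.normal(size=40)           # target unrelated to X
    print("leaky :", cv_corr(X, y, 10, leaky=True))
    print("honest:", cv_corr(X, y, 10, leaky=False))
```

Since y is independent of X, any clearly positive score from the leaky variant comes from the test samples having influenced which features were kept, which is the kind of optimistic bias the abstract warns about.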
The challenge piggybacked on previously published scRNA-seq datasets and on a computational mapping strategy called DistMap, which leveraged in situ hybridization data for 84 genes from the Berkeley Drosophila Transcription Network Project (BDTNP); these in situ data were shown to uniquely classify almost every position of the D. melanogaster embryo (Karaiskos et al., 2017). From these 84 genes (herein referred to as "inSitu genes"), and without using the hybridization data, participants were asked to identify the 60, 40, and 20 most informative genes for subchallenges #1, #2, and #3, respectively. In addition to gene selection, each subchallenge required participants to submit 10 location predictions (X, Y, Z coordinates) for each cell, using only the selected genes (Tanevski et al., 2019).

• Spatial coordinates: X, Y, and Z coordinates were supplied for the 3039 locations of the D. melanogaster embryo.
• Single-cell RNA sequencing: Three expression tables were provided: the raw, normalized, and binarized expression of 8924 genes across 1297 cells (Karaiskos et al., 2017).
• DistMap source code was provided; it was used to identify the cell locations in the initial publication (Karaiskos et al., 2017).

the spatial coordinates
Briefly, DistMap calculates several parameters, a quantile value a...