Knowledge of genes that are critical to a tissue’s function remains difficult to ascertain and presents a major bottleneck toward a mechanistic understanding of genotype-phenotype links. Here, we present the first machine learning model, FUGUE, which combines transcriptional and network features to predict tissue-relevant genes across 30 human tissues. FUGUE achieves an average cross-validation auROC of 0.86 and auPRC of 0.50 (expected 0.09). In independent datasets, FUGUE accurately distinguishes tissue- or cell type-specific genes, significantly outperforming the conventional metric based on tissue-specific expression alone. Comparison of tissue-relevant transcription factors across tissues recapitulates their developmental relationships. Interestingly, the tissue-relevant genes cluster on the genome within topologically associated domains and, furthermore, are highly enriched for differentially expressed genes in the corresponding cancer type. We provide the prioritized gene lists for 30 human tissues and open-source software to prioritize genes in a novel context given multi-sample transcriptomic data.
We discuss methods for visualizing neural network decision boundaries and decision regions. We use these visualizations to investigate issues related to reproducibility and generalization in neural network training. We observe that changes in model architecture (and its associated inductive bias) cause visible changes in decision boundaries, while multiple runs with the same architecture yield results with strong similarities, especially in the case of wide architectures. We also use decision boundary methods to visualize double descent phenomena. We see that decision boundary reproducibility depends strongly on model width. Near the threshold of interpolation, neural network decision boundaries become fragmented into many small decision regions, and these regions are non-reproducible. Meanwhile, very narrow and very wide networks have high levels of reproducibility in their decision boundaries, with relatively few decision regions. We discuss how our observations relate to the theory of double descent phenomena in convex models. Code is available at https://github.com/somepago/dbViz.
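To make the notions of "decision regions" and "reproducibility" above concrete, the following is a minimal sketch, not the dbViz implementation. It assumes a classifier over a two-dimensional input plane, evaluates it on a regular grid, counts 4-connected regions of constant predicted label, and measures the fraction of grid points on which two models agree. All function names (`predict_grid`, `count_decision_regions`, `agreement`) are hypothetical, and the toy threshold classifiers merely stand in for trained networks.

```python
from collections import deque

import numpy as np


def predict_grid(predict, xlim=(-1.0, 1.0), ylim=(-1.0, 1.0), n=50):
    """Evaluate `predict` on an n x n grid and return labels of shape (n, n).

    `predict` is assumed to map an array of shape (m, 2) to m integer labels.
    """
    xs = np.linspace(xlim[0], xlim[1], n)
    ys = np.linspace(ylim[0], ylim[1], n)
    gx, gy = np.meshgrid(xs, ys)
    points = np.stack([gx.ravel(), gy.ravel()], axis=1)
    return np.asarray(predict(points)).reshape(n, n)


def count_decision_regions(labels):
    """Count 4-connected regions of constant label using breadth-first search."""
    h, w = labels.shape
    seen = np.zeros((h, w), dtype=bool)
    regions = 0
    for i in range(h):
        for j in range(w):
            if seen[i, j]:
                continue
            # Unvisited cell: it starts a new region, which we flood-fill.
            regions += 1
            seen[i, j] = True
            queue = deque([(i, j)])
            while queue:
                a, b = queue.popleft()
                for c, d in ((a + 1, b), (a - 1, b), (a, b + 1), (a, b - 1)):
                    if (0 <= c < h and 0 <= d < w and not seen[c, d]
                            and labels[c, d] == labels[a, b]):
                        seen[c, d] = True
                        queue.append((c, d))
    return regions


def agreement(labels_a, labels_b):
    """Fraction of grid points on which two label maps coincide."""
    return float(np.mean(labels_a == labels_b))


# Toy stand-ins for two training runs of the same architecture (assumption):
# each splits the plane with a slightly different vertical boundary.
run_a = predict_grid(lambda p: (p[:, 0] > 0.05).astype(int))
run_b = predict_grid(lambda p: (p[:, 0] > 0.15).astype(int))

print("regions in run A:", count_decision_regions(run_a))
print("agreement between runs:", agreement(run_a, run_b))
```

In this framing, a fragmented decision boundary shows up as a large region count, and low reproducibility shows up as low agreement between label maps from independently trained models.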