Objective: Pancreatic ductal adenocarcinoma (PDA) has among the highest stromal fractions of any cancer, and this has complicated attempts at expression-based molecular classification. The goal of this work is to profile purified samples of human PDA epithelium and stroma and examine their respective contributions to gene expression in bulk PDA samples. Design: We used laser capture microdissection (LCM) and RNA sequencing to profile the expression of 60 matched pairs of human PDA malignant epithelium and stroma samples. We then used these data to train a computational model that allowed us to infer tissue composition and generate virtual compartment-specific expression profiles from bulk gene expression cohorts. Results: Our analysis found significant variation in the tissue composition of pancreatic tumours from different public cohorts. Computational removal of stromal gene expression resulted in the reclassification of some tumours, reconciling functional differences between different cohorts. Furthermore, we established a novel classification signature from a total of 110 purified human PDA stroma samples, finding two groups that differ in extracellular matrix-associated and immune-associated processes. Lastly, a systematic evaluation of cross-compartment subtypes spanning four patient cohorts indicated partial dependence between epithelial and stromal molecular subtypes. Conclusion: Our findings add clarity to the nature and number of molecular subtypes in PDA, expand our understanding of global transcriptional programmes in the stroma and harmonise the results of molecular subtyping efforts across independent cohorts.
A sunflower with r petals is a collection of r sets so that the intersection of each pair is equal to the intersection of all. Erdős and Rado proved the sunflower lemma: for any fixed r, any family of sets of size w, with at least about w^w sets, must contain a sunflower. The famous sunflower conjecture is that the bound on the number of sets can be improved to c^w for some constant c. In this paper, we improve the bound to about (log w)^w. In fact, we prove the result for a robust notion of sunflowers, for which the bound we obtain is tight up to lower order terms.
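The definition of a sunflower above can be made concrete with a small check. The following is a minimal sketch, not taken from the paper: the function names `is_sunflower` and `find_sunflower` are hypothetical, and the search is plain brute force that only illustrates the definition (it says nothing about the bounds discussed in the abstract).

```python
from itertools import combinations


def is_sunflower(sets):
    """Return True if every pairwise intersection equals the intersection of all sets."""
    sets = [frozenset(s) for s in sets]
    if len(sets) < 2:
        # With fewer than two sets there are no pairs, so the condition holds vacuously.
        return True
    kernel = frozenset.intersection(*sets)  # the common intersection of all sets
    return all(a & b == kernel for a, b in combinations(sets, 2))


def find_sunflower(family, r):
    """Brute-force search for r members of `family` that form a sunflower.

    Returns a tuple of r sets, or None if no such sub-collection exists.
    Runs in time exponential in r; intended for tiny examples only.
    """
    for candidate in combinations(family, r):
        if is_sunflower(candidate):
            return candidate
    return None
```

For instance, `{1, 2}`, `{1, 3}`, `{1, 4}` form a sunflower with three petals and common intersection `{1}`, whereas `{1, 2}`, `{2, 3}`, `{1, 3}` do not, because the pairwise intersections differ from one another.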
We study an extension of active learning in which the learning algorithm may ask the annotator to compare the distances of two examples from the boundary of their label-class. For example, in a recommendation system application (say for restaurants), the annotator may be asked whether she liked or disliked a specific restaurant (a label query), or which one of two restaurants she liked more (a comparison query). We focus on the class of half spaces, and show that under natural assumptions, such as large margin or bounded bit-description of the input examples, it is possible to reveal all the labels of a sample of size n using approximately O(log n) queries. This implies an exponential improvement over classical active learning, where only label queries are allowed. We complement these results by showing that if any of these assumptions is removed then, in the worst case, Ω(n) queries are required. Our results follow from a new general framework of active learning with additional queries. We identify a combinatorial dimension, called the inference dimension, that captures the query complexity when each additional query is determined by O(1) examples (such as comparison queries, each of which is determined by the two compared examples). Our results for half spaces follow by bounding the inference dimension in the cases discussed above.
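To illustrate how label queries and comparison queries can work together, here is a minimal sketch of a simple baseline. It is not the algorithm from the paper: it sorts the sample with O(n log n) comparison queries and then locates the decision boundary with O(log n) label queries, so it does not achieve the O(log n) total stated above. The names `reveal_labels` and `make_oracles`, and the assumption that a comparison query returns the sign of the difference of hidden signed scores, are hypothetical choices made for this example.

```python
import functools


def make_oracles(w, b):
    """Build toy label/comparison oracles for the half space w.x + b > 0 (assumed setup)."""
    counts = {"label": 0, "compare": 0}

    def score(x):
        return sum(wi * xi for wi, xi in zip(w, x)) + b

    def label_query(x):
        counts["label"] += 1
        return 1 if score(x) > 0 else -1

    def compare_query(x, y):
        # Negative if x has the lower signed score, positive if higher, 0 if tied.
        counts["compare"] += 1
        d = score(x) - score(y)
        return (d > 0) - (d < 0)

    return label_query, compare_query, counts


def reveal_labels(points, label_query, compare_query):
    """Infer every label: sort by comparisons, then binary-search the boundary."""
    # Order point indices by hidden signed score, using comparison queries only.
    order = sorted(
        range(len(points)),
        key=functools.cmp_to_key(
            lambda i, j: compare_query(points[i], points[j])
        ),
    )
    # Labels are monotone along the sorted order, so binary search finds
    # the first positively labelled position with few label queries.
    lo, hi = 0, len(order)
    while lo < hi:
        mid = (lo + hi) // 2
        if label_query(points[order[mid]]) > 0:
            hi = mid
        else:
            lo = mid + 1
    labels = [None] * len(points)
    for rank, idx in enumerate(order):
        labels[idx] = 1 if rank >= lo else -1
    return labels
```

The point of the sketch is only that a comparison query carries ordering information a label query cannot, which lets most labels be inferred rather than asked for; the paper's framework reduces the number of comparison queries far below what sorting requires.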