Designing an algorithm with singly exponential complexity for computing semi-algebraic triangulations of a given semi-algebraic set has been a holy grail in algorithmic semi-algebraic geometry. More precisely, given a description of a semi-algebraic set S ⊂ R^k by a first-order quantifier-free formula in the language of the reals, the goal is to output a simplicial complex ∆ whose geometric realization, |∆|, is semi-algebraically homeomorphic to S. In this paper we consider a weaker version of this question. We prove that for any ℓ ≥ 0, there exists an algorithm which takes as input a description of a semi-algebraic subset S ⊂ R^k given by a quantifier-free first-order formula φ in the language of the reals, and produces as output a simplicial complex ∆ whose geometric realization, |∆|, is ℓ-equivalent to S. The complexity of our algorithm is bounded by (sd)^{k^{O(ℓ)}}, where s is the number of polynomials appearing in the formula φ and d is a bound on their degrees. For fixed ℓ, this bound is singly exponential in k. In particular, since ℓ-equivalence implies that the homotopy groups up to dimension ℓ of |∆| are isomorphic to those of S, we obtain a reduction (having singly exponential complexity) of the problem of computing the first ℓ homotopy groups of S to the combinatorial problem of computing the first ℓ homotopy groups of a finite simplicial complex of size bounded by (sd)^{k^{O(ℓ)}}. As an application, we give an algorithm with singly exponential complexity for computing the persistence barcodes up to dimension ℓ (for any fixed ℓ ≥ 0) of the filtration of a given semi-algebraic set by the sublevel sets of a given polynomial. Our algorithm is the first for this problem with singly exponential complexity, and generalizes the corresponding results on computing the Betti numbers up to dimension ℓ of semi-algebraic sets with no filtration present.
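To make the notion of a persistence barcode of a sublevel filtration concrete, here is a minimal, purely combinatorial sketch of the dimension-0 case: vertices of a graph carry function values, edges enter the filtration at the larger endpoint value, and bars are paired by the elder rule via union-find. This is only an illustration of the output object; it is not the paper's algorithm, which works directly on semi-algebraic input.

```python
# Toy 0-dimensional persistence barcode of a sublevel filtration on a graph.
# Illustrative only: the paper computes barcodes of semi-algebraic filtrations,
# not of precomputed graphs.

def barcode_dim0(values, edges):
    """values: filtration value per vertex (vertex i enters at values[i]).
    edges: (u, v) pairs; an edge enters at max(values[u], values[v]).
    Returns (birth, death) bars; death is None for essential bars."""
    n = len(values)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    bars = []
    # Process edges in order of appearance in the filtration.
    for u, v in sorted(edges, key=lambda e: max(values[e[0]], values[e[1]])):
        ru, rv = find(u), find(v)
        if ru == rv:
            continue  # edge closes a 1-cycle; irrelevant in dimension 0
        # Elder rule: the component born later (larger birth value) dies.
        if values[ru] > values[rv]:
            ru, rv = rv, ru
        bars.append((values[rv], max(values[u], values[v])))
        parent[rv] = ru  # roots always keep the minimal birth value
    # Each surviving connected component contributes one essential bar.
    for b in sorted({values[find(x)] for x in range(n)}):
        bars.append((b, None))
    return sorted(bars, key=lambda bar: (bar[0], bar[1] is None))
```

For example, a path on three vertices with values [0, 2, 1] yields one essential bar born at 0, a bar (1, 2) for the component born at value 1, and an instantaneous bar (2, 2).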
We present a neural semi-supervised learning model termed Self-Pretraining. Our model is inspired by the classic self-training algorithm. However, as opposed to self-training, Self-Pretraining is threshold-free, can potentially update its beliefs about previously labeled documents, and can cope with the semantic drift problem. Self-Pretraining is iterative and consists of two classifiers. In each iteration, one classifier draws a random set of unlabeled documents and labels them. This set is used to initialize the second classifier, which is further trained on the set of labeled documents. The algorithm then proceeds to the next iteration, and the classifiers' roles are reversed. To improve the flow of information across iterations and to cope with the semantic drift problem, Self-Pretraining employs an iterative distillation process, transfers hypotheses across iterations, utilizes a two-stage training model, uses an efficient learning-rate schedule, and employs a pseudo-label transformation heuristic. We have evaluated our model on three publicly available social media datasets. Our experiments show that Self-Pretraining outperforms the existing state-of-the-art semi-supervised classifiers across multiple settings. Our code is available at https://github.com/p-karisani/self_pretraining.
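The alternating two-classifier loop described above can be sketched schematically. The real model uses neural classifiers together with distillation, a two-stage training scheme, and a pseudo-label heuristic; in this sketch a trivial nearest-centroid classifier on 1-D points stands in (with the "initialize, then fine-tune" stages collapsed into a single fit), purely to show the control flow. All names here are invented for illustration.

```python
import random

# Schematic sketch of the Self-Pretraining control flow (NOT the authors'
# implementation): classifier A pseudo-labels a random draw of unlabeled
# data, classifier B is trained on that draw plus the labeled set, and the
# two classifiers swap roles each iteration.

class Centroid:
    """Stand-in classifier: predict the label of the nearest class centroid."""
    def fit(self, xs, ys):
        self.c = {}
        for label in set(ys):
            pts = [x for x, y in zip(xs, ys) if y == label]
            self.c[label] = sum(pts) / len(pts)
        return self

    def predict(self, xs):
        return [min(self.c, key=lambda l: abs(x - self.c[l])) for x in xs]

def self_pretrain(labeled, unlabeled, iters=4, sample=8, seed=0):
    rng = random.Random(seed)
    xs, ys = zip(*labeled)
    a = Centroid().fit(xs, ys)
    for _ in range(iters):
        # One classifier labels a random, threshold-free draw of unlabeled data...
        draw = rng.sample(unlabeled, min(sample, len(unlabeled)))
        pseudo = a.predict(draw)
        # ...which, with the labeled set, trains the other classifier.
        b = Centroid().fit(list(draw) + list(xs), pseudo + list(ys))
        a = b  # roles are reversed for the next iteration
    return a

# Usage: two well-separated 1-D "document" clusters.
labeled = [(0.0, "neg"), (1.0, "neg"), (9.0, "pos"), (10.0, "pos")]
unlabeled = [0.5, 1.5, 8.5, 9.5, 0.2, 9.8, 1.2, 8.8]
model = self_pretrain(labeled, unlabeled)
print(model.predict([0.3, 9.1]))  # -> ['neg', 'pos']
```

Because earlier pseudo-labels are re-predicted from scratch each round, the sketch also shows how the scheme can revise its beliefs about previously labeled examples.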
Biological pathways play a crucial role in the properties of diseases and are important in drug discovery. Identifying the logical relationships among distinctive phenotypic clusters could reveal possible connections to the underlying pathways. However, this process is challenging since clinical phenotypes are often available only through unstructured electronic health records. Moreover, in the absence of a standardized questionnaire, there could be bias among physicians toward selecting certain medical terms. In this article, we develop an efficient pipeline to address these challenges and help practitioners reveal the pathways associated with a disease. We use topological data analysis and redescriptions, and propose a pipeline of four phases: (1) pre-processing the clinical notes to extract the salient concepts, (2) constructing a feature space of the patients to characterize the extracted concepts, (3) leveraging the topological properties to distill the available knowledge and visualize the extracted features, and finally, (4) investigating the bias in the clinical notes of the selected features and identifying possible pathways. Our experiments on a publicly available dataset of COVID-19 clinical notes demonstrate that our pipeline can indeed extract meaningful pathways.
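A minimal sketch of phase (2) of such a pipeline: once salient concepts have been extracted from the notes (phase 1), each patient becomes a binary vector over the concept vocabulary, and a similarity on those vectors can feed the topological analysis of phase (3). The concept names below are invented for illustration and are not from the paper's dataset.

```python
# Sketch of a patient feature space built from extracted clinical concepts
# (hypothetical example data; not the pipeline's actual implementation).

def build_feature_space(patient_concepts):
    """patient_concepts: dict patient_id -> set of extracted concept strings.
    Returns (vocabulary, dict patient_id -> 0/1 vector over that vocabulary)."""
    vocab = sorted(set().union(*patient_concepts.values()))
    vectors = {
        pid: [1 if c in concepts else 0 for c in vocab]
        for pid, concepts in patient_concepts.items()
    }
    return vocab, vectors

def jaccard(u, v):
    """Similarity of two binary vectors; one possible input to a
    Mapper-style topological summary of the patient space."""
    inter = sum(a and b for a, b in zip(u, v))
    union = sum(a or b for a, b in zip(u, v))
    return inter / union if union else 0.0

notes = {
    "p1": {"fever", "cough", "anosmia"},
    "p2": {"fever", "cough"},
    "p3": {"headache"},
}
vocab, vecs = build_feature_space(notes)
print(jaccard(vecs["p1"], vecs["p2"]))  # 2 shared concepts of 3 total -> 0.666...
```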
Developing an algorithm for computing the Betti numbers of semi-algebraic sets with singly exponential complexity has been a holy grail in algorithmic semi-algebraic geometry, and only partial results are known. In this paper we consider the more general problem of computing the image under the homology functor of a semi-algebraic map f : X → Y between closed and bounded semi-algebraic sets. For every fixed ℓ ≥ 0 we give an algorithm with singly exponential complexity that computes bases of the homology groups H_i(X), H_i(Y) (with rational coefficients) and a matrix with respect to these bases of the induced linear maps H_i(f) : H_i(X) → H_i(Y), 0 ≤ i ≤ ℓ. We generalize this algorithm to more general (zigzag) diagrams of maps between closed and bounded semi-algebraic sets and give a singly exponential algorithm for computing the homology functors on such diagrams. This allows us to give an algorithm with singly exponential complexity for computing barcodes of semi-algebraic zigzag persistent homology in small dimensions.
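The linear algebra behind "homology with rational coefficients" can be illustrated on an explicit complex: Betti numbers fall out of the ranks of the boundary matrices over Q. The paper's contribution is producing such complexes (and the matrices of induced maps between them) from semi-algebraic input with singly exponential complexity; this snippet assumes the complex is already given.

```python
from fractions import Fraction

# Betti numbers of a finite simplicial complex over Q, via
# b_k = dim C_k - rank d_k - rank d_{k+1}  (with d_0 = 0).

def rank(mat):
    """Rank of a matrix over Q by Gaussian elimination with exact arithmetic."""
    m = [[Fraction(x) for x in row] for row in mat]
    r = 0
    for col in range(len(m[0]) if m else 0):
        pivot = next((i for i in range(r, len(m)) if m[i][col]), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][col]:
                f = m[i][col] / m[r][col]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def betti_numbers(dims, boundary_mats):
    """dims[k] = number of k-simplices; boundary_mats[k] = matrix of the
    boundary map C_k -> C_{k-1} for k >= 1 (boundary_mats[0] is unused)."""
    top = len(dims) - 1
    ranks = [0] * (top + 2)
    for k in range(1, top + 1):
        ranks[k] = rank(boundary_mats[k])
    return [dims[k] - ranks[k] - ranks[k + 1] for k in range(top + 1)]

# The boundary of a triangle, a combinatorial circle: 3 vertices, 3 edges
# e01, e12, e02 with, e.g., boundary(e01) = v1 - v0 (rows = vertices).
d1 = [[-1,  0, -1],
      [ 1, -1,  0],
      [ 0,  1,  1]]
print(betti_numbers([3, 3], [None, d1]))  # -> [1, 1]: connected, one 1-cycle
```

The induced map H_i(f) the abstract refers to is, in these terms, a matrix expressing where the chosen basis cycles of X are sent among the basis cycles of Y.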