Program autotuning has been shown to achieve better or more portable performance in a number of domains. However, autotuners themselves are rarely portable between projects, for several reasons: using a domain-informed search space representation is critical to achieving good results; search spaces can be intractably large and require advanced machine learning techniques; and the landscape of search spaces can vary greatly between problems, sometimes requiring domain-specific search techniques to explore efficiently. This paper introduces OpenTuner, a new open-source framework for building domain-specific, multi-objective program autotuners. OpenTuner supports fully customizable configuration representations, an extensible technique representation that allows domain-specific techniques, and an easy-to-use interface for communicating with the program being autotuned. A key capability of OpenTuner is the simultaneous use of ensembles of disparate search techniques; techniques that perform well are dynamically allocated a larger proportion of tests. We demonstrate the efficacy and generality of OpenTuner by building autotuners for 7 distinct projects and 16 total benchmarks, showing speedups of up to 2.8× over the prior techniques used in these projects, with little programmer effort.
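The dynamic allocation of tests among an ensemble of techniques can be pictured as a multi-armed bandit. The sketch below is a simplification chosen for illustration, not OpenTuner's API or its exact meta-technique: it uses a plain UCB1 rule and rewards a technique only when its test finds a new best configuration. The two techniques (`random_search`, `hill_climb`) and the one-parameter toy objective are hypothetical.

```python
import math
import random


def ucb1_pick(counts, rewards, t, c=math.sqrt(2)):
    """Index of the technique with the highest UCB1 score at step t."""
    for i, n in enumerate(counts):
        if n == 0:
            return i  # give every technique one test before scoring
    def score(i):
        # mean reward plus an exploration bonus that shrinks as i is tested more
        return rewards[i] / counts[i] + c * math.sqrt(math.log(t) / counts[i])
    return max(range(len(counts)), key=score)


def random_search(best, rng):
    """Hypothetical technique: sample a fresh point in [0, 1000]."""
    return rng.randint(0, 1000)


def hill_climb(best, rng):
    """Hypothetical technique: step +/-1 from the best point found so far."""
    if best is None:
        return rng.randint(0, 1000)
    return best + rng.choice((-1, 1))


def ensemble_tune(objective, techniques, budget, seed=0):
    """Share `budget` tests among `techniques`, favouring those that find new bests."""
    rng = random.Random(seed)
    counts = [0] * len(techniques)
    rewards = [0.0] * len(techniques)
    best_cfg, best_cost = None, math.inf
    for t in range(1, budget + 1):
        i = ucb1_pick(counts, rewards, t)
        cfg = techniques[i](best_cfg, rng)
        cost = objective(cfg)
        counts[i] += 1
        if cost < best_cost:  # reward 1 only when this test beats the best so far
            best_cfg, best_cost = cfg, cost
            rewards[i] += 1.0
    return best_cfg, best_cost, counts
```

For example, `ensemble_tune(lambda x: (x - 377) ** 2, [random_search, hill_climb], budget=200)` returns the best point, its cost, and how many tests each technique received. One would expect hill climbing to earn most rewards once it is near the optimum, while the exploration term keeps random search from being starved entirely.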
Highly optimized programs are prone to bit rot: their performance quickly becomes suboptimal in the face of new hardware and compiler techniques. In this paper we show how to automatically lift performance-critical stencil kernels from a stripped x86 binary and generate the corresponding code in the high-level domain-specific language Halide. Using Halide's state-of-the-art optimizations targeting current hardware, we show that new optimized versions of these kernels can replace the originals to rejuvenate the application for newer hardware. The original optimized code for kernels in stripped binaries is nearly impossible to analyze statically. Instead, we rely on dynamic traces to regenerate the kernels. We perform buffer structure reconstruction to identify input, intermediate and output buffer shapes. We abstract from a forest of concrete dependency trees, which contain absolute memory addresses, to symbolic trees suitable for high-level code generation. This is done by canonicalizing trees, clustering them based on structure, inferring higher-dimensional buffer accesses, and finally solving a set of linear equations based on buffer accesses to lift them up to simple, high-level expressions. Helium can handle highly optimized, complex stencil kernels with input-dependent conditionals. We lift seven kernels from Adobe Photoshop, giving a 75% performance improvement; four kernels from IrfanView, giving a 4.97× speedup; and one stencil from the miniGMG multigrid benchmark, netting a 4.25× performance improvement. We manually rejuvenated Photoshop by replacing eleven of Photoshop's filters with our lifted implementations, giving a 1.12× speedup without affecting the user experience.
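The final lifting step, solving linear equations over buffer accesses, can be illustrated in miniature. The sketch below assumes a 2D buffer whose concrete byte addresses, recorded in a dynamic trace alongside hypothetical loop coordinates (x, y), follow address = base + stride_x·x + stride_y·y; it recovers the three coefficients exactly with rational Gauss-Jordan elimination. The names `solve_exact` and `lift_access` are illustrative only, and Helium's real pipeline (tree canonicalization, clustering, dimension inference) is considerably more involved.

```python
from fractions import Fraction


def solve_exact(A, b):
    """Solve A z = b exactly by Gauss-Jordan elimination over the rationals."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            raise ValueError("samples do not determine a unique affine access")
        M[col], M[pivot] = M[pivot], M[col]
        pv = M[col][col]
        M[col] = [v / pv for v in M[col]]  # scale the pivot row to a leading 1
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [a - f * p for a, p in zip(M[r], M[col])]  # clear column col
    return [M[r][n] for r in range(n)]


def lift_access(samples):
    """Fit address = base + stride_x * x + stride_y * y to (x, y, address) samples.

    The first three samples must not be collinear in (x, y); any further
    samples are used to check that the access really is affine."""
    if len(samples) < 3:
        raise ValueError("need at least three samples")
    rows = [[1, x, y] for x, y, _ in samples[:3]]
    rhs = [addr for _, _, addr in samples[:3]]
    base, sx, sy = solve_exact(rows, rhs)
    for x, y, addr in samples[3:]:
        if base + sx * x + sy * y != addr:
            raise ValueError("accesses are not affine in (x, y)")
    return base, sx, sy
```

For a hypothetical 640-pixel-wide 8-bit image at address 0x1000, the samples (0, 0, 4096), (1, 0, 4097), (0, 1, 4736) and (5, 7, 8581) yield base 4096, stride_x 1 and stride_y 640, i.e. a row-major buffer with a 640-byte row pitch.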
We prove the computational intractability of rotating and placing n square tiles into a 1 × n array such that adjacent tiles are compatible: either their edge colors are equal, as in edge-matching puzzles, or their tab/pocket shapes match, as in jigsaw puzzles. Beyond basic NP-hardness, we prove that it is NP-hard even to approximately maximize the number of placed tiles (allowing blanks), while satisfying the compatibility constraint between nonblank tiles, within a factor of 0.9999999702 (on the other hand, there is an easy 1/2-approximation). This is the first (correct) proof of inapproximability for edge-matching and jigsaw puzzles. Along the way, we prove that it is NP-hard to distinguish, for a directed graph on n nodes, between having a Hamiltonian path (length n − 1) and having at most 0.999999284(n − 1) edges that form a vertex-disjoint union of paths. We use this gap hardness and gap-preserving reductions to establish similar gap hardness for 1 × n jigsaw and edge-matching puzzles.
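Checking a proposed placement is easy (which is why the decision version lies in NP); the hardness lies in finding a good one. A minimal verifier for the edge-matching variant, assuming tiles are given as (top, right, bottom, left) tuples of integer colors and that blank cells impose no constraint, counts the tiles placed in a 1 × n strip and rejects the placement if two adjacent nonblank tiles disagree. Jigsaw tab/pocket matching would replace the equality test with a complementarity test. The function names are hypothetical.

```python
def rotate(tile, k):
    """Rotate a (top, right, bottom, left) tile clockwise by k quarter turns."""
    for _ in range(k % 4):
        top, right, bottom, left = tile
        tile = (left, top, right, bottom)  # the left edge moves to the top, etc.
    return tile


def score_strip(tiles, placement):
    """Return the number of placed tiles if the 1 x n placement is valid, else None.

    placement[i] is None (a blank cell) or a pair (tile_index, quarter_turns)."""
    used = set()
    oriented = []
    for cell in placement:
        if cell is None:
            oriented.append(None)
            continue
        idx, k = cell
        if idx in used:
            return None  # each tile may be placed at most once
        used.add(idx)
        oriented.append(rotate(tiles[idx], k))
    for a, b in zip(oriented, oriented[1:]):
        # only adjacent nonblank tiles are constrained: right edge of a == left edge of b
        if a is not None and b is not None and a[1] != b[3]:
            return None
    return len(used)
```

With `tiles = [(0, 1, 0, 2), (0, 3, 0, 1), (5, 6, 7, 8)]`, the placement `[(0, 0), (1, 0), None]` scores 2, while `[(1, 0), (0, 0)]` is rejected because color 3 meets color 2.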
We prove that it is NP-hard to dissect one simple orthogonal polygon into another using a given number of pieces, and that it is NP-hard to approximate the fewest number of pieces to within a factor of 1 + 1/1080 − ε.
We prove that path puzzles with complete row and column information (or, equivalently, 2D orthogonal discrete tomography with Hamiltonicity constraint) are strongly NP-complete, ASP-complete, and #P-complete. Along the way, we newly establish ASP-completeness and #P-completeness for 3-Dimensional Matching and Numerical 3-Dimensional Matching.
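The row and column information acts as a tomography constraint on the set of cells the path visits. A minimal checker, assuming the grid is indexed by (row, column), that the two door cells are given as `start` and `end`, and that each clue counts the visited cells in its row or column, verifies a candidate path; `verify_path` is a hypothetical name, and the sketch does not attempt to solve puzzles.

```python
from collections import Counter


def verify_path(row_clues, col_clues, path, start, end):
    """Check a candidate solution to a path puzzle with complete row/column clues."""
    n_rows, n_cols = len(row_clues), len(col_clues)
    if not path or path[0] != start or path[-1] != end:
        return False
    if len(set(path)) != len(path):
        return False  # the path may not revisit a cell
    if any(not (0 <= r < n_rows and 0 <= c < n_cols) for r, c in path):
        return False
    for (r1, c1), (r2, c2) in zip(path, path[1:]):
        if abs(r1 - r2) + abs(c1 - c2) != 1:
            return False  # consecutive cells must share an edge
    rows = Counter(r for r, _ in path)  # visited cells per row
    cols = Counter(c for _, c in path)  # visited cells per column
    return (all(rows[r] == row_clues[r] for r in range(n_rows))
            and all(cols[c] == col_clues[c] for c in range(n_cols)))
```

On a 3 × 3 grid, the path (0, 0), (0, 1), (1, 1), (2, 1), (2, 2) satisfies row clues [2, 1, 2] and column clues [1, 3, 1].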
Nintendo's Mario Kart is perhaps the most popular racing video game franchise. Players race alone or against opponents to finish in the fastest time possible. Players can also use items to attack other racers and to defend against their attacks. We prove two hardness results for generalized Mario Kart: deciding whether a driver can finish a course alone in some given time is NP-hard, and deciding whether a player can beat an opponent in a race is PSPACE-hard.
We analyze the computational complexity of several new variants of edge-matching puzzles. First, we analyze inequality (instead of equality) constraints between adjacent tiles, proving the problem NP-complete for strict inequalities but polynomial-time solvable for nonstrict inequalities. Second, we analyze three types of triangular edge matching, of which one is polynomial-time solvable and the other two are NP-complete; all three are #P-complete. Third, we analyze the case where no target shape is specified and we merely want to place the (square) tiles so that edges match exactly; this problem is NP-complete. Fourth, we consider four 2-player games based on 1 × n edge matching, all of which are PSPACE-complete. Most of our NP-hardness reductions are parsimonious, newly proving #P- and ASP-completeness for, e.g., 1 × n edge matching. Along the way, we prove #P- and ASP-completeness of planar 3-regular directed Hamiltonicity; we provide linear-time algorithms to find antidirected and forbidden-transition Eulerian paths; and we characterize the complexity of new partizan variants of the Geography game on graphs.
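Assuming the usual definition, an antidirected trail is one whose consecutive edges alternate orientation relative to the direction of travel. The checker below only verifies that a given ordering of the edges is an antidirected Eulerian trail (every edge used exactly once, orientations alternating); it is not the paper's linear-time algorithm for finding one. Edges are (tail, head) pairs, and the function name is hypothetical.

```python
from collections import Counter


def is_antidirected_euler_trail(edges, start, trail):
    """True if `trail` (edges in traversal order, walked from `start`) uses every
    edge of `edges` exactly once and alternates between forward and backward steps."""
    if Counter(trail) != Counter(edges):
        return False  # every edge must be used exactly once
    here, prev_forward = start, None
    for u, v in trail:
        if here == u:
            forward, here = True, v    # traverse u -> v along its direction
        elif here == v:
            forward, here = False, u   # traverse v -> u against its direction
        else:
            return False               # edge does not touch the current vertex
        if forward == prev_forward:
            return False               # orientations must alternate
        prev_forward = forward
    return True
```

For edges (1, 2), (3, 2), (3, 4) starting at vertex 1, the trail in that order goes forward, backward, forward and is accepted; for edges (1, 2), (2, 3) it is rejected because both steps go forward.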
We analyze the computational complexity of the many types of pencil-and-paper-style puzzles featured in the 2016 puzzle video game The Witness. In all puzzles, the goal is to draw a path in a rectangular grid graph from a start vertex to a destination vertex. The different puzzle types place different constraints on the path: preventing some edges from being visited (broken edges); forcing some edges or vertices to be visited (hexagons); forcing some cells to have certain numbers of incident path edges (triangles); or forcing the regions formed by the path to be partially monochromatic (squares), have exactly two special cells (stars), or be singly covered by given shapes (polyominoes) and/or negatively counting shapes (antipolyominoes). We show that any one of these clue types (except the first) is enough to make path finding NP-complete ("witnesses exist but are hard to find"), even for rectangular boards. Furthermore, we show that a final clue type (antibody), which necessarily "cancels" the effect of another clue in the same region, makes path finding Σ₂-complete ("witnesses do not exist"), even with a single antibody (combined with many anti/polyominoes), and the problem gets no harder with many antibodies.
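The simplest clue types, broken edges and hexagons, can be checked directly on the path's vertex sequence; the region-based clues (squares, stars, polyominoes, antibodies) would additionally require computing the regions into which the path cuts the board. The sketch below assumes a board of width × height cells whose path runs on the (width + 1) × (height + 1) lattice points, with edges given as vertex pairs in either order; the function and parameter names are hypothetical.

```python
def check_witness_path(width, height, path, start, end,
                       broken=(), hex_vertices=(), hex_edges=()):
    """Check broken-edge and hexagon clues for a path on a width x height board.

    Vertices are lattice points (x, y) with 0 <= x <= width and 0 <= y <= height."""
    if not path or path[0] != start or path[-1] != end:
        return False
    if len(set(path)) != len(path):
        return False  # the path may not revisit a vertex
    if any(not (0 <= x <= width and 0 <= y <= height) for x, y in path):
        return False
    used = set()
    for (x1, y1), (x2, y2) in zip(path, path[1:]):
        if abs(x1 - x2) + abs(y1 - y2) != 1:
            return False  # consecutive vertices must be grid neighbours
        used.add(frozenset(((x1, y1), (x2, y2))))  # undirected edge
    if used & {frozenset(e) for e in broken}:
        return False  # broken edges may not be traversed
    if not set(hex_vertices) <= set(path):
        return False  # every hexagon vertex must be visited
    return all(frozenset(e) in used for e in hex_edges)  # ... and every hexagon edge
```

On a 1 × 1 board, the path (0, 0), (1, 0), (1, 1) satisfies a hexagon on vertex (1, 0) and is unaffected by a broken edge between (0, 0) and (0, 1).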