Over the last twenty years, an exciting interplay has emerged between proof systems and algorithms. Some natural families of algorithms can be viewed as a generic translation from a proof that a solution exists into an algorithm for finding the solution itself. This connection has perhaps been most consequential in the context of semi-algebraic proof systems and basic primitives in algorithm design such as linear and semidefinite programming. The proof system perspective, in this context, has provided fundamentally new tools for both algorithm design and analysis. These new tools have helped both in designing better algorithms for well-studied problems and in proving tight lower bounds on such techniques.
The random k-SAT model is the most important and well-studied distribution over k-SAT instances. It is closely connected to statistical physics, it is used as a testbench for satisfiability algorithms, and average-case hardness over this distribution has been linked to hardness of approximation via Feige's hypothesis. In this paper, we prove that any Cutting Planes refutation for random k-SAT requires exponential size, for k that is logarithmic in the number of variables, and in the interesting regime where the number of clauses guarantees that the formula is unsatisfiable with high probability.
This makes random d-SAT an important family of formulas for propositional proof complexity, since superpolynomial lower bounds for random d-SAT formulas in a particular proof system show that any complete and efficient algorithm based on the proof system will perform badly on random d-SAT instances. Furthermore, since the proof complexity lower bounds hold in the unsatisfiable regime, they are directly connected to Feige's hypothesis.
Remarkably, whether or not a random SAT instance from the distribution $F(m, n, d)$ is satisfiable is controlled quite precisely by the ratio $\Delta = m/n$, which is called the clause density. A simple counting argument shows that $F(m, n, d)$ is unsatisfiable with high probability for $\Delta > 2^d \ln 2$. The famous satisfiability threshold conjecture asserts that there is a constant $c_d$ such that random d-SAT formulas of clause density $\Delta$ are almost certainly satisfiable for $\Delta < c_d$ and almost certainly unsatisfiable for $\Delta > c_d$, where $c_d$ is roughly $2^d \ln 2$. In a major recent breakthrough, the conjecture was resolved for large values of d [11]. From the perspective of proof complexity, the density parameter $\Delta$ also plays an important role in the difficulty of refuting unsatisfiable CNF formulas.
For instance, in Resolution, which is arguably the simplest proof system, the complexity of refuting random d-SAT formulas is now very well understood in terms of $\Delta$. In a seminal paper, Chvátal and Szemerédi [10] showed that for any fixed $\Delta$ above the threshold there is a constant $\kappa_\Delta$ such that random d-SAT requires Resolution refutations of size $\exp(\kappa_\Delta n)$ with high probability. In their proof, the drop-off in $\kappa_\Delta$ is doubly exponential in $\Delta$, making the lower bound trivial when the number of clauses is larger than $n \log^{1/4} n$ (and thus it does not hold when d is large). Improved lower bounds [5,7] proved that the drop-off in $\kappa_\Delta$ is at most polynomial in $\Delta$. More precisely, they prove that a random d-SAT formula with at most $n^{(d+2)/4}$ clauses requires exponential size Resolution refutations. Thus for all values of d, even when the number of clauses is far above the threshold, Resolution refutations are exponentially long. They also give asymptotically matching upper bounds, showing that there are DLL refutations of size $\exp(n/\Delta^{1/(d-2)})$. Superpolynomial lower bounds for random d-SAT formulas are also known for other weak proof systems such as the polynomial calculus and Res(k) [1,6], and random d-SAT is also...
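The simple counting argument behind the bound $\Delta > 2^d \ln 2$ mentioned above can be spelled out as a first-moment calculation. The sketch below assumes, as an illustration and not necessarily the exact sampling model of the paper, that the $m$ clauses of $F(m, n, d)$ are drawn independently, each over $d$ distinct variables with independent uniformly random signs. The symbol $X$ (the number of satisfying assignments) is introduced here and does not appear in the original text.

```latex
% Assumption: clauses are independent; a fixed assignment falsifies a
% random clause with probability 2^{-d}.
\begin{align*}
\mathbb{E}[X]
  &= \sum_{\alpha \in \{0,1\}^n} \Pr[\alpha \text{ satisfies all } m \text{ clauses}]
   = 2^n \left(1 - 2^{-d}\right)^{m} \\
  &= \left( 2 \left(1 - 2^{-d}\right)^{\Delta} \right)^{n}
   \qquad \text{(using } m = \Delta n\text{)} \\
  &\le \exp\!\left( n \left( \ln 2 - \Delta\, 2^{-d} \right) \right)
   \qquad \text{(using } 1 - x \le e^{-x}\text{)}.
\end{align*}
% Markov's inequality then bounds the probability of satisfiability.
\[
\Pr[F \text{ is satisfiable}] = \Pr[X \ge 1] \le \mathbb{E}[X] \longrightarrow 0
\quad \text{as } n \to \infty \text{ whenever } \Delta > 2^d \ln 2 \text{ is fixed}.
\]
```

The exponent $\ln 2 - \Delta\, 2^{-d}$ is negative exactly when $\Delta > 2^d \ln 2$, which is where the stated density bound comes from.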
A large number of human diseases result from disruptions to protein structure and function caused by missense mutations. Computational methods are frequently employed to assist in the prediction of protein stability upon mutation. These methods utilize a combination of protein sequence data, protein structure data, empirical energy functions, and physicochemical properties of amino acids. In this work, we present the first use of dynamic protein structural features to improve stability predictions upon mutation. This is achieved through the use of a set of timeseries extracted from microsecond timescale atomistic molecular dynamics simulations of proteins. Standard machine learning algorithms using mean, variance, and histograms of these timeseries were found to be 60-70% accurate in stability classification based on experimental ΔΔG or protein-chaperone interaction measurements. A recurrent neural network with full treatment of timeseries data was found to be 80% accurate according to the F1 score. The performance of our models was found to be equal to or better than that of two recently developed machine learning methods for binary classification as well as two industry-standard stability prediction algorithms. In addition to classification, understanding the molecular basis of protein stability disruption due to disease-causing mutations is a significant challenge that impedes the development of drugs and therapies that may be used to treat genetic diseases. The use of dynamic structural features allows for novel insight into the molecular basis of protein disruption by mutation in a diverse set of soluble proteins. To assist in the interpretation of machine learning results, we present a technique for determining the importance of features to a recurrent neural network using Garson's method.
We propose a novel extension of neural interpretation diagrams by implementing Garson's method to scale each node in the neural interpretation diagram according to its relative importance to the network.
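To make the idea of Garson-style feature importance concrete, the following is a minimal sketch of one common formulation of Garson's method for a feedforward network with a single hidden layer and a single output. It is not the authors' recurrent-network adaptation, which the text above does not specify. The function name `garson_importance` and the weight layout are assumptions made for illustration.

```python
def garson_importance(w_in_hidden, w_hidden_out):
    """Relative importance of each input under one common form of Garson's method.

    Assumed (hypothetical) layout, for a single-hidden-layer feedforward net:
      w_in_hidden[i][j] : weight from input i to hidden unit j
      w_hidden_out[j]   : weight from hidden unit j to the single output
    Returns a list of non-negative importances that sum to 1.
    """
    n_inputs = len(w_in_hidden)
    n_hidden = len(w_hidden_out)
    if n_inputs == 0:
        raise ValueError("need at least one input")
    if any(len(row) != n_hidden for row in w_in_hidden):
        raise ValueError("each input row must have one weight per hidden unit")

    contrib = [0.0] * n_inputs
    for j in range(n_hidden):
        # Total absolute weight flowing into hidden unit j.
        col_sum = sum(abs(w_in_hidden[i][j]) for i in range(n_inputs))
        if col_sum == 0.0:
            continue  # hidden unit j receives no signal from any input
        for i in range(n_inputs):
            # Share of hidden unit j attributable to input i,
            # weighted by the strength of j's connection to the output.
            share = abs(w_in_hidden[i][j]) / col_sum
            contrib[i] += share * abs(w_hidden_out[j])

    total = sum(contrib)
    if total == 0.0:
        # Degenerate network: no input influences the output; split evenly.
        return [1.0 / n_inputs] * n_inputs
    return [c / total for c in contrib]


if __name__ == "__main__":
    # Toy weights, chosen only for illustration.
    weights_in = [[1.0, -2.0], [3.0, 2.0]]
    weights_out = [1.0, -1.0]
    print(garson_importance(weights_in, weights_out))
```

Because only absolute values of weights are used, the resulting scores indicate the magnitude of each input's influence but not its direction. In a neural interpretation diagram, such scores could be used to scale the size of each input node.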