Noah Fleming scite author profile

Semialgebraic Proofs and Efficient Algorithm Design

Over the last twenty years, an exciting interplay has emerged between proof systems and algorithms. Some natural families of algorithms can be viewed as a generic translation from a proof that a solution exists into an algorithm for finding the solution itself. This connection has perhaps been most consequential in the context of semi-algebraic proof systems and basic primitives in algorithm design such as linear and semidefinite programming. The proof system perspective, in this context, has provided fundamentally new tools for both algorithm design and analysis. These new tools have helped both in designing better algorithms for well-studied problems and in proving tight lower bounds on such techniques.


Random Θ(log n)-CNFs Are Hard for Cutting Planes

Fleming

Pankratov

Pitassi

et al. 2017


The random k-SAT model is the most important and well-studied distribution over k-SAT instances. It is closely connected to statistical physics, it is used as a testbench for satisfiability algorithms, and average-case hardness over this distribution has been linked to hardness of approximation via Feige's hypothesis. In this paper, we prove that any Cutting Planes refutation for random k-SAT requires exponential size, for k that is logarithmic in the number of variables, and in the interesting regime where the number of clauses guarantees that the formula is unsatisfiable with high probability.

[...] makes random d-SAT an important family of formulas for propositional proof complexity, since superpolynomial lower bounds for random d-SAT formulas in a particular proof system show that any complete and efficient algorithm based on the proof system will perform badly on random d-SAT instances. Furthermore, since the proof complexity lower bounds hold in the unsatisfiable regime, they are directly connected to Feige's hypothesis.

Remarkably, determining whether or not a random SAT instance from the distribution F(m, n, d) is satisfiable is controlled quite precisely by the ratio $\Delta = m/n$, which is called the clause density. A simple counting argument shows that F(m, n, d) is unsatisfiable with high probability for $\Delta > 2^d \ln 2$. The famous satisfiability threshold conjecture asserts that there is a constant $c_d$ such that random d-SAT formulas of clause density $\Delta$ are almost certainly satisfiable for $\Delta < c_d$ and almost certainly unsatisfiable for $\Delta > c_d$, where $c_d$ is roughly $2^d \ln 2$. In a major recent breakthrough, the conjecture was resolved for large values of d [11].

From the perspective of proof complexity, the density parameter $\Delta$ also plays an important role in the difficulty of refuting unsatisfiable CNF formulas. For instance, in Resolution, which is arguably the simplest proof system, the complexity of refuting random d-SAT formulas is now very well understood in terms of $\Delta$. In a seminal paper, Chvátal and Szemerédi [10] showed that for any fixed $\Delta$ above the threshold there is a constant $\kappa_\Delta$ such that random d-SAT requires Resolution refutations of size $\exp(\kappa_\Delta n)$ with high probability. In their proof, the drop-off in $\kappa_\Delta$ is doubly exponential in $\Delta$, making the lower bound trivial when the number of clauses is larger than $n \log^{1/4} n$ (and thus it does not hold when d is large). Improved lower bounds [5,7] proved that the drop-off in $\kappa_\Delta$ is at most polynomial in $\Delta$. More precisely, they prove that a random d-SAT formula with at most $n^{(d+2)/4}$ clauses requires exponential size Resolution refutations. Thus for all values of d, even when the number of clauses is well above the threshold, Resolution refutations are exponentially long. They also give asymptotically matching upper bounds, showing that there are DLL refutations of size $\exp(n/\Delta^{1/(d-2)})$.

Superpolynomial lower bounds for random d-SAT formulas are also known for other weak proof systems such as the polynomial calculus and Res(k) [1,6], and random d-SAT is also...
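The "simple counting argument" mentioned above can be sketched as a first-moment calculation. The excerpt does not define F(m, n, d), so this sketch assumes the usual model in which the m clauses are drawn independently and uniformly from all d-clauses over n variables; the symbol $X$ for the number of satisfying assignments is introduced here for illustration only.

```latex
% Assumption: each of the m clauses is an independent, uniformly random d-clause,
% so a fixed assignment satisfies any one clause with probability 1 - 2^{-d}.
\begin{align*}
\mathbb{E}[X]
  &= 2^n \left(1 - 2^{-d}\right)^m
  && \text{(linearity of expectation over all $2^n$ assignments)} \\
  &\le 2^n \exp\!\left(-m\,2^{-d}\right)
  && \text{(since $1 - x \le e^{-x}$)} \\
  &= \exp\!\left(-n\left(\Delta\,2^{-d} - \ln 2\right)\right),
  && \Delta = m/n .
\end{align*}
% By Markov's inequality, Pr[formula is satisfiable] = Pr[X >= 1] <= E[X],
% which tends to 0 as n grows whenever Delta exceeds 2^d ln 2 by a constant.
```

Under these assumptions the exponent is negative exactly when $\Delta > 2^d \ln 2$, which matches the bound quoted in the text.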


Towards a Complexity-Theoretic Understanding of Restarts in SAT Solvers

Fleming

Vinyals

et al. 2020


Predicting Protein Thermostability Upon Mutation Using Molecular Dynamics Timeseries Data

Fleming

Kinsella

Ing

2016

Preprint


A large number of human diseases result from disruptions to protein structure and function caused by missense mutations. Computational methods are frequently employed to assist in the prediction of protein stability upon mutation. These methods utilize a combination of protein sequence data, protein structure data, empirical energy functions, and physicochemical properties of amino acids. In this work, we present the first use of dynamic protein structural features in order to improve stability predictions upon mutation. This is achieved through the use of a set of timeseries extracted from microsecond timescale atomistic molecular dynamics simulations of proteins. Standard machine learning algorithms using mean, variance, and histograms of these timeseries were found to be 60-70% accurate in stability classification based on experimental ΔΔG or protein-chaperone interaction measurements. A recurrent neural network with full treatment of timeseries data was found to be 80% accurate according to the F1 score. The performance of our models was found to be equal to or better than two recently developed machine learning methods for binary classification as well as two industry-standard stability prediction algorithms.

In addition to classification, understanding the molecular basis of protein stability disruption due to disease-causing mutations is a significant challenge that impedes the development of drugs and therapies that may be used to treat genetic diseases. The use of dynamic structural features allows for novel insight into the molecular basis of protein disruption by mutation in a diverse set of soluble proteins. To assist in the interpretation of machine learning results, we present a technique for determining the importance of features to a recurrent neural network using Garson's method. We propose a novel extension of neural interpretation diagrams by implementing Garson's method to scale each node in the neural interpretation diagram according to its relative importance to the network.
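To make the idea of weight-based feature importance concrete, here is a minimal sketch of one common formulation of Garson's method for a plain feed-forward network with a single hidden layer and a single output. This is not the recurrent-network variant the abstract describes, whose details are not given here; the function name `garson_importance` and the weight layout are assumptions chosen for illustration.

```python
def garson_importance(w_in, w_out):
    """Relative importance of each input under one common form of Garson's method.

    Assumed (hypothetical) layout:
      w_in[i][j] -- weight from input i to hidden unit j
      w_out[j]   -- weight from hidden unit j to the single output
    Returns a list of non-negative importances that sum to 1.
    """
    n_inputs = len(w_in)
    scores = [0.0] * n_inputs
    for j, v in enumerate(w_out):
        # Total absolute incoming weight of hidden unit j.
        column_total = sum(abs(w_in[i][j]) for i in range(n_inputs))
        if column_total == 0.0:
            continue  # a hidden unit with no incoming weight contributes nothing
        for i in range(n_inputs):
            # Input i's share of hidden unit j, weighted by |hidden-to-output weight|.
            scores[i] += abs(w_in[i][j]) / column_total * abs(v)
    norm = sum(scores)
    if norm == 0.0:
        raise ValueError("all weights are zero; importance is undefined")
    return [s / norm for s in scores]


if __name__ == "__main__":
    # Two inputs, two hidden units, made-up weights for illustration only.
    print(garson_importance([[1.0, 2.0], [3.0, 2.0]], [2.0, -1.0]))
```

With the made-up weights in the usage example, the second input receives about twice the importance of the first. Because the method works on absolute values, it reports the magnitude of an input's influence but not its sign.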




Semialgebraic Proofs and Efficient Algorithm Design
