We propose a data-driven method for synthesizing static analyses to detect side-channel information leaks in cryptographic software. Compared to the conventional way of manually crafting such static analyzers, which can be tedious, error prone and suboptimal, our learning-based technique is not only automated but also provably sound. Our analyzer consists of a set of type-inference rules learned from the training data, i.e., example code snippets annotated with the ground truth. Internally, we use syntax-guided synthesis (SyGuS) to generate new recursive features and decision tree learning (DTL) to generate analysis rules based on these features. We guarantee soundness by proving each learned analysis rule via a technique called query containment checking. We have implemented our technique in the LLVM compiler and used it to detect power side channels in C programs that implement cryptographic protocols. Our results show that, in addition to being automated and provably sound during synthesis, our analyzer can achieve the same empirical accuracy as two state-of-the-art, manually-crafted analyzers while being 300X and 900X faster, respectively.• We propose the first data-driven method for learning a provably sound static analyzer using syntax guided synthesis (SyGuS) and decision tree learning (DTL). • We guarantee soundness by formulating and solving a Datalog query containment checking problem. • We demonstrate the effectiveness of our method for detecting side channels in cryptographic software. In the remainder of this paper, we begin by presenting the technical background in Section II and our motivating example in Section III. We then describe the learner in Section IV and the prover in Section V, followed by the experimental results in Section VI. Finally, we survey the related work in Section VII and conclude in Section VIII.
II. PRELIMINARIES
A. Power Side-ChannelsPrior works in side-channel security [19]- [21] show that variance in the power consumption of a computing device may leak secret information; for example, when a secret value is stored in a physical register, its number of logical-1 bits may affect the power consumption of the CPU. Such side-channel leaks are typically mitigated by masking, e.g., using d random bits (r 1 , . . . , r d ) to split a key bit into d + 1 secret shares: key 1 = r 1 , . . ., key d = r d , and key d+1 = r 1 ⊕r 2 . . .⊕r d ⊕key, where ⊕ denotes the logical operation exclusive or (XOR). Since all d + 1 shares are uniformly distributed in the {0, 1}, in theory, this order-d masking scheme is secure in that any combination of less than d shares cannot reveal the secret, but combining all d + 1 shares, key 1 ⊕ key 2 ⊕ ...key d+1 = key, recovers the secret.In practice, masking countermeasures must also be implemented properly to avoid de-randomizing any of the secret shares accidentally. ConsiderWhile syntactically dependent on the two randomized values t L and t R , t is in fact leaky because, semantically, it does not depend on the random input r 1 . In this work, we aim t...