Write a Blog >>
Mon 19 Jun 2017 17:25 - 17:50 at Actes, Civil Engineering - Dynamic Analysis and Testing Chair(s): Michael Pradel

A program can be viewed as a syntactic structure P (syntactic skeleton) parameterized by a collection of the identifiers V (variable names). This paper introduces the skeletal program enumeration (SPE) problem: Given a fixed syntactic skeleton P and a set of variables V , enumerate a set of programs P exhibiting all possible variable usage patterns within P. It proposes an effective realization of SPE for systematic, rigorous compiler testing by leveraging three important observations: (1) Programs with different variable usage patterns exhibit diverse control- and data-dependence information, and help exploit different compiler optimizations; (2) most real compiler bugs were revealed by small tests (i.e., small-sized P) — this “small-scope” observation opens up SPE for practical compiler validation; and (3) SPE is exhaustive w.r.t. a given syntactic skeleton and variable set, offering a level of guarantee absent from all existing compiler testing techniques.

The key challenge of SPE is how to eliminate the enormous amount of equivalent programs w.r.t. α-conversion. Our main technical contribution is a novel algorithm for computing the canonical (and smallest) set of all non-α-equivalent programs. To demonstrate its practical utility, we have applied the SPE technique to test C/C++ and Scala compilers using syntactic skeletons derived from their own regression test-suites. Our evaluation results are extremely encouraging. In less than six months, our approach has led to 217 confirmed GCC/Clang bug reports, 119 of which have already been fixed, and the majority are long latent bugs despite the extensive prior testing efforts (e.g., Csmith and EMI). The results also show that our algorithm for enumerating non-α-equivalent programs provides six orders of magnitude reduction, enabling processing the GCC test-suite in under a month versus 40K+ years with a naïve enumeration. Moreover, in about three weeks, our technique has also found 29 CompCert crashing bugs and 42 bugs in two Scala optimizing compilers. These results demonstrate our SPE technique’s generality and further illustrate its effectiveness.