The International Satisfiability Modulo Theories (SMT) Competition.
Note: The benchmark archives that were included on this page have been lost and are no longer available. The following information is kept mainly for archival purposes.
These benchmarks and tools are subject to change until their respective freeze dates, as documented in the SMT-COMP rules.
blast_simplify_calls - QF_UFLIA benchmarks, from Blast.
These are logs of the calls to the Simplify theorem prover made by Blast when trying to model check some C programs. The name of each C program is reflected in the name of the benchmark; the original Simplify traces can be made available to those who are interested.
hybrid_networks - QF_LRA bounded model-checking (BMC) and k-Induction problems on networks of hybrid automata, from NuSMV.
These benchmarks were generated by unrolling the NuSMV models of the hybrid automata and checking them with BMC or k-Induction.
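As a rough illustration of what such a BMC unrolling looks like, here is a toy QF_LRA encoding of a one-variable system (a hypothetical sketch; the variable names, transition relation, and property are invented, not taken from the actual benchmark set):

    (set-logic QF_LRA)
    ; state variable x at steps 0, 1, 2
    (declare-fun x0 () Real)
    (declare-fun x1 () Real)
    (declare-fun x2 () Real)
    ; initial condition
    (assert (= x0 0))
    ; transition relation, unrolled for 2 steps
    (assert (= x1 (+ x0 1)))
    (assert (= x2 (+ x1 1)))
    ; negation of the safety property at the last step
    (assert (>= x2 2))
    (check-sat)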
kratos_systemC_swmc - QF_BV bounded model-checking (BMC) and k-Induction problems on SystemC designs, from NuSMV.
Unrollings of translations of some SystemC programs into NuSMV. The programs are those used, e.g., in the FMCAD'10 paper:
Verifying SystemC: a Software Model Checking Approach, by Alessandro Cimatti, Andrea Micheli, Iman Narasamdya and Marco Roveri.
This set and the QF_LIA set below are essentially the same, except for the logic used (a toy illustration of the difference follows the next entry).
kratos_systemC_swmc - QF_LIA bounded model-checking (BMC) and k-Induction problems on SystemC designs, from NuSMV.
Unrollings of translations of some SystemC programs into NuSMV. The programs are those used, e.g., in the FMCAD'10 paper:
Verifying SystemC: a Software Model Checking Approach, by Alessandro Cimatti, Andrea Micheli, Iman Narasamdya and Marco Roveri.
This set and the QF_BV set above are essentially the same, except for the logic used.
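For instance, a counter increment (a hypothetical toy step, not taken from the actual benchmarks) would be encoded over the integers in the QF_LIA set as

    (declare-fun c0 () Int)
    (declare-fun c1 () Int)
    (assert (= c1 (+ c0 1)))

while the QF_BV set models the same step with fixed-width bit-vectors, where the addition wraps around:

    (declare-fun c0 () (_ BitVec 32))
    (declare-fun c1 () (_ BitVec 32))
    (assert (= c1 (bvadd c0 (_ bv1 32))))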
lustre - QF_LIA BMC and k-Induction problems on Lustre programs, from NuSMV.
These were obtained from a subset of the Lustre models also used for the KIND set, except that they were generated by NuSMV itself, from a NuSMV version of the Lustre programs.
kind - QF_UFLIA traces from KIND, postprocessed for the competition.
These benchmarks were obtained from the KIND tool during the verification of Lustre programs.
asasp_20110606 - ASASP benchmarks.
ASASP implements a symbolic reachability
procedure for the analysis of administrative access control policies. A more
detailed description of the benchmarks can be found in the following paper:
Efficient Symbolic Automated Analysis of Administrative Attribute-based RBAC-Policies, by F. Alberti, A. Armando, and S. Ranise.
The unsat-core benchmarks are the subset of the benchmarks from the indicated logics that are unsatisfiable; the benchmarks themselves have been modified to include names for the assertions. Per the SMT-LIB standard, the benchmarks are allowed to contain a mix of named and unnamed formulas, though ordinarily all top-level formulas will have names. The names may be scrambled by the benchmark scrambler.
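For illustration, named assertions use the SMT-LIB :named annotation; a minimal, hypothetical example (not taken from the benchmark set) is:

    (set-option :produce-unsat-cores true)
    (set-logic QF_UFLIA)
    (declare-fun x () Int)
    (assert (! (> x 0) :named a1))
    (assert (! (< x 0) :named a2))
    (check-sat)        ; unsat
    (get-unsat-core)   ; e.g. (a1 a2)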
The competition benchmarks were selected from these divisions:
Note: The results archives that were provided with each benchmark archive have been lost and are no longer available. The following information is obsolete and kept only for archival purposes.
Each tarball contains one or two status files for each benchmark: BENCHMARK.results.txt is always present, and BENCHMARK.results_untrusted.txt might be present as well. These files contain one status per line, corresponding to the series of (check-sat) commands in the benchmark.
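For example, an incremental benchmark with two (check-sat) commands (a hypothetical sketch, not an actual competition benchmark):

    (set-logic QF_LIA)
    (declare-fun x () Int)
    (push 1)
    (assert (> x 0))
    (check-sat)   ; first query: sat
    (pop 1)
    (assert (and (> x 0) (< x 0)))
    (check-sat)   ; second query: unsat

would have a two-line BENCHMARK.results.txt:

    sat
    unsat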
Each status line (where different from "unknown") has been determined by running at least 3 different SMT solvers on the set of instances resulting from "unfolding" each incremental benchmark with the scrambler. In *.results.txt, all the results different from "unknown" have been reported by at least 2 different solvers, whereas all the status lines generated by a single solver only (because, e.g., the others timed out) have been replaced with "unknown". In *.results_untrusted.txt, the single-solver answers are also included, but marked with an "# untrusted" comment.
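For a three-query benchmark whose second query was answered by only one solver, the two files could plausibly look as follows (an illustration of the scheme described above, not actual archive contents; the exact layout of the untrusted marker may differ):

BENCHMARK.results.txt:

    sat
    unknown
    unsat

BENCHMARK.results_untrusted.txt:

    sat
    unsat   # untrusted
    unsat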