The International Satisfiability Modulo Theories (SMT) Competition.

**Home**

**Introduction**

**Benchmark Submission**

**Publications**

**SMT-LIB**

**Previous Editions**

In the new Proof Exhibition Track, teams can submit a proof-producing solver together with a proof checker for its proof format. We will compile and present the results, as well as assemble a panel of experts (not necessarily organizers) to do a qualitative assessment for each proof-producing solver, proof format, and proof checker.

Solver authors should provide two scripts with their solver. The first
script called `smtcomp_solver`

, should take the name of the
benchmark file as a single argument and run the solver on the benchmark.

As usual the benchmark is pre-processed, i.e., the identifiers are randomized
and the commands for producing proofs are added to the benchmark. The solver
needs to understand the option `:produce-proofs`

and the command `(get-proof)`

.
Solvers must respond to each command in the benchmark script with the answer
defined in the SMT-LIB format specification. There will be a `(check-sat)`

command for which `unsat`

should be output and a `(get-proof)`

command for
which the proof should be given as an s-expression according to the
SMT-LIB standard (the format is not specified otherwise).
Solvers that respond `unknown`

to the `(check-sat)`

command should respond
with an error to the following `(get-proof)`

command. Note that the error
result must be output to stdout.
In contrast to other tracks, anything written to stderr is ignored and the
output is saved for diagnostic purposes.

The second script, `smtcomp_checker`

, should take
the benchmark file name as first argument and the stdout output of the
solver script as second argument. Note that this output will
usually start with the `unsat`

result of the `(check-sat)`

command, which the checker must either parse or skip.

The checker should check the proof within the output and provide the result
to stdout. Output printed to stderr is ignored (and saved for diagnostics).
The first line of the proof checker result should be either
`valid`

, `invalid`

, `holey`

, `sat`

, or `unknown`

.
The answer `valid`

must only be given when the checker certifies that the
given proof justifies the unsatisfiability of the given benchmark.
A checker that returns `valid`

to a satisfiable benchmark is deemed to be
unsound.
The answer `invalid`

should be returned if the proof is errorneous or
otherwise malformed.
If the proof checker determines that the proof is valid but it contains
holes, i.e., steps that are not checked for correctness, the answer should
be `holey`

.
The answer `sat`

should be given if the solver claimed that the benchmark
is satisfiable and `unknown`

should be given if (1) the solver answered
`unknown`

, (2) it answered `unsat`

but did not give a proof, or (3) the
solver gave an error result.

After the first line, the proof checker can provide arbitrary `key=value`

pairs. These will be collected in the final result table and can include
for example information about the number of proof steps, the number of nodes
in the proof, the number of holes, the number of different proof rules, etc.

The result of a solver is considered erroneous if the proof checker
returns `sat`

or `invalid`

.
Since the proof exhibition track is non-competitive and we are not fixing
the definition of a valid proof, there will be no score computed for this
track.

A proof checker is unsound if it returns `valid`

for an (invalid) proof
of a satisfiable benchmark.
Thus, a proof checker must also check that the formula that was proved
corresponds to the benchmark given. Soundness of the proof checker is
a central criterion by which the participants will be judged.

This track will use a wall-clock time limit of 20 minutes per solver/benchmark pair. This time limit includes the time for proof checking. We will record the times for each stage.

Solvers in this track will be wrapped in the competition to include the script starexec_run_proof. This script will call the solver and checker script.

If you participate in the proof track, you should include the
script mentioned above in your solver binary.
You can submit a single solver binary for all tracks, by providing a
`starexec_run_default`

script for the other tracks. You also
need to create `smtcomp_solver`

and `smtcomp_checker`

scripts for the
proof exhibition track. You can also create more configurations
for the other tracks by adding a corresponding `starexec_run_`

*track*
script.

As mentioned above the proof checker can add arbitrary key-value pairs that will be collected in the results. To standardize the results, we suggest that proof checkers print the following key-value pairs:

- holes: number of unchecked axioms or rewrite rules (that can introduce unsoundness),
- axioms: number of checked axioms or rewrite rules,
- steps: number of proof steps (in some proof-specific granularity),
- assertions: number of asserted formulas used in the proof.