CNFgen - formula transformations
CNFgen
Combinatorial benchmarks for SAT solvers
Often we want to increase the hardness of formulas in a controlled way to study their proof complexity or how SAT solvers perform on them.
Usage
CNFgen include formula transformation capability: cnfgen
itself
can apply one or more such transformations to the formula it
generates,
cnfgen <family> <params> -T <tr1> <args1> -T <tr2> <args2> ...
Furthermore it is possible to apply a transformation to a DIMACS
file input.cnf
, using the command line
cnfgen dimacs input.cnf -T <tr1> <args1> -T <tr2> <args2> ...
For example, if we want to substitute each variables of a CNF in
DIMACS file input.cnf
with a XOR of 3 new variables and output the
formula on standard output it is sufficient to type
cnfgen dimacs input.cnf -T xor 3
For a list of all implemented transformations you can type either
cnfgen --help
To get the help for a specific transformation type
cnfgen <formula_type> <formula_params> -T <transformation_name> -h
For example
cnfgen peb pyramid 3 -T shuffle -h
where the formula generated is ignored, and the help for shuffle
transformation is shown.
Shuffling
Shuffle the CNF, by randomly permuting the order of clauses, of variables and randomly flipping the variable polarity.
cnfgen <formula_type> <formula_params> -T shuffle
p cnf 10 5 2 -5 -8 0 1 3 10 0 5 8 10 0 -1 8 -9 0 3 -6 7 0
can be shuffled into the following (here we did not shuffle the clauses to make the isomorphism between the two formulas easier to spot).
p cnf 10 5 -5 3 -9 0 -2 1 -7 0 -3 9 -7 0 2 9 -8 0 1 -4 -6 0
One dimensional lifting
For each variable \(x\) we introduce \(t\) variables \((x_1,x_2,...,x_d)\) and \(d\) selector variables \((y_1,y_2,...,x_d)\). The value of the original variable is substituted with the value of the \(x_i\) variables for which the selector \(y_i\) is active. We want at least one selector to be active.
Concretely the variable \(x\) is substituted by \(\bigwedge_i (y_i \rightarrow x_i)\). Furthermore the clause \(y_1 \vee y_2 \vee \ldots \vee y_d\) is added to the formula to guarantee that at least one selector is active.
cnfgen <formula_type> <formula_param> -T lift d
Substitution
Pick a formula \(F\). We can take a function \(g:\{0,1\}^d \rightarrow \{0,1\}\) and substitute each variable \(x\) in \(F\) with the value of function \(g\) on \(d\) new variables \(x_1, \ldots, x_d\). For example if \(g\) is XOR and \(d\) is 2 then the CNF \[ x \land (y \lor \overline{z}) \] becomes
\[ x_1 \oplus x_2 \land (y_1 \oplus y_2 \lor \overline{z_1 \oplus z_2}) \;. \]
Each of the two original clauses must be represented in CNF form: \(x_1 \oplus x_2\) becomes \((x_1 \lor x_2)\land(\overline{x_1} \lor \overline{x_2})\); and \(y_1 \oplus y_2 \lor \overline{z_1 \oplus z_2}\) becomes
\begin{align*} \; (\; y_1 \lor y_2 \lor z_1 \lor \overline{z_2} \;)\\ \land\; (\; \overline{y_1} \lor \overline{y_2} \lor z_1 \lor \overline{z_2} \;)\\ \land\; (\; y_1 \lor y_2 \lor \overline{z_1} \lor z_2 \;)\\ \land\; (\; \overline{y_1} \lor \overline{y_2} \lor \overline{z_1} \lor z_2 \;) \end{align*}Some of the substitutions can be a little bit more complex than XOR, but the XOR is a good example of how to make a formula harder.
Exclusive OR
Here \(g(x_1,x_2,\ldots,x_d)\) is \(x_1 \oplus x_2 \oplus \ldots \oplus x_d\).
cnfgen <formula_type> <formula_param> -T xor d
OR
Here \(g(x_1,x_2,\ldots,x_d)\) is \(x_1 \lor x_2 \lor \ldots \lor x_d\).
cnfgen <formula_type> <formula_param> -T or d
All equals
Here \(g(x_1,x_2,\ldots,x_d)\) is true when the values of variables \(x_1,x_2,\ldots,x_d\) are all the same.
cnfgen <formula_type> <formula_param> -T eq d
Not all equals
Here \(g(x_1,x_2,\ldots,x_d)\) is true when there is at least a true variables and at least a false variable among \(x_1,x_2,\ldots,x_d\).
cnfgen <formula_type> <formula_param> -T neq d
Exactly one
Here \(g(x_1,x_2,\ldots,x_d)\) is true if and only if exactly one variable among \(x_1,x_2,\ldots,x_d\) is true.
cnfgen <formula_type> <formula_param> -T one d
Exactly \(k\)
Here \(g(x_1,x_2,\ldots,x_d)\) is true if and only if exactly \(k\) variables among \(x_1,x_2,\ldots,x_d\) are true.
cnfgen <formula_type> <formula_param> -T exact d k
At Most \(k\)
Here \(g(x_1,x_2,\ldots,x_d)\) is true when at most \(k\) variables among \(x_1,x_2,\ldots,x_d\) are true.
cnfgen <formula_type> <formula_param> -T atmost d k
At Least \(k\)
Here \(g(x_1,x_2,\ldots,x_d)\) is true when at least \(k\) variables among \(x_1,x_2,\ldots,x_d\) are true.
cnfgen <formula_type> <formula_param> -T atleast d k
Any value but \(k\)
Here \(g(x_1,x_2,\ldots,x_d)\) is true when the number of true variables among \(x_1,x_2,\ldots,x_d\) is different from \(k\)
cnfgen <formula_type> <formula_param> -T anybut d k
If Then Else
Here \(d=3\) and \(g(x_1,x_2,x_3)\) is equal to \(x_2\) when \(x_1\) is true, and is equal to \(x_3\) otherwise.
cnfgen <formula_type> <formula_param> -T ite
Loose Majority
Here \(g(x_1,x_2,\ldots,x_d)\) is true when at least \(\left\lceil \frac{d}{2} \right\rceil\) of the variables are true.
cnfgen <formula_type> <formula_param> -T maj d
Variable compression
An interesting manipulation of a CNF is variable compression. Consider a formula \(F\) over \(N\) variables.
Similar to variable substitution, each variables \(x\) of the original formula \(F\) is substituted by the boolean function \(g:\{0,1\}^d \rightarrow \{0,1\}\) applied to \(d\) variables. But instead of using new variables \(x_1, \ldots, x_d\) in \(g\), we use \(d\) variables at random picked from a common pool of \(M\) variables (usually with \(M \ll N\)).
In this way the total number of variables decreases, but locally the complexity of any small group of clauses increases as in the case of substitution.
In cnfgen
we have implemented two options for the function \(g\),
which are XOR and Majority. To use variable compression the command
lines are
cnfgen <formula_type> <formula_params> -T xorcomp M d cnfgen <formula_type> <formula_params> -T majcomp M d
so that each variable is substituted by, respectively, the XOR and Majority of \(d\) variables picked at random among \(M\) new variables.