View on GitHub

# CNFgen

## Combinatorial benchmarks for SAT solvers

Often we want to increase the hardness of formulas in a controlled way to study their proof complexity or how SAT solvers perform on them.

### Usage

CNFgen include formula transformation capability: cnfgen itself can apply one or more such transformations to the formula it generates,

cnfgen <family> <params> -T <tr1> <args1> -T <tr2> <args2> ...


Furthermore it is possible to apply a transformation to a DIMACS file input.cnf, using the command line

cnfgen dimacs input.cnf -T <tr1> <args1> -T <tr2> <args2> ...


For example, if we want to substitute each variables of a CNF in DIMACS file input.cnf with a XOR of 3 new variables and output the formula on standard output it is sufficient to type

cnfgen dimacs input.cnf -T xor 3


For a list of all implemented transformations you can type either

cnfgen --help


To get the help for a specific transformation type

cnfgen <formula_type> <formula_params> -T <transformation_name> -h


For example

cnfgen peb pyramid 3  -T shuffle -h


where the formula generated is ignored, and the help for shuffle transformation is shown.

### Shuffling

Shuffle the CNF, by randomly permuting the order of clauses, of variables and randomly flipping the variable polarity.

cnfgen <formula_type> <formula_params> -T shuffle

p cnf 10 5
2 -5 -8 0
1 3 10 0
5 8 10 0
-1 8 -9 0
3 -6 7 0


can be shuffled into the following (here we did not shuffle the clauses to make the isomorphism between the two formulas easier to spot).

p cnf 10 5
-5 3 -9 0
-2 1 -7 0
-3 9 -7 0
2 9 -8 0
1 -4 -6 0


### One dimensional lifting

For each variable $$x$$ we introduce $$t$$ variables $$(x_1,x_2,...,x_d)$$ and $$d$$ selector variables $$(y_1,y_2,...,x_d)$$. The value of the original variable is substituted with the value of the $$x_i$$ variables for which the selector $$y_i$$ is active. We want at least one selector to be active.

Concretely the variable $$x$$ is substituted by $$\bigwedge_i (y_i \rightarrow x_i)$$. Furthermore the clause $$y_1 \vee y_2 \vee \ldots \vee y_d$$ is added to the formula to guarantee that at least one selector is active.

cnfgen <formula_type> <formula_param> -T lift d


### Substitution

Pick a formula $$F$$. We can take a function $$g:\{0,1\}^d \rightarrow \{0,1\}$$ and substitute each variable $$x$$ in $$F$$ with the value of function $$g$$ on $$d$$ new variables $$x_1, \ldots, x_d$$. For example if $$g$$ is XOR and $$d$$ is 2 then the CNF $x \land (y \lor \overline{z})$ becomes

$x_1 \oplus x_2 \land (y_1 \oplus y_2 \lor \overline{z_1 \oplus z_2}) \;.$

Each of the two original clauses must be represented in CNF form: $$x_1 \oplus x_2$$ becomes $$(x_1 \lor x_2)\land(\overline{x_1} \lor \overline{x_2})$$; and $$y_1 \oplus y_2 \lor \overline{z_1 \oplus z_2}$$ becomes

\begin{align*} \; (\; y_1 \lor y_2 \lor z_1 \lor \overline{z_2} \;)\\ \land\; (\; \overline{y_1} \lor \overline{y_2} \lor z_1 \lor \overline{z_2} \;)\\ \land\; (\; y_1 \lor y_2 \lor \overline{z_1} \lor z_2 \;)\\ \land\; (\; \overline{y_1} \lor \overline{y_2} \lor \overline{z_1} \lor z_2 \;) \end{align*}

Some of the substitutions can be a little bit more complex than XOR, but the XOR is a good example of how to make a formula harder.

##### Exclusive OR

Here $$g(x_1,x_2,\ldots,x_d)$$ is $$x_1 \oplus x_2 \oplus \ldots \oplus x_d$$.

cnfgen <formula_type> <formula_param> -T xor d

##### OR

Here $$g(x_1,x_2,\ldots,x_d)$$ is $$x_1 \lor x_2 \lor \ldots \lor x_d$$.

cnfgen <formula_type> <formula_param> -T or d

##### All equals

Here $$g(x_1,x_2,\ldots,x_d)$$ is true when the values of variables $$x_1,x_2,\ldots,x_d$$ are all the same.

cnfgen <formula_type> <formula_param> -T eq d

##### Not all equals

Here $$g(x_1,x_2,\ldots,x_d)$$ is true when there is at least a true variables and at least a false variable among $$x_1,x_2,\ldots,x_d$$.

cnfgen <formula_type> <formula_param> -T neq d

##### Exactly one

Here $$g(x_1,x_2,\ldots,x_d)$$ is true if and only if exactly one variable among $$x_1,x_2,\ldots,x_d$$ is true.

cnfgen <formula_type> <formula_param> -T one d

##### Exactly $$k$$

Here $$g(x_1,x_2,\ldots,x_d)$$ is true if and only if exactly $$k$$ variables among $$x_1,x_2,\ldots,x_d$$ are true.

cnfgen <formula_type> <formula_param> -T exact d k

##### At Most $$k$$

Here $$g(x_1,x_2,\ldots,x_d)$$ is true when at most $$k$$ variables among $$x_1,x_2,\ldots,x_d$$ are true.

cnfgen <formula_type> <formula_param> -T atmost d k

##### At Least $$k$$

Here $$g(x_1,x_2,\ldots,x_d)$$ is true when at least $$k$$ variables among $$x_1,x_2,\ldots,x_d$$ are true.

cnfgen <formula_type> <formula_param> -T atleast d k

##### Any value but $$k$$

Here $$g(x_1,x_2,\ldots,x_d)$$ is true when the number of true variables among $$x_1,x_2,\ldots,x_d$$ is different from $$k$$

cnfgen <formula_type> <formula_param> -T anybut d k

##### If Then Else

Here $$d=3$$ and $$g(x_1,x_2,x_3)$$ is equal to $$x_2$$ when $$x_1$$ is true, and is equal to $$x_3$$ otherwise.

cnfgen <formula_type> <formula_param> -T ite

##### Loose Majority

Here $$g(x_1,x_2,\ldots,x_d)$$ is true when at least $$\left\lceil \frac{d}{2} \right\rceil$$ of the variables are true.

cnfgen <formula_type> <formula_param> -T maj d


### Variable compression

An interesting manipulation of a CNF is variable compression. Consider a formula $$F$$ over $$N$$ variables.

Similar to variable substitution, each variables $$x$$ of the original formula $$F$$ is substituted by the boolean function $$g:\{0,1\}^d \rightarrow \{0,1\}$$ applied to $$d$$ variables. But instead of using new variables $$x_1, \ldots, x_d$$ in $$g$$, we use $$d$$ variables at random picked from a common pool of $$M$$ variables (usually with $$M \ll N$$).

In this way the total number of variables decreases, but locally the complexity of any small group of clauses increases as in the case of substitution.

In cnfgen we have implemented two options for the function $$g$$, which are XOR and Majority. To use variable compression the command lines are

cnfgen <formula_type> <formula_params> -T xorcomp M d
cnfgen <formula_type> <formula_params> -T majcomp M d


so that each variable is substituted by, respectively, the XOR and Majority of $$d$$ variables picked at random among $$M$$ new variables.