View on GitHub

# CNFgen

## Combinatorial benchmarks for SAT solvers

Here we highlight some of the most convenient and simple ways to get benchmark formulas using cnfgen.

### Pigeonhole principle

Pigeonhole principle claims that it is possible to assign $$M$$ pigeons to $$N$$ holes injectively. It is unsatisfiable when $$M>N$$, and is generated by command line

cnfgen php M N


The following "sparse" variant of the pigeonhole principle has much shorter clauses.

cnfgen php M N d


Indeed, it generates a pigeonhole principle formula from $$m$$ pigeons to $$n$$ holes, too, but each pigeon is only allowed to fly into one among $$d$$ of the available holes.

You can add functional and onto clauses using the command line options.

cnfgen php [--functional] [--onto] M N
cnfgen php [--functional] [--onto] M N d


### Tseitin formulas

Basic UNSAT examples of Tseitin formulas can be generated via

cnfgen tseitin N
cnfgen tseitin N d


which produce Tseitin formulas over random $$d$$-regular graphs, with random odd charge on the vertices ($$d$$ defaults to 4 if unspecified).

For more control, Tseitin formula on a graph $$G$$ has the command line

cnfgen tseitin [first|random|randomodd|randomeven] $G$


where $$G$$ is a graph file or a graph specification, and where charge can be either first (only the first vertex is labeled 1) or a random charge with, respectively, uspecified, odd, even total charge.

For example the following command produces a tseitin formula over a 10 by 10 grid graph, with random labeling of the vertices.

cnfgen tseitin random grid 10 10


### Ordering principle

The ordering principle over $$N$$ elements is generated by

cnfgen op N


and the graph ordering principle over a $$d$$-regular random graph of $$N$$ vertices is generate by

cnfgen op N d


It is possible to consider total orders with

cnfgen op --total N
cnfgen op --total N d


For further control the graph ordering principle over an input graph $$G$$ is produced by

cnfgen op G


### Subset cardinality

Subset cardinality formulas encode the requirement that, given $$\{-1,1\}$$ values to the edges of a bipartite graph, the sum of the values incident to the left and right vertices must be, respectively, $$\geq 0$$ and $$\leq 0$$. To get UNSAT instances that are hard (at least for resolution) use

cnfgen subsetcard N


for increasing values of $$N$$.

### Parity principle

The formula claims that $$N$$ elements can be grouped in pairs. Clearly satisfiable if and only if $$N$$ is even.

cnfgen parity N


### Counting principle

This formula that claims that a set of $$M$$ elements can be partitioned into sets of size $$d$$. Of course the formula is satisfiable if and only if $$d$$ divides $$M$$.

cnfgen count M d


so that

cnfgen count 10 4


gives an unsatisfiable formula while

cnfgen count 9 3


gives a satisfiable one.

### Matching principle

The formula claims some graph $$G$$ has a matching. To generate the benchmark with $$G$$ being a random $$d$$-regular on $$N$$ vertices we can use

cnfgen matching gnd N d


The parity principle is equivalent to the matching formula over a complete graph.

cnfgen matching complete N


### K-coloring formula

The formula asserts that the input graph $$G$$ is colorable with $$k$$ colors.

cnfgen kcolor k G


For example a formula generated as

cnfgen kcolor 3 gnd 100 2 plantclique 4


is clearly unsatisfiable. Hard instances of $$3$$-coloring are random $$G(n,m)$$ graphs with $$n$$ vertices and $$m=2.522n$$ edges placed at random. Unfortunately $$n$$ must be large for the hardness to show up in practice.

cnfgen kcolor 3 gnm 1000 2530


### K-clique formula

The formula claims that there is no clique of size at least $$k$$ in the input graph $$G$$.

cnfgen kclique k G


It is possible to plant a clique in the graph with the option --plantclique. The formula generated by

cnfgen kclique 10 gnp 200 0.5 plantclique 15


claims that there is a clique of size 10 in a random Erdös-Renyi graph with 200 vertices where a clique of size 15 has been planted.

For some hard instances you can ask for $$k$$-cliques in a $$G(n,p)$$ random graph with $$p \ll n^{-2/(k-1)}$$.

### Pebbling formula

Pebbling formulas are UNSAT CNFs which represent propagation from the sources to the sink of a directed acyclic graph. Command line

cnfgen peb pyramid H
cnfgen peb tree H
cnfgen peb path H


build pebbling formulas on pyramid, trees or path. of height $$H$$.

Usually such formula are lifted via XOR substitution as in

cnfgen peb pyramid 10 -T xor 2


### Stones formulas

A more complex variant of the pebbling formula, apt to study hardness with respect to subsystems of resolution. The formula considers a set of $$s$$ stones and a directed acyclic graph. Each vertex of the graph obtains a stone and then a process of propagation from the sources to the sink (similar to the one in the pebbling formula) is considered on the vertices with a stone on it. To generate a stone formula for, say, 12 stones and a pyramid graph:

cnfgen stone 12 pyramid 5


### Graph isomorphism formula

Produce a CNF that claims two graphs $$G_1$$ and $$G_2$$ to be isomorphic. For example if the graphs are saved on gml files use the command.

cnfgen iso G1.gml -e G2.gml


If only one graph is given, the formula claims that the formula has a non trivial automorphism.

cnfgen iso G.gml


### Ramsey number formula

The command line

cnfgen ram s k N


produces a formula that claims that the Ramsey number $$r(s,k)$$ is larger than $$N$$.

### Ramsey counterexample

The formula claims that the graph $$G$$ given in input is not a counter example for the Ramsey number $$r(k,s)$$. Namely it claims that the graph has either a clique of size $$k$$ or an independent set of size $$s$$.

cnfgen ramlb k s G


### Clique-coloring formula

The formula asserts that there exists a graph $$G$$ of $$n$$ vertices that simultaneously

• contains a clique of $$k$$ vertices
• it has a coloring with $$c$$ colors.
cnfgen cliquecoloring n k c


Interesting hard UNSAT cases for resolution and cutting planes are when $$k \approx n^{2/3}$$ and $$c=k-1$$. For example

cnfgen cliquecoloring 64 16 15


is clearly unsatisfiable.

### Random K-CNF

Sample a random CNF with $$m$$ clauses of $$k$$ literals over $$n$$ variables. Clauses are sampled without replacement. To get hard UNSAT random $$3$$-CNFs the user should pick about $$c n$$ clauses where $$c>4.5$$. Unfortunately this hardness is asymptotic, therefore $$n$$ may need to be quite large.

cnfgen randkcnf 3 1000 4500