# CNFgen - graph formats

# CNFgen

## Combinatorial benchmarks for SAT solvers

The structure of some formula recipes in `cnfgen`

is based on
graphs, which can be either **simple** undirected graphs, **biparite**
graphs, **directed** graphs or **dag** (directed acyclic graphs). It is
possible either to read graphs from files or to generate them
internally. The `cnfgen`

tool accepts and saves graphs in the
following file formats.

- GML format (specification)
- DOT format if
`NetworkX`

library supports it (specification) - DIMACS edge format (specification)
- KTH adjacency list format (specification)
- Matrix format (specification)

Different file formats are suitable for specific graph types, as shown in this table.

Graph type | Description | GML | DOT | KTH lists | DIMACS | Matrix |
---|---|---|---|---|---|---|

`simple` |
simple undirected graph | Yes |
Yes |
Yes |
Yes |
No |

`digraph` |
directed graph | Yes |
Yes |
Yes |
Yes |
No |

`dag` |
directed acyclic graph | Yes |
Yes |
Yes |
No |
No |

`bipartite` |
bipartite graphs | Yes |
Yes |
Yes |
No |
Yes |

If the file ends with one of the extensions `.kthlist`

, `.gml`

,
`.dot`

, `.matrix`

, `.dimacs`

then the program assumes that the file
is the corresponding format. It is possible to specify the format
using the `-gf`

option in the command line, which is particularly
useful when the graph is read from standard input or written on the
standard output, or when the file extension does not match its
file format.

#### GML and DOT format — bipartite graphs representation

### KTH adjacency list format

[Full specification by Jakob Nordström]

The `kthlist`

adjacency list graph format has been introduced by
Jakob Nordström of the Proof Complexity research group at KTH Royal
Institute of Technologyas as an efficient and simple way to
represent sparse graphs for applications in SAT solving and
proof complexity.

The format can specify all four types of graph, but the exact interpretation of the file depends a bit on the context.

#### KTH predecessors list format — `digraph`

The `kthlist`

format as a straightforward interpretation as
a directed graph, as described in the format specification.

#### KTH adjacency list format — `dag`

When CNFgen expects a directed acyclic graph, it expects the file
in `kthlist`

format to express very explicitly the topological
order of the vertices. Namely the sequence of \(1\),… ,\(n\) is
expected to be a topological order from sources to sinks of the
vertices, and formula constructions in CNFgen will respect
this order.

#### KTH predecessors list format — `simple`

If a vertex \(v\) has \(u\) in its adjacency list then the \(v\) must be
explicitly listed in the adjacency list of \(u\), otherwise the file
is considered not well defined. Nevertheless CNFgen is liberal and
does not expect that. For CNFgen an edge \(\{u,v\}\) considered to be
in graph when either \(u\) is in the list of \(v\), or when \(v\) is in
the list of \(u\), or both. Naturally all `simple`

graphs generated
by CNFgen are correct `kthlist`

files according to the strictest
interpretation of the format.

Here’s an example of how a graph with missing inverted edges is interpreted as an undirected graph. The file

c c This is a DAG of 5 vertices c 5 1 : 0 2 : 0 3 : 1 0 4 : 2 3 0 5 : 2 4 0

represents the directed acyclic graph

and the simple undirected graph

#### KTH predecessors list format — `bipartite`

The interpretation of a file as a bipartite graph is essentially the same as it is for undirected graphs, with the difference that the bipartition is made explicit by listing just the adjacency lists of the vertices on the left side.

c listing only left side vertices (bipartite graph) 11 1 : 7 8 9 0 2 : 6 7 9 0 3 : 8 9 11 0 4 : 8 10 11 0 5 : 6 10 11 0

The difference with the undirected graph is specification is
important. Notice that according to the format spec the previous
example is not a correct `kthlist`

file for an undirected graph
because it lacks the adjacency lists of the right side vertices
too. The following is the same graph, but encoded as an
undirected graph.

c listing left and right side vertices (undirected graph) 11 1 : 7 8 9 0 2 : 6 7 9 0 3 : 8 9 11 0 4 : 8 10 11 0 5 : 6 10 11 0 6 : 2 5 0 7 : 1 2 0 8 : 1 3 4 0 9 : 1 2 3 0 10 : 4 5 0 11 : 3 4 5 0

#### Using `simple`

files where `bipartite`

files are expected (deprecated)

CNFgen parser is forgiving in the sense that when an edge in an
undirected graph is specified only in one of the two adjacency
lists, CNFgen still accept the input (which formally is not a legal
`kthlist`

files). This means that any `kthlist`

file for bipartite
graphs could be interpreted as a file for undirected graphs as well.

It makes sense to allow the same in the opposite direction, then.
Unfortunately this requires to fix some ambiguity. We stress that
using `kthfile`

undirected graphs when a `bipartite`

graph is
expected could lead to some surprises and therefore it is deprecated.

The precise rule of how a `simple`

graph can be interpreted as
a `bipartite`

graph is as follows. Vertices are colored with either
left (`L`

) or right (`R`

) colors during the parsing of the file.
At the beginning the coloring is completely unspecified, and colors
are assigned to some vertices every time the parser process the
adjacency list of a vertex. When the parser finds an adjacency list

i : <j1> <j2> <j3> ... <jk> 0

it attempts to color as `L : R R R ... R`

. If this is
compatible with the coloring computed so far, it moves to the next
list. Otherwise it attempts the coloring `R : L L L ... L`

.
If neither of this attempts work, then the parser emits an error.

Essentially an error occurs when a vertex \(v\) has been already assigned by to one color because of previous lines, and a new line forces \(v\) to the other color.

If a vertex has no color by the end of the parsing (i.e. the vertex
did not occur in any adjacency list and its own adjacency list is
not specified) then it is assigned the color `R`

.

**Remarks** this bi-coloring process is arbitrary and greedy, hence
it is possible that the parsing fails even if the graph was
actually bipartite but the bipartition was not discovered.
Furthermore the same graph may get different bipartition depending
on how it is represented as a `kthlist`

file. Notice that:

- isolated vertices may be colored either
`L`

or`R`

, depending on whether the file specifies it adjacency list or not, respectively. - the order of the vertices is important. The first time a vertex
occur the parser attempts first to color it
`L`

and then`R`

.

### Matrix format

The matrix format is suitable to represent the adjacency matrix of a bipartite graph, where rows correspond to vertices on the left side and columns to vertices on the right side.

The file consists on two numbers `r`

and `c`

separated by
whitespace, followed by a whitespace separated sequence of zeros and
ones of length \(r\times c\). Here's a well formatted example.

5 6 0 1 1 1 0 0 1 1 0 1 0 0 0 0 1 1 0 1 0 0 1 0 1 1 1 0 0 0 1 1

which represents the bipartite graph