feat: basic graph definitions#503
Conversation
|
I have some questions/comments: |
|
Discussion thread on this PR : |
| /-- The endpoint map, sending each edge to its unordered pair of endpoints. -/ | ||
| endpoints : ε → Sym2 α | ||
| /-- Every endpoint of an edge is a vertex. -/ | ||
| incidence : ∀ e ∈ edgeSet, ∀ v ∈ endpoints e, v ∈ vertexSet |
There was a problem hiding this comment.
This may cause a junk value problem on endpoints where an endpoint outside the set must be specified.
To avoid the junk value problem, we can define an edge object separately and define the graph directly over the edgeSet. Note that we use endpoints_id as the identifier for the multiplicity of an edge of the same endpoints
structure UndirectedEdge (α β : Type*) where
endpoints_id : β
endpoints : Sym2 α
deriving DecidableEq
abbrev Edge := UndirectedEdge
structure UndirectedGraph (α β : Type*) where
vertexSet : Set α
edgeSet : Set (Edge α β)
incidence : ∀ e ∈ edgeSet, ∀ v ∈ e.endpoints, v ∈ vertexSetThere was a problem hiding this comment.
Thank you for pointing this out and for the discussion in the Zulip thread. I've updated the definitions in line with your proposed solution to avoid this issue.
| /-- Typeclass for graph-like structures that have a vertex set. -/ | ||
| class HasVertexSet (G : Type*) (V : outParam Type*) where | ||
| /-- The vertex set of the graph. -/ | ||
| vertexSet : G → V | ||
|
|
||
| /-- Typeclass for graph-like structures that have an edge set. -/ | ||
| class HasEdgeSet (G : Type*) (E : outParam Type*) where | ||
| /-- The edge set of the graph. -/ | ||
| edgeSet : G → E |
There was a problem hiding this comment.
If you make these the canonical spelling then you need to restate all the loopless and incidence theorems in terms of them (and probably rename those fields to be loopless' to avoid stealing the preferred name)
There was a problem hiding this comment.
Thanks for pointing this out. I've restated the loopless and incidence theorems in terms of the new typeclasses and renamed the original fields to loopless' to free up the preferred name.
Please record these design decisions in the module docstring for future readers, not just the git history / PR title. |
| @[grind] | ||
| structure SimpleGraph (α : Type*) where | ||
| /-- The finite set of vertices. -/ | ||
| vertexSet : Finset α |
There was a problem hiding this comment.
Inconsistency on the set definitions: SimpleGraph is defined using Finset, whereas Graph is defined using Set.
There was a problem hiding this comment.
Thank you for pointing this out. This design decision was intentional, with the goal of arriving at a definition of SimpleGraph that serves as a canonical graph definition encompassing the most common usage. That said, I agree this introduces an inconsistency, and I've updated it to Set α. Finiteness can be supplied as a separate hypothesis.
| pairs of distinct vertices. -/ | ||
| structure SimpleDiGraph (α : Type*) where | ||
| /-- The finite set of vertices. -/ | ||
| vertexSet : Finset α |
| /-- The set of vertices. -/ | ||
| vertexSet : Set α | ||
| /-- The set of edges. -/ | ||
| edgeSet : Set ε |
There was a problem hiding this comment.
Avoidable junk value problem using the suggested definitions of Arc (similarly to Edge).
|
|
||
| /-- A finite simple graph on `α`: finite vertex and edge sets, edges as unordered pairs of | ||
| distinct vertices. -/ | ||
| edgeSet : Set (Edge α β) |
There was a problem hiding this comment.
Another option here would be Sym2 a -> Set b, which gives the set of edge labels for a given pair of vertices, and the empty set for disconnected nodes
There was a problem hiding this comment.
Thanks for the suggestion! We'd prefer to keep the set-based definition since it's closer to the informal G = (V, E) presentation, whereas your version is closer to the relational definition of Mathlib.Combinatorics.Graph.
There was a problem hiding this comment.
I am happy with the current definitions now. After a long discussion in the Zulip chat, the design is not bad, but it is similar to Mathlib's definitions of simple undirected graphs, whose development of set-based definitions is still in its early stages. We are happy to collaborate as the library grows.
Aside from the fragmentation of definitions in mathlib, one key principle of cslib is efficient computation, and we may redesign the definitions in the future to support efficient computation and reasoning about complexity. We prefer to have all graph definitions (undirected/directed and simple/non-simple) in one place, and to build other objects on top of the proposed graph definitions with efficient computation in mind.
P.S. This PR was a product of the Lean hackathon event at ETH Zurich. More basic primitives for graph algorithms will follow in subsequent PRs.
chenson2018
left a comment
There was a problem hiding this comment.
I see that @sorrachai has approved this, but I really do not feel that the Zulip thread has converged on accepting this PR, unless I am missing some other discussion in the graph theory channel.
This is decidedly not my area of expertise, but I am very worried about such a large deviation from very active development in Mathlib. I'd rather take this slowly and have a definite consensus or lack thereof rather than merge something too quickly.
I appreciate your concern. I would like to understand what happens if there is no consensus on this. |
|
I disagree that these definitions are computationally efficient, far from it. An efficient defintion would use lean arrays to represent neighbouring sets, better yet Vectors. This definition is merely a representational change which can be obtained by constructors on mathlib definitions. |
|
Summary. This pulls together the points we've made in the Zulip thread and earlier reviews. The design has two main ideas:
On compatibility with Mathlib. From what I understand, the concern raised so far is mostly not the design itself but the divergence from Mathlib's current API. We'd like eventual alignment too. Mathlib is already moving this way ( Introducing the definitions prevents stalling and enables active development of graph theory on the CSLib side without waiting for Mathlib's API to adapt. We hope Mathlib will push its graph definitions in this direction over time, though that will mean significant adaptation of existing APIs given the accumulated technical debt. If it does, alignment should be straightforward. If it doesn't, we still think the embedded-set design is much better suited to algorithmic graph theory. Automata in CSLib are a close precedent: developed in parallel with Mathlib's, the two have coexisted productively, each handling the use cases best suited to its scope. |
This PR introduces basic graph definitions:
Graph,SimpleGraph,DiGraph, andSimpleDiGraph, along with coercions and notation that form the basis of algorithmic graph theory in CSLib.Design. We intentionally diverge from Mathlib's graph definitions, prioritizing representations that support algorithmic reasoning. In graph algorithm design, it is common to manipulate graphs in many ways (e.g., dynamic graph operations such as adding or removing nodes/edges, contracting edges, and contracting vertex sets). As a result, we prioritize a minimal, simple definition of graphs that best aligns with how they are defined in the literature on algorithmic graph theory. In particular, we use set-based definitions of both the vertex and edge sets, along with minimal additional attributes, to reduce early proof obligations and select those whose proofs are closer to their textbook counterparts.
General structures. Here we use
Seteverywhere, allowing statements about possibly infinite graphs uniformly without enforcing finiteness through extra axioms. We associate labels with edges for the more general multigraph notions (Graph,DiGraph) via the auxiliaryEdgeandDiEdgerecords, and abstract the label away for the more restricted notions (SimpleGraph,SimpleDiGraph), which useSym2 αandα × αdirectly. The definitions have been tested in practice by further formalizing concepts such as walks (see the GraphAlgorithms Walk module). However, to keep this PR concise, we defer discussion of the Walk design to a later PR.Comparison with PR #427. PR #427 introduced
SimpleGraphwithFinset-based vertex and edge sets and edges viaSym2. Here, we useSetthroughout, treating finiteness as an orthogonal concern rather than baking it into the type, and introduce a hierarchy comprising both general multigraph structures (Graph,DiGraph) with edge labels and specialized structures (SimpleGraph,SimpleDiGraph) without labels, since multi-edges are disallowed there.Main definitions.
Graph: (possibly infinite) undirected multigraphSimpleGraph: (possibly infinite) undirected graph without loops or multi-edgesDiGraph: (possibly infinite) directed graphSimpleDiGraph: (possibly infinite) directed graph without loops or multi-edgesFuture work. We plan to add fundamental graph algorithms building on these definitions.
Co-authored-by: Sorrachai Yingchareonthawornchai sorrachai.cp@gmail.com