Skip to content

feat: basic graph definitions#503

Open
BasilRohner wants to merge 12 commits intoleanprover:mainfrom
BasilRohner:main
Open

feat: basic graph definitions#503
BasilRohner wants to merge 12 commits intoleanprover:mainfrom
BasilRohner:main

Conversation

@BasilRohner
Copy link
Copy Markdown

@BasilRohner BasilRohner commented Apr 19, 2026

This PR introduces basic graph definitions: Graph, SimpleGraph, DiGraph, and SimpleDiGraph, along with coercions and notation that form the basis of algorithmic graph theory in CSLib.

Design. We intentionally diverge from Mathlib's graph definitions, prioritizing representations that support algorithmic reasoning. In graph algorithm design, it is common to manipulate graphs in many ways (e.g., dynamic graph operations such as adding or removing nodes/edges, contracting edges, and contracting vertex sets). As a result, we prioritize a minimal, simple definition of graphs that best aligns with how they are defined in the literature on algorithmic graph theory. In particular, we use set-based definitions of both the vertex and edge sets, along with minimal additional attributes, to reduce early proof obligations and select those whose proofs are closer to their textbook counterparts.

General structures. Here we use Set everywhere, allowing statements about possibly infinite graphs uniformly without enforcing finiteness through extra axioms. We associate labels with edges for the more general multigraph notions (Graph, DiGraph) via the auxiliary Edge and DiEdge records, and abstract the label away for the more restricted notions (SimpleGraph, SimpleDiGraph), which use Sym2 α and α × α directly. The definitions have been tested in practice by further formalizing concepts such as walks (see the GraphAlgorithms Walk module). However, to keep this PR concise, we defer discussion of the Walk design to a later PR.

Comparison with PR #427. PR #427 introduced SimpleGraph with Finset-based vertex and edge sets and edges via Sym2. Here, we use Set throughout, treating finiteness as an orthogonal concern rather than baking it into the type, and introduce a hierarchy comprising both general multigraph structures (Graph, DiGraph) with edge labels and specialized structures (SimpleGraph, SimpleDiGraph) without labels, since multi-edges are disallowed there.

Main definitions.

  • Graph: (possibly infinite) undirected multigraph
  • SimpleGraph: (possibly infinite) undirected graph without loops or multi-edges
  • DiGraph: (possibly infinite) directed graph
  • SimpleDiGraph: (possibly infinite) directed graph without loops or multi-edges

Future work. We plan to add fundamental graph algorithms building on these definitions.

Co-authored-by: Sorrachai Yingchareonthawornchai sorrachai.cp@gmail.com

@ctchou
Copy link
Copy Markdown
Collaborator

ctchou commented Apr 19, 2026

I have some questions/comments:
(1) Does the Graph in this PR differ in a substantive way from the Graph in Mathlib.Combinatorics.Graph.Basic? It seems to me that they are basically equivalent to each other. If my understanding is correct, then can't you just define Graph.endpoints and Graph.incidence on mathlib's Graph, if they are the API you prefer?
(2) A Set that is Set.Finite can be converted into a Finset using:
https://leanprover-community.github.io/mathlib4_docs/Mathlib/Data/Set/Finite/Basic.html#Set.Finite.toFinset
I know that this API is "noncomputable". But once you obtain the Finset versions of vertexSet and edgeSet, you can operate on them in a completely "computable" manner. By having both Set and Finset versions of basically the same definition, I'm afraid that you will have to repeat the statements and proofs of many theorems for the two versions.
(3) It seems to me that you can define DiGraph by extending Quivers in Mathlib.Combinatorics.Quiver.Basic, which is so general that I'm pretty sure it can accommodate any notion of "digraph" you have.

@Shreyas4991
Copy link
Copy Markdown
Contributor

Discussion thread on this PR :
Leanprover Zulip thread link

Comment thread Cslib/Algorithms/Lean/Graph/Graph.lean Outdated
/-- The endpoint map, sending each edge to its unordered pair of endpoints. -/
endpoints : ε → Sym2 α
/-- Every endpoint of an edge is a vertex. -/
incidence : ∀ e ∈ edgeSet, ∀ v ∈ endpoints e, v ∈ vertexSet
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may cause a junk value problem on endpoints where an endpoint outside the set must be specified.

To avoid the junk value problem, we can define an edge object separately and define the graph directly over the edgeSet. Note that we use endpoints_id as the identifier for the multiplicity of an edge of the same endpoints

structure UndirectedEdge (α β : Type*) where
  endpoints_id : β
  endpoints : Sym2 α
deriving DecidableEq

abbrev Edge := UndirectedEdge

structure UndirectedGraph (α β : Type*) where
  vertexSet : Set α
  edgeSet : Set (Edge α β)
  incidence : ∀ e ∈ edgeSet, ∀ v ∈ e.endpoints, v ∈ vertexSet

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for pointing this out and for the discussion in the Zulip thread. I've updated the definitions in line with your proposed solution to avoid this issue.

Comment on lines +114 to +122
/-- Typeclass for graph-like structures that have a vertex set. -/
class HasVertexSet (G : Type*) (V : outParam Type*) where
/-- The vertex set of the graph. -/
vertexSet : G → V

/-- Typeclass for graph-like structures that have an edge set. -/
class HasEdgeSet (G : Type*) (E : outParam Type*) where
/-- The edge set of the graph. -/
edgeSet : G → E
Copy link
Copy Markdown
Collaborator

@eric-wieser eric-wieser Apr 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you make these the canonical spelling then you need to restate all the loopless and incidence theorems in terms of them (and probably rename those fields to be loopless' to avoid stealing the preferred name)

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing this out. I've restated the loopless and incidence theorems in terms of the new typeclasses and renamed the original fields to loopless' to free up the preferred name.

@eric-wieser
Copy link
Copy Markdown
Collaborator

Design. We intentionally diverge from Mathlib's graph definitions, ...

Please record these design decisions in the module docstring for future readers, not just the git history / PR title.

Comment thread Cslib/Algorithms/Lean/Graph/Graph.lean Outdated
@[grind]
structure SimpleGraph (α : Type*) where
/-- The finite set of vertices. -/
vertexSet : Finset α
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inconsistency on the set definitions: SimpleGraph is defined using Finset, whereas Graph is defined using Set.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for pointing this out. This design decision was intentional, with the goal of arriving at a definition of SimpleGraph that serves as a canonical graph definition encompassing the most common usage. That said, I agree this introduces an inconsistency, and I've updated it to Set α. Finiteness can be supplied as a separate hypothesis.

Comment thread Cslib/Algorithms/Lean/Graph/Graph.lean Outdated
pairs of distinct vertices. -/
structure SimpleDiGraph (α : Type*) where
/-- The finite set of vertices. -/
vertexSet : Finset α
Copy link
Copy Markdown
Collaborator

@sorrachai sorrachai May 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finset vs. Set

Comment thread Cslib/Algorithms/Lean/Graph/Graph.lean Outdated
/-- The set of vertices. -/
vertexSet : Set α
/-- The set of edges. -/
edgeSet : Set ε
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Avoidable junk value problem using the suggested definitions of Arc (similarly to Edge).


/-- A finite simple graph on `α`: finite vertex and edge sets, edges as unordered pairs of
distinct vertices. -/
edgeSet : Set (Edge α β)
Copy link
Copy Markdown
Collaborator

@eric-wieser eric-wieser May 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another option here would be Sym2 a -> Set b, which gives the set of edge labels for a given pair of vertices, and the empty set for disconnected nodes

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion! We'd prefer to keep the set-based definition since it's closer to the informal G = (V, E) presentation, whereas your version is closer to the relational definition of Mathlib.Combinatorics.Graph.

Copy link
Copy Markdown
Collaborator

@sorrachai sorrachai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am happy with the current definitions now. After a long discussion in the Zulip chat, the design is not bad, but it is similar to Mathlib's definitions of simple undirected graphs, whose development of set-based definitions is still in its early stages. We are happy to collaborate as the library grows.

Aside from the fragmentation of definitions in mathlib, one key principle of cslib is efficient computation, and we may redesign the definitions in the future to support efficient computation and reasoning about complexity. We prefer to have all graph definitions (undirected/directed and simple/non-simple) in one place, and to build other objects on top of the proposed graph definitions with efficient computation in mind.

P.S. This PR was a product of the Lean hackathon event at ETH Zurich. More basic primitives for graph algorithms will follow in subsequent PRs.

Copy link
Copy Markdown
Collaborator

@chenson2018 chenson2018 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that @sorrachai has approved this, but I really do not feel that the Zulip thread has converged on accepting this PR, unless I am missing some other discussion in the graph theory channel.

This is decidedly not my area of expertise, but I am very worried about such a large deviation from very active development in Mathlib. I'd rather take this slowly and have a definite consensus or lack thereof rather than merge something too quickly.

@sorrachai
Copy link
Copy Markdown
Collaborator

I see that @sorrachai has approved this, but I really do not feel that the Zulip thread has converged on accepting this PR, unless I am missing some other discussion in the graph theory channel.

This is decidedly not my area of expertise, but I am very worried about such a large deviation from very active development in Mathlib. I'd rather take this slowly and have a definite consensus or lack thereof rather than merge something too quickly.

I appreciate your concern. I would like to understand what happens if there is no consensus on this.

@Shreyas4991
Copy link
Copy Markdown
Contributor

I disagree that these definitions are computationally efficient, far from it. An efficient defintion would use lean arrays to represent neighbouring sets, better yet Vectors. This definition is merely a representational change which can be obtained by constructors on mathlib definitions.

@BasilRohner
Copy link
Copy Markdown
Author

BasilRohner commented May 6, 2026

Summary. This pulls together the points we've made in the Zulip thread and earlier reviews. The design has two main ideas:

  1. Embedded-set structure. Vertex and edge sets are represented as vertexSet : Set α and edgeSet : Set β, and not using a type. A prime example of what this buys: subgraphs and graph operations stay inside the same type. A subgraph of G : Graph α β is itself a Graph α β, related by . Induced subgraph, edge and vertex deletion, and contraction are all maps Graph α β → Graph α β. No separate Subgraph G type, no coe/spanningCoe pair to choose between, no parallel lattice and API to keep in sync. The Mathlib workaround (SimpleGraph.Subgraph with its two coercions to SimpleGraph ↥verts and SimpleGraph V) has been a known pain point for years.

  2. Consistency across the family. All four graph types (Graph, SimpleGraph, DiGraph, SimpleDiGraph) use the same setup: embedded set-based vertex and edge sets, shared HasVertexSet/HasEdgeSet typeclasses, shared V(G)/E(G) notation, and forgetful maps (SimpleGraph.toGraph, SimpleDiGraph.toDiGraph) that carry properties from the general types down to the specialised ones. Walk machinery is a prime example: VertexSeq, IsWalk, loopErase, takeUntil, dropUntil, rerootCycle get defined once and reused across all four. Mathlib's SimpleGraph.Walk is indexed by a specific SimpleGraph V and doesn't transfer to Graph α β. Mathlib's graph ecosystem doesn't have this uniformity yet: Mathlib.Combinatorics.Graph.Basic (May 2025) uses embedded sets for multigraphs, while the older SimpleGraph V stays type-as-vertex-set, with no canonical map between them.

On compatibility with Mathlib. From what I understand, the concern raised so far is mostly not the design itself but the divergence from Mathlib's current API. We'd like eventual alignment too. Mathlib is already moving this way (Graph α β and a recent PR to add vertexSet to Digraph made by @Shreyas4991 are already steps in that direction), but extending the same approach to SimpleGraph V, for instance by adding a vertexSet, field may require serious refactoring. A lot of API rests on the type-as-vertex-set design, and every change has to be weighed against it. The new Graph α β is also still being filled in and refined for the same reason.

Introducing the definitions prevents stalling and enables active development of graph theory on the CSLib side without waiting for Mathlib's API to adapt. We hope Mathlib will push its graph definitions in this direction over time, though that will mean significant adaptation of existing APIs given the accumulated technical debt. If it does, alignment should be straightforward. If it doesn't, we still think the embedded-set design is much better suited to algorithmic graph theory. Automata in CSLib are a close precedent: developed in parallel with Mathlib's, the two have coexisted productively, each handling the use cases best suited to its scope.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants