**Evan Patterson**

Stanford University, Statistics Department

Special Session on Applied Category

AMS Fall Western Sectional Meeting, November 9-10, 2019

Is there a correspondence between two graphs?

- Formalized in different ways:
- Graph homomorphism
- Graph isomorphism
- Maximum common subgraph
- Graph edit distance
- ...

- Exact formulations are mostly NP-hard
- Exact and inexact matching heavily studied by computer scientists

- Hardness due to combinatorics of matching
- Can we
*relax*the matching problem into an easier one? - A common method for relaxing matching problems is
*optimal transport* - Numerous efforts to match graphs using optimal transport

Aflalo et al 2015; Alvarez-Melis et al 2017; Vayer et al 2018

**Monge problem** (1781): Given measures $\mu \in \mathrm{Prob}(X)$ and $\nu \in \mathrm{Prob}(Y)$ and cost function $c: X \times Y \to \mathbb{R}$,

Image: Villani 2003

Monge problem is combinatorial and nonconvex.

**Kantorovich's relaxation** (1942): Replace deterministic map with probabilistic *coupling*:

where

$$ \mathrm{Coup}(X,Y) := \{\pi \in \mathrm{Prob}(X \times Y): \mathop{\mathrm{proj}_X} \pi = \mu,\, \mathop{\mathrm{proj}_Y} \pi = \nu \}. $$New problem is convex, in fact a *linear program*.

When cost is a metric $d: X \times X \to \mathbb{R}$, we get the **Wasserstein metric** on $\mathrm{Prob}(X)$:

Applied to graph matching in two ways:

Featurize vertices of both graphs in

*common*metric space, then compute Wasserstein distance.Convert graphs into

*distinct*metric spaces on vertices (via*shortest path metric*), then compute Gromov-Wasserstein distance.

**Problem**: Discards significant information about edges.

**Goal**: Construct a Wasserstein-style metric on graphs that respects both *vertices* and *edges*, in a sense to be defined.

So, this informal diagram should not commute!

**Recall**: For $\mathcal{C}$ a small category, a *$\mathcal{C}$-set* is a functor $X: \mathcal{C} \to \mathbf{Set}$.

The *category of $\mathcal{C}$-sets* is the functor category $[\mathcal{C},\mathbf{Set}]$.

**Example**: When $\mathcal{C} =$ , a $\mathcal{C}$-set is a (directed) *graph*.

**Example**: When $\mathcal{C} = $ , a $\mathcal{C}$-set is a *symmetric graph*.

Nearly the same as an *undirected graph*.

**Other examples**

- Reflexive and symmetric reflexive graphs
- Bipartite graphs
- Hypergraphs
- Higher-dimensional (semi-)simplicial sets

**For applications**: *Attributes* can be modeled in $\mathcal{C}$, to get vertex-attributed graphs, edge-attributed graphs, and so on.

A *$\mathcal{C}$-set in a category $\mathcal{S}$* is a functor $X: \mathcal{C} \to \mathcal{S}$.

For us, useful categories $\mathcal{S}$ include:

- $\mathbf{Set}$, the category of sets and functions
- $\mathbf{Meas}$, the category of measurable spaces and measurable functions
- $\mathbf{Meas}_*$, the category of measure spaces and measurable functions
- $\mathbf{Met}$, the category of metric spaces and functions
- $\mathbf{MM}$, the category of metric measure spaces (mm spaces) and measurable functions
- $\mathbf{Markov}$, the category of measurable spaces and Markov kernels

Explore *relaxations* of the notion of homomorphism (natural transformation):

In this talk, I give a sketch. A systematic development is in the paper.

A *Markov kernel* $M: X \to Y$ is a measurable assignment of each point $x \in X$ to a probability measure $M(x) \in \mathrm{Prob}(Y)$.

Other names for Markov kernels:

- Probability kernels
- Stochastic kernels
- Stochastic relations

There is a category **Markov** of measurable spaces and Markov kernels.

Let $\mu \in \mathrm{Prob}(X)$ and $\nu \in \mathrm{Prob}(Y)$.

For any coupling $\pi \in \mathrm{Coup}(\mu,\nu)$, the *disintegration* (conditional probability distribution) $M: X \to Y$ satisfies

Conversely, for any Markov kernel $M: X \to Y$ with $\mu \cdot M = \nu$, there is a *product*

In fact, this correspondence is functorial.

**Proposition** (folkore?): Under regularity conditions, there is an isomorphism between

- the category of probability spaces and couplings, with composition defined by the
*gluing lemma*, and - the category of probability spaces and
*measure-preserving*Markov kernels (defined up to almost-everywhere equality).

**Interpretation**: Markov kernels allow a "directed" version of optimal transport.

$\mathbf{Meas}$ embeds in $\mathbf{Markov}$ as the *deterministic* Markov kernels:

Induces a *relaxation functor* by post-composition:

**Definition**. A *Markov morphism* $X \to Y$ of measurable $\mathcal{C}$-spaces $X$ and $Y$ is a morphism $\mathcal{M}_*(X) \to \mathcal{M}_*(Y)$.

So, a Markov morphism $\Phi: X \to Y$ of graphs $X$ and $Y$ consists of Markov kernels $\Phi_V: X(V) \to Y(V)$ and $\Phi_E: X(E) \to Y(E)$ such that

**Important**: Graph homomorphism is NP-hard, but Markov graph morphism is a *linear feasibility* problem.

**Examples** of Markov morphisms:

- any graph homomorphism
- any probabilistic mixture of graph homomorphisms

But more exotic things can happen because mass can be "split".

**Example**: $X = $ self loop, $Y = $ directed cycle.

No graph homomorphisms $X \to Y$, but there is a unique Markov morphism $\Phi: X \to Y$:

$$ \Phi_V(*) \sim \mathrm{Unif}(Y(V)), \qquad \Phi_E(*) \sim \mathrm{Unif}(Y(E)). $$Now, the metric side of matching $\mathcal{C}$-sets.

Let $\mathbf{Met}$ be the category of Lawvere metric spaces and maps. (Note choice of morphisms.)

**Definition**. A *metric category* is a category $\mathcal{S}$ enriched in $\mathbf{Met}$, i.e., the hom-sets $\mathcal{S}(X,Y)$ are Lawvere metric spaces.

**Definition**. A morphism $f: X \to Y$ in $\mathcal{S}$ is *short* if for all morphisms $g, g': Y \to Z$ and $h, h': W \to X$,

Short morphisms of $\mathcal{S}$ form a subcategory $\mathrm{Short}(\mathcal{S})$.

Category $\mathbf{Met}$ with *supremum metric*

Short morphisms are *short maps*:

**Proposition**: For any metric category $\mathcal{S}$, $\mathrm{Short}(\mathcal{S})$ is enriched in $\mathrm{Short}(\mathbf{Met})$.

Category $\mathbf{MM}$ of mm spaces and measurable maps, with *$L^p$ metric*, $1 \leq p < \infty$:

**Proposition**: A map $f: X \to Y$ is short iff

and

$$ d_Y(f(x),f(x')) \leq d_X(x,x'), \quad \forall x,x' \in X. $$Let $\mathcal{C}$ be a finitely presented category and $\mathcal{S}$ a metric category.

**Idea**: For $X,Y \in [\mathcal{C},\mathcal{S}]$, consider *distance from naturality* of transformation $\phi: X \to Y$ at $c \in \mathcal{C}$:

**Theorem**: For any $1 \leq p \leq \infty$, a Lawvere metric on $[\mathcal{C},\mathcal{S}]$ is defined by

where

- infimum is over (unnatural) transformations with components in $\mathrm{Short}(\mathcal{S})$
- $\ell^p$ norm/sum is over a fixed, finite generating set of morphisms in $\mathcal{C}$.

**Note**: Condition that each $\phi_c \in \mathrm{Short}(\mathcal{S})$ is needed for *triangle inequality*.

Category $\mathbf{MMarkov}$ of mm spaces and Markov kernels, with *Wasserstein metric*:

Generalizes both classical $L^p$ and Wasserstein metrics:

**Proposition**: Under regularity conditions, a Markov kernel $M: X \to Y$ is short iff

and there exists $\Pi \in \mathrm{Prod}(X,Y)$ such that

$$ \int_{Y \times Y} d_Y(y,y')^p\, \Pi(dy,dy\,|\,x,x') \leq d_X(x,x')^p, \qquad \forall x,x' \in X. $$**Consequence**: Via the theorem, a Wasserstein-style metric on metric measure $\mathcal{C}$-spaces, computable by solving a *linear program*.

- Beyond $\mathcal{C}$-sets
*Sums*(coproducts) and*units*(terminal objects) are easy*Products*are less immediate

- Faster algorithms
- Needed for practical use on graphs of even moderate size
*Entropic regularization*of both theoretical and algorithmic interest

**Paper**: Hausdorff and Wasserstein metrics on graphs and other structured data, 2019. arXiv:1907.00257.