Evan Patterson
Stanford University, Statistics Department
Special Session on Applied Category
AMS Fall Western Sectional Meeting, November 9-10, 2019
Is there a correspondence between two graphs?
Aflalo et al 2015; Alvarez-Melis et al 2017; Vayer et al 2018
Monge problem (1781): Given measures $\mu \in \mathrm{Prob}(X)$ and $\nu \in \mathrm{Prob}(Y)$ and cost function $c: X \times Y \to \mathbb{R}$,
$$ \mathop{\mathrm{minimize}}_{\substack{T: X \to Y: \\ T \mu = \nu}} \int_X c(x,T(x))\, \mu(dx) $$Image: Villani 2003
Monge problem is combinatorial and nonconvex.
Kantorovich's relaxation (1942): Replace deterministic map with probabilistic coupling:
$$ \mathop{\mathrm{minimize}}_{\pi \in \mathrm{Coup}(X,Y)} \int_{X \times Y} c(x,y)\, \pi(dx,dy) $$where
$$ \mathrm{Coup}(X,Y) := \{\pi \in \mathrm{Prob}(X \times Y): \mathop{\mathrm{proj}_X} \pi = \mu,\, \mathop{\mathrm{proj}_Y} \pi = \nu \}. $$New problem is convex, in fact a linear program.
When cost is a metric $d: X \times X \to \mathbb{R}$, we get the Wasserstein metric on $\mathrm{Prob}(X)$:
$$ W_p(\mu,\nu) := \inf_{\pi \in \mathrm{Coup}(\mu,\nu)} \left( \int_{X \times X} d(x,x')^p\, \pi(dx,dx') \right)^{1/p}, \qquad 1 \leq p < \infty. $$Applied to graph matching in two ways:
Featurize vertices of both graphs in common metric space, then compute Wasserstein distance.
Convert graphs into distinct metric spaces on vertices (via shortest path metric), then compute Gromov-Wasserstein distance.
Problem: Discards significant information about edges.
Goal: Construct a Wasserstein-style metric on graphs that respects both vertices and edges, in a sense to be defined.
So, this informal diagram should not commute!
In fact, no reason to restrict to graphs; generalizing to $\mathcal{C}$-sets even points the way towards a solution.
Recall: For $\mathcal{C}$ a small category, a $\mathcal{C}$-set is a functor $X: \mathcal{C} \to \mathbf{Set}$.
The category of $\mathcal{C}$-sets is the functor category $[\mathcal{C},\mathbf{Set}]$.
Example: When $\mathcal{C} =$ , a $\mathcal{C}$-set is a (directed) graph.
Example: When $\mathcal{C} = $ , a $\mathcal{C}$-set is a symmetric graph.
Nearly the same as an undirected graph.
Other examples
For applications: Attributes can be modeled in $\mathcal{C}$, to get vertex-attributed graphs, edge-attributed graphs, and so on.
A $\mathcal{C}$-set in a category $\mathcal{S}$ is a functor $X: \mathcal{C} \to \mathcal{S}$.
For us, useful categories $\mathcal{S}$ include:
Leads to $\mathcal{C}$-sets, measurable $\mathcal{C}$-spaces, measure $\mathcal{C}$-spaces, metric $\mathcal{C}$-spaces, and so on.
Explore relaxations of the notion of homomorphism (natural transformation):
In this talk, I give a sketch. A systematic development is in the paper.
A Markov kernel $M: X \to Y$ is a measurable assignment of each point $x \in X$ to a probability measure $M(x) \in \mathrm{Prob}(Y)$.
Other names for Markov kernels:
There is a category Markov of measurable spaces and Markov kernels.
Let $\mu \in \mathrm{Prob}(X)$ and $\nu \in \mathrm{Prob}(Y)$.
For any coupling $\pi \in \mathrm{Coup}(\mu,\nu)$, the disintegration (conditional probability distribution) $M: X \to Y$ satisfies
$$ \mu \cdot M = \nu. $$Conversely, for any Markov kernel $M: X \to Y$ with $\mu \cdot M = \nu$, there is a product
$$ \mu \otimes M \in \mathrm{Coup}(\mu,\nu). $$In fact, this correspondence is functorial.
Proposition (folkore?): Under regularity conditions, there is an isomorphism between
Interpretation: Markov kernels allow a "directed" version of optimal transport.
$\mathbf{Meas}$ embeds in $\mathbf{Markov}$ as the deterministic Markov kernels:
$$ \mathcal{M}: \mathbf{Meas} \hookrightarrow \mathbf{Markov}. $$Induces a relaxation functor by post-composition:
$$ \mathcal{M}_*: [\mathcal{C},\mathbf{Meas}] \to [\mathcal{C},\mathbf{Markov}]. $$Definition. A Markov morphism $X \to Y$ of measurable $\mathcal{C}$-spaces $X$ and $Y$ is a morphism $\mathcal{M}_*(X) \to \mathcal{M}_*(Y)$.
So, a Markov morphism $\Phi: X \to Y$ of graphs $X$ and $Y$ consists of Markov kernels $\Phi_V: X(V) \to Y(V)$ and $\Phi_E: X(E) \to Y(E)$ such that
Important: Graph homomorphism is NP-hard, but Markov graph morphism is a linear feasibility problem.
Examples of Markov morphisms:
But more exotic things can happen because mass can be "split".
Example: $X = $ self loop, $Y = $ directed cycle.
No graph homomorphisms $X \to Y$, but there is a unique Markov morphism $\Phi: X \to Y$:
$$ \Phi_V(*) \sim \mathrm{Unif}(Y(V)), \qquad \Phi_E(*) \sim \mathrm{Unif}(Y(E)). $$Now, the metric side of matching $\mathcal{C}$-sets.
Let $\mathbf{Met}$ be the category of Lawvere metric spaces and maps. (Note choice of morphisms.)
Definition. A metric category is a category $\mathcal{S}$ enriched in $\mathbf{Met}$, i.e., the hom-sets $\mathcal{S}(X,Y)$ are Lawvere metric spaces.
Definition. A morphism $f: X \to Y$ in $\mathcal{S}$ is short if for all morphisms $g, g': Y \to Z$ and $h, h': W \to X$,
$$ d(fg,fg') \leq d(g,g') \quad\text{and}\quad d(hf,h'f) \leq d(h,h'). $$Short morphisms of $\mathcal{S}$ form a subcategory $\mathrm{Short}(\mathcal{S})$.
Category $\mathbf{Met}$ with supremum metric
$$ d_\infty(f,g) := \sup_{x \in X} d_Y(f(x),g(x)), \qquad f,g \in \mathbf{Met}(X,Y). $$Short morphisms are short maps:
$$ d_Y(f(x),f(x')) \leq d_X(x,x'), \quad \forall x,x' \in X. $$Proposition: For any metric category $\mathcal{S}$, $\mathrm{Short}(\mathcal{S})$ is enriched in $\mathrm{Short}(\mathbf{Met})$.
Category $\mathbf{MM}$ of mm spaces and measurable maps, with $L^p$ metric, $1 \leq p < \infty$:
$$ d_p(f,g) := \left( \int_X d_Y(f(x),g(x))^p\, \mu_X(dx) \right)^{1/p}, \qquad f,g \in \mathbf{MM}(X,Y). $$Proposition: A map $f: X \to Y$ is short iff
$$ \mu_X f := \mu_X \circ f^{-1} \leq \mu_Y, $$and
$$ d_Y(f(x),f(x')) \leq d_X(x,x'), \quad \forall x,x' \in X. $$Let $\mathcal{C}$ be a finitely presented category and $\mathcal{S}$ a metric category.
Idea: For $X,Y \in [\mathcal{C},\mathcal{S}]$, consider distance from naturality of transformation $\phi: X \to Y$ at $c \in \mathcal{C}$:
Theorem: For any $1 \leq p \leq \infty$, a Lawvere metric on $[\mathcal{C},\mathcal{S}]$ is defined by
$$ d(X,Y) := \inf_{\phi: X \to Y} \sideset{}{_p}\sum_{f: c \to c'} d(Xf \cdot \phi_{c'}, \phi_c \cdot Yf), $$where
Note: Condition that each $\phi_c \in \mathrm{Short}(\mathcal{S})$ is needed for triangle inequality.
Category $\mathbf{MMarkov}$ of mm spaces and Markov kernels, with Wasserstein metric:
$$ W_p(M,N) := \inf_{\Pi \in \mathrm{Coup}(M,N)} \left( \int_{X \times Y \times Y} d_Y(y,y')^p\, \Pi(dy,dy'\,|\,x)\, \mu_X(dx) \right)^{1/p} $$Generalizes both classical $L^p$ and Wasserstein metrics:
Proposition: Under regularity conditions, a Markov kernel $M: X \to Y$ is short iff
$$ \mu_X M \leq \mu_Y $$and there exists $\Pi \in \mathrm{Prod}(X,Y)$ such that
$$ \int_{Y \times Y} d_Y(y,y')^p\, \Pi(dy,dy\,|\,x,x') \leq d_X(x,x')^p, \qquad \forall x,x' \in X. $$Consequence: Via the theorem, a Wasserstein-style metric on metric measure $\mathcal{C}$-spaces, computable by solving a linear program.
Paper: Hausdorff and Wasserstein metrics on graphs and other structured data, 2019. arXiv:1907.00257.