Categorical probability theory

This page is about how to understand probability theory, and to a lesser extent measure theory, from a categorical viewpoint. There are by now several approaches to categorical probability.

Grab bag of links to discussions on the web:

Markov kernels and Markov categories

The most significant thread, in my view, on categorical probability is about the category of Markov kernels and its axiomatization through Markov categories.

History of category of Markov kernels

The category of Markov kernels has been rediscovered many times, in different guises for different reasons. Here is an early history up to around 2000, probably incomplete:

1962
According to the nLab , William Lawvere invents the Giry monad in an appendix to a grant proposal, but does not publish it. The appendix will not become publicly available until 2012 (pdf).
1962
A higher-dimensional gluing lemma, applicable only to finite sets, is presented by N.N. Vorob’ev (doi, MathNet.ru ). Cédric Villani, following Richard Dudley, identifies this result as the origin of the gluing lemma for couplings. (The composition laws for couplings and for Markov kernels are equivalent, although categorical language is not used in optimal transport.)
1965

Independently of Lawvere, N.N. Cencov publishes, in Russian, “The categories of mathematical statistics” (MathNet.ru ). I cannot obtain the English translation published by the AMS. The Zentralblatt summary is:

The author defines the category CAP of probability distributions using as objects the set of all probability distributions over a measurable space and, as morphisms, the Markovian (transition) probabilities. Categories connected with the problems of mathematical statistics are defined and some results are stated.

The concrete category CAP is isomorphic to the category of Markov kernels, which Cencov calls the “category of statistical decisions” (Cencov, 1982, Theorem 5.2).

1972
Cencov publishes his magnum opus on mathematical statistics from the category-theoretic and differential-geometric viewpoints, the monograph Statistical Decision Rules and Optimal Inference (in Russian).
~1980
Measure-theoretic versions of the gluing lemma appear in papers, e.g., by Berkes and Phillipp (doi) and by Shortt (doi). It is probably impossible to determine the origin of the gluing lemma. As Berkes and Phillipp put it, “this lemma is used implicitly in most papers on the subject, past, present (and future!).”
1982
An English translation of Cencov’s monograph is distributed by the AMS.
1982
Michèle Giry gets ahold of Lawvere’s appendix and develops its ideas. She publishes the paper on what will be called the Giry monad.
1998
Prakash Panangaden introduces the category of Markov kernels to computer scientists, downplaying its connection with the Giry monad. Thus begins a line of work on probabilistic semantics and bisimulation involving Desharnais, Doberkat, Panangaden, and others.

Literature on Markov kernels and categories

In the category-theoretic literature, Markov kernels have also been inaptly called “stochastic relations.”

  • Panangaden, 1998: Probabilistic relations (pdf)
    • Early version of Panangaden, 1999: The category of Markov kernels (doi)
    • See also 1997 course on “Stochastic techniques in concurrency” (slides)
  • Abramsky, Blute, & Panangaden, 1999: Nuclear and trace ideals in tensored \(*\)-categories (doi, arxiv)
    • Sec. 7.2: “Stochastic kernels”, aka Markov kernels
    • Sec. 7.3: “Probabilistic relations”, different than in (Panangaden, 1998), being closer to couplings
  • Doberkat, 2007: Stochastic Relations: Foundations for Markov Transition Systems
    • Ch. 3 is similar to, but does not cite: Doberkat, 2006: Eilenberg-Moore algebras for stochastic relations (doi)
    • Sec. 5.10: Bibliographic notes on “converse relations”, citing (Abramsky, Blute, & Panangaden, 1999)
  • Voevodsky, 2008: Notes on categorical probability (online , pdf)
    • Unfinished notes published posthumously by Voevodsky’s academic executor, Dan Grayson
    • Cites Doberkat, Gregoriou, and Panangaden
  • Fritz, 2009: A presentation of the category of stochastic matrices (arxiv)
  • Fong, 2012, MS thesis: Causal theories: A categorical perspective on Bayesian networks, Ch. 2: Categorical probability theory (arxiv)
    • Nice overview of Meas and Stoch as symmetric monoidal categories
    • Warning: Sec. 2.4 claims that all isomorphisms in Stoch are deterministic (without further assumptions, they are only 0-1 measures at every point)
  • Cho & Jacobs, 2019: Disintegration and Bayesian inversion via string diagrams (doi, arxiv)
    • Interprets integration, disintegration, and Bayesian inference in string diagrams
  • Fritz, 2020: A synthetic approach to Markov kernels, conditional independence and theorems on sufficient statistics (doi, arxiv)
    • Now the definitive reference for Markov categories, synthesizing and extending much previous work
    • Fritz & Rischel, 2020: The zero-one laws of Kolmogorov and Hewitt-Savage in categorical probability (doi, arxiv)
    • Fritz et al, 2023: Weakly affine monads (arxiv)
  • Parzygnat, 2020: Inverses, disintegrations, and Bayesian inversion in quantum Markov categories (arxiv)
    • A quantum Markov category is basically a \(\mathbb{Z}_2\)-graded Markov category, with the grading intended for linear and conjugate-linear maps

Probability monads

“Probability monad” is an informal term that includes the famous Giry monad , as well as related monads like the Radon monad and the Kantorovich monad. All these monads send measurable spaces to spaces of probability measures; they differ mainly in the topological restrictions placed on the spaces. Because the category of Markov kernels is the Kleisli category of the Giry monad, probability monads suggest a certain perspective on Markov kernels and other kinds of “nondeterministic maps” obtained from taking different monads.

Books and surveys

  • Riehl, 2016: Category theory in context
    • Only references in passing, as examples of monads and their algebras
    • Giry monad: Example 5.1.5 (iv)
    • Markov kernels as Kleisli category for the Giry monad: Example 5.2.10 (iv)
  • Jacobs, 2021, book draft: Structured probabilistic reasoning (pdf)
    • Not just probability, but “channels” in monads (main examples: list monad, powerset monad, multiset monad, Giry monad)

Papers

  • Giry, 1982: A categorical approach to probability theory (doi, pdf)
  • Gregoriou, 2001: A categorical approach to integral of measures and disintegration of measures (pdf)
  • Avery, 2016: Codensity and the Giry monad (doi, arxiv)
    • Inspired by Leinster, 2013: Codensity and the ultrafilter monad (arxiv, pdf)
    • Similar ideas in Sturtz, 2014: Categorical probability theory (arxiv)
  • Sturtz, 2017: The factorization of the Giry monad (arxiv)
    • In 2018, Tomas Crhak seemingly refuted the main theorem (arxiv I, arxiv II)
    • In 2019, Sturtz issued an “erratum and addendum” revising the main theorem (arxiv I, arxiv II)
  • Perrone, 2018, PhD thesis: Categorical probability and stochastic dominance in metric spaces (pdf)
    • Mainly about probability monads and stochastic orders
    • Fritz & Perrone, 2017: A probability monad as the colimit of finite powers (pdf, arxiv, n-Cat Cafe )
    • Fritz & Perrone, 2018: Bimonoidal structure of probability monads (doi, arxiv)
    • Fritz & Perrone, 2020: Stochastic order on metric spaces and the ordered Kantorovich monad (doi, arxiv)
  • Jacobs, 2018: From probability monads to commutative effectuses (doi, pdf)
    • Sec. 3: Running monad examples (the discrete probability monad, the Giry monad, the expectation monad, the probabilistic power domain monad, the Radon monad, and the Kantorovich monad)
  • Jacobs, 2020: De Finetti’s construction as a categorical limit (arxiv)

Other topics in categorical probability

Measure and measurable spaces

Some facts about universal properties:

  • Independent product of measures is not a (categorical) product, only a monoidal product (MO )
  • Measurable spaces have products; measure spaces do not (MO )
  • Measurable spaces also have coproducts (MO )
  • In fact, category Meas of measurable spaces and measurable functions is complete and cocomplete:
    • Limits as in Set, equipped with initial (coarsest) σ-algebra w.r.t. cone maps
    • Colimits as in Set, equipped with terminal (finest) σ-algebra w.r.t. cocone maps
  • Meas is not cartesian closed, because the evaluation function is never measurable (Heunen et al, 2017, Prop. 6, citing a result of Aumann)