Categorical probability theory

Probability theory and measure theory from the categorical viewpoint :

Universal properties

  • Independent product of probability measures is only a monoidal product (MO )
  • Measurable spaces have products; measure spaces do not (MO )
  • Measurable spaces also have coproducts (MO )
  • In fact, category Meas of measurable spaces and measurable functions is complete and cocomplete:
    • Limits as in Set, equipped with initial (coarsest) σ-algebra w.r.t. cone maps
    • Colimits as in Set, equipped with terminal (finest) σ-algebra w.r.t. cocone maps
  • Meas is not cartesian closed, because the evaluation function is never measurable (Heunen et al, 2017, Prop. 6, citing a result of Aumann)


The category of Markov kernels has been rediscovered many times, from different perspectives. Here is an early history, probably incomplete:

According to the nLab , William Lawvere invents the Giry monad in an appendix to a grant proposal, but does not publish it. The appendix will not become publicly available until 2012 (pdf).
A higher-dimensional gluing lemma, applicable only to finite sets, is presented by N.N. Vorob’ev (doi, ). Cédric Villani, following Richard Dudley, identifies this result as the origin of the gluing lemma for couplings. (The composition laws for couplings and for Markov kernels are equivalent, although categorical language is not used in optimal transport.)

Independently of Lawvere, N.N. Cencov publishes, in Russian, “The categories of mathematical statistics” ( ). I cannot obtain the English translation published by the AMS. The Zentralblatt summary is:

The author defines the category CAP of probability distributions using as objects the set of all probability distributions over a measurable space and, as morphisms, the Markovian (transition) probabilities. Categories connected with the problems of mathematical statistics are defined and some results are stated.

The concrete category CAP is isomorphic to the category of Markov kernels, which Cencov calls the “category of statistical decisions” (Cencov, 1982, Theorem 5.2).

Cencov publishes his magnum opus on mathematical statistics from the category-theoretic and differential-geometric viewpoints, the monograph Statistical Decision Rules and Optimal Inference (in Russian).
Measure-theoretic versions of the gluing lemma appear in papers, e.g., by Berkes and Phillipp (doi) and by Shortt (doi). It is probably impossible to determine the origin of the gluing lemma. As Berkes and Phillipp put it, “this lemma is used implicitly in most papers on the subject, past, present (and future!).”
An English translation of Cencov’s monograph is distributed by the AMS.
Michèle Giry gets ahold of Lawvere’s appendix and develops its ideas. She publishes the paper on the monad that will come to bear her name.
Prakash Panangaden introduces the category of Markov kernels to computer scientists, downplaying its connection with the Giry monad. Thus begins a line of work on probabilistic semantics and bisimulation involving Desharnais, Doberkat, Panangaden, and others.



  • Riehl, 2016: Category theory in context
    • References in passing, as examples of monads and their algebras
    • Giry monad: Example 5.1.5 (iv)
    • Markov kernels as Kleisli category for the Giry monad: Example 5.2.10 (iv)
  • Jacobs, 2019, book draft: Structured probabilistic reasoning (pdf)
    • Not just probability, but “channels” in monads (main examples: list monad, powerset monad, multiset monad, Giry monad)

Probability monads

“Probability monad” is an informal term that includes the famous Giry monad , as well as related monads like the Radon monad and the Kantorovich monad. All these monads send measurable spaces to spaces of probability measures; they differ mainly in the topological restrictions placed on the spaces.

  • Giry, 1982: A categorical approach to probability theory (doi, pdf)
  • Gregoriou, 2001: A categorical approach to integral of measures and disintegration of measures (pdf)
  • Avery, 2016: Codensity and the Giry monad (doi, arxiv)
    • Inspired by Leinster, 2013: Codensity and the ultrafilter monad (arxiv, pdf)
    • Similar ideas in Sturtz, 2014: Categorical probability theory (arxiv)
  • Sturtz, 2017: The factorization of the Giry monad (arxiv)
    • In 2018, Tomas Crhak seemingly refuted the main theorem (arxiv I, arxiv II)
    • In 2019, Sturtz issued an “erratum and addendum” revising the main theorem (arxiv I, arxiv II)
  • Perrone, 2018, PhD thesis: Categorical probability and stochastic dominance in metric spaces (pdf)
    • Mostly about probability monads on various spaces
    • Fritz & Perrone, 2017: A probability monad as the colimit of finite powers (pdf, arxiv, n-Cat Cafe )
    • Fritz & Perrone, 2018: Bimonoidal structure of probability monads (doi, arxiv)
    • Fritz & Perrone, 2018: Stochastic order on metric spaces and the ordered Kantorovich monad (arxiv)
  • Jacobs, 2018: From probability monads to commutative effectuses (doi, pdf)
    • Sec. 3: Running monad examples (the discrete probability monad, the Giry monad, the expectation monad, the probabilistic power domain monad, the Radon monad, and the Kantorovich monad)
  • Jacobs, 2020: De Finetti’s construction as a categorical limit (arxiv)

Markov kernels, aka stochastic relations

See also general literature on Markov chains and Markov kernels.

  • Panangaden, 1998: Probabilistic relations (pdf)
    • Early version of Panangaden, 1999: The category of Markov kernels (doi)
    • See also 1997 course on “Stochastic techniques in concurrency” (slides)
  • Abramsky, Blute, & Panangaden, 1999: Nuclear and trace ideals in tensored \(*\)-categories (doi, arxiv)
    • Sec. 7.2: “Stochastic kernels”, aka Markov kernels
    • Sec. 7.3: “Probabilistic relations”, different than in (Panangaden, 1998), being closer to couplings
  • Doberkat, 2007: Stochastic Relations: Foundations for Markov Transition Systems
    • Ch. 3 is similar to, but does not cite: Doberkat, 2006: Eilenberg-Moore algebras for stochastic relations (doi)
    • Sec. 5.10: Bibliographic notes on “converse relations”, citing (Abramsky, Blute, & Panangaden, 1999)
  • Voevodsky, 2008: Notes on categorical probability (online , pdf)
    • Unfinished notes published posthumously by Voevodsky’s academic executor, Dan Grayson
    • Cites Doberkat, Gregoriou, and Panangaden
  • Fritz, 2009: A presentation of the category of stochastic matrices (arxiv)
  • Fong, 2012, MS thesis: Causal theories: A categorical perspective on Bayesian networks, Ch. 2: Categorical probability theory (arxiv)
    • Nice overview of Meas and Stoch as symmetric monoidal categories
    • Warning: Sec. 2.4 claims that all isomorphisms in Stoch are deterministic (without further assumptions, they are only 0-1 measures at every point)
  • Cho & Jacobs, 2019: Disintegration and Bayesian inversion via string diagrams (doi, arxiv)
    • Interprets integration, disintegration, and Bayesian inference in string diagrams
  • Fritz, 2019: A synthetic approach to Markov kernels, conditional independence and theorems on sufficient statistics (arxiv)
    • Fritz & Rischel, 2019: The zero-one laws of Kolmogorov and Hewitt-Savage in categorical probability (arxiv)
  • Parzygnat, 2020: Inverses, disintegrations, and Bayesian inversion in quantum Markov categories (arxiv)
    • A quantum Markov category is basically a \(\mathbb{Z}_2\)-graded Markov category, with the grading intended for linear and conjugate-linear maps