# Categorical probability theory

Probability theory and measure theory from the categorical viewpoint :

• David Corfield, via n-Cat Cafe [2007a , 2007b , 2010 ]
• Mark Meckes with "A categorical look at random variables", via n-Cat Cafe [2018 ]
• MathOverflow questions [1 , 2 ]

## Universal properties

• Independent product of probability measures is only a monoidal product [MO ]
• Measurable spaces have products; measure spaces do not [MO ]
• Measurable spaces also have coproducts [MO ]
• In fact, category Meas of measurable spaces and measurable functions is complete and cocomplete:
• Limits as in Set, equipped with initial (coarsest) σ-algebra w.r.t. cone maps
• Colimits as in Set, equipped with terminal (finest) σ-algebra w.r.t. cocone maps
• Meas is not cartesian closed, because the evaluation function is never measurable (Heunen et al, 2017, Prop. 6, citing a result of Aumann)

## History

The category of Markov kernels has been rediscovered many times, from different perspectives. Here is an early history, probably incomplete:

1962
According to the nLab , William Lawvere invents the Giry monad in an appendix to a grant proposal, but does not publish it. The appendix will not become publicly available until 2012 [pdf].
1962
A higher-dimensional gluing lemma, applicable only to finite sets, is presented by N.N. Vorob'ev [doi, MathNet.ru ]. Cédric Villani, following Richard Dudley, identifies this result as the origin of the gluing lemma for couplings. (The composition laws for couplings and for Markov kernels are equivalent, although categorical language is not used in optimal transport.)
1965

Independently of Lawvere, N.N. Cencov publishes, in Russian, "The categories of mathematical statistics" [MathNet.ru ]. I cannot obtain the English translation published by the AMS. The Zentralblatt summary is:

The author defines the category CAP of probability distributions using as objects the set of all probability distributions over a measurable space and, as morphisms, the Markovian (transition) probabilities. Categories connected with the problems of mathematical statistics are defined and some results are stated.

The concrete category CAP is isomorphic to the category of Markov kernels, which Cencov calls the "category of statistical decisions" (Cencov, 1982, Theorem 5.2).

1972
Cencov publishes his magnum opus on mathematical statistics from the category-theoretic and differential-geometric viewpoints, the monograph Statistical Decision Rules and Optimal Inference (in Russian).
~1980
Measure-theoretic versions of the gluing lemma appear in papers, e.g., by Berkes and Phillipp [doi] and by Shortt [doi]. It is probably impossible to determine the origin of the gluing lemma. As Berkes and Phillipp put it, "this lemma is used implicitly in most papers on the subject, past, present (and future!)."
1982
An English translation of Cencov's monograph is distributed by the AMS.
1982
Michèle Giry gets ahold of Lawvere's appendix and develops its ideas. She publishes the paper on the monad that will come to bear her name.
1998
Prakash Panangaden introduces the category of Markov kernels to computer scientists, downplaying its connection with the Giry monad. Thus begins a line of work on probabilistic semantics and bisimulation involving Desharnais, Doberkat, Panangaden, and others.

## Literature

Introductory

• Riehl, 2016: Category Theory in Context
• References in passing, as examples of monads and their algebras
• Giry monad: Example 5.1.5 (iv)
• Markov kernels as Kleisli category for the Giry monad: Example 5.2.10 (iv)

"Probability monad" is an informal term that includes the famous Giry monad , as well as related monads like the Radon monad and the Kantorovich monad. All these monads send measurable spaces to spaces of probability measures; they differ mainly in the topological restrictions placed on the spaces.

• Giry, 1982: A categorical approach to probability theory [doi, pdf]
• Gregoriou, 2001: A categorical approach to integral of measures and disintegration of measures [pdf]
• Avery, 2016: Codensity and the Giry monad [doi, arxiv]
• Inspired by Leinster, 2013: Codensity and the ultrafilter monad [arxiv, pdf]
• Similar ideas in Sturtz, 2014: Categorical probability theory [arxiv]
• Sturtz, 2017: The factorization of the Giry monad [arxiv]
• In 2018, Tomas Crhak seemingly refuted the main theorem [arxiv I, arxiv II]
• In 2019, Sturtz issued an "erratum and addendum" revising the main theorem [arxiv I, arxiv II]
• Perrone, 2018, PhD thesis: Categorical probability and stochastic dominance in metric spaces [pdf]
• Fritz & Perrone, 2017: A probability monad as the colimit of finite powers [pdf, arxiv, n-Cat Cafe ]
• Fritz & Perrone, 2018: Bimonoidal structure of probability monads [doi, arxiv]
• Fritz & Perrone, 2018: Stochastic order on metric spaces and the ordered Kantorovich monad [arxiv]
• Jacobs, 2018: From probability monads to commutative effectuses [doi, pdf]

Markov kernels, aka stochastic relations

• Panangaden, 1998: Probabilistic relations [pdf]
• Early version of Panangaden, 1999: The category of Markov kernels [doi]
• Abramsky, Blute, & Panangaden, 1999: Nuclear and trace ideals in tensored $$*$$-categories [doi, arxiv]
• Sec. 7.2: "Stochastic kernels", aka Markov kernels
• Sec. 7.3: "Probabilistic relations", different than in (Panangaden, 1998), being closer to couplings
• Doberkat, 2007: Stochastic Relations: Foundations for Markov Transition Systems
• Ch. 3 is similar to, but does not cite: Doberkat, 2006: Eilenberg-Moore algebras for stochastic relations [doi]
• Sec. 5.10: Bibliographic notes on "converse relations", citing (Abramsky, Blute, & Panangaden, 1999)
• Voevodsky, 2008: Notes on categorical probability [online , pdf]
• Unfinished notes published posthumously by Voevodsky's academic executor, Dan Grayson
• Cites Doberkat, Gregoriou, and Panangaden
• Fong, 2012, MS thesis: Causal theories: A categorical perspective on Bayesian networks, Ch. 2: Categorical probability theory [arxiv]
• Nice overview of Meas and Stoch as symmetric monoidal categories
• Sec. 2.4: Claims that all isomorphisms in Stoch are deterministic (actually, that they are 0-1 measures at every point, which is weaker without topological assumptions)
• Fritz, 2009: A presentation of the category of stochastic matrices [arxiv]
• Cho & Jacobs, 2019: Disintegration and Bayesian inversion via string diagrams [doi, arxiv]
• Interprets integration, disintegration, and Bayesian inference in string diagrams
• Fritz, 2019: A synthetic approach to Markov kernels, conditional independence and theorems on sufficient statistics [arxiv]

Other