# Categorical probability theory

Probability theory and measure theory from the categorical viewpoint :

- David Corfield, via n-Cat Cafe [2007a , 2007b , 2010 ]
- Mark Meckes with "A categorical look at random variables", via n-Cat Cafe [2018 ]
- MathOverflow questions [1 , 2 ]

## Universal properties

- Independent product of probability measures is only a
*monoidal product*[MO ] - Measurable spaces have
*products*; measure spaces do not [MO ] - Measurable spaces also have
*coproducts*[MO ] - In fact, category
**Meas**of measurable spaces and measurable functions is*complete*and*cocomplete*:*Limits*as in**Set**, equipped with initial (coarsest) σ-algebra w.r.t. cone maps*Colimits*as in**Set**, equipped with terminal (finest) σ-algebra w.r.t. cocone maps

**Meas**is not*cartesian closed*, because the evaluation function is never measurable (Heunen et al, 2017, Prop. 6, citing a result of Aumann)

## History

The category of Markov kernels has been rediscovered many times, from different perspectives. Here is an early history, probably incomplete:

- 1962
- According to the nLab , William Lawvere invents the Giry monad in an appendix to a grant proposal, but does not publish it. The appendix will not become publicly available until 2012 [pdf].
- 1962
- A higher-dimensional gluing lemma, applicable only to finite sets, is presented by N.N. Vorob'ev [doi, MathNet.ru ]. Cédric Villani, following Richard Dudley, identifies this result as the origin of the gluing lemma for couplings. (The composition laws for couplings and for Markov kernels are equivalent, although categorical language is not used in optimal transport.)
- 1965
Independently of Lawvere, N.N. Cencov publishes, in Russian, "The categories of mathematical statistics" [MathNet.ru ]. I cannot obtain the English translation published by the AMS. The Zentralblatt summary is:

The author defines the category

**CAP**of probability distributions using as objects the set of all probability distributions over a measurable space and, as morphisms, the Markovian (transition) probabilities. Categories connected with the problems of mathematical statistics are defined and some results are stated.The concrete category

**CAP**is isomorphic to the category of Markov kernels, which Cencov calls the "category of statistical decisions" (Cencov, 1982, Theorem 5.2).- 1972
- Cencov publishes his magnum opus on mathematical statistics from the
category-theoretic and differential-geometric viewpoints, the
monograph
*Statistical Decision Rules and Optimal Inference*(in Russian). - ~1980
- Measure-theoretic versions of the gluing lemma appear in papers, e.g., by Berkes and Phillipp [doi] and by Shortt [doi]. It is probably impossible to determine the origin of the gluing lemma. As Berkes and Phillipp put it, "this lemma is used implicitly in most papers on the subject, past, present (and future!)."
- 1982
- An English translation of Cencov's monograph is distributed by the AMS.
- 1982
- Michèle Giry gets ahold of Lawvere's appendix and develops its ideas. She publishes the paper on the monad that will come to bear her name.
- 1998
- Prakash Panangaden introduces the category of Markov kernels to computer scientists, downplaying its connection with the Giry monad. Thus begins a line of work on probabilistic semantics and bisimulation involving Desharnais, Doberkat, Panangaden, and others.

## Literature

**Introductory**

- Riehl, 2016:
*Category Theory in Context*- References in passing, as examples of monads and their algebras
- Giry monad: Example 5.1.5 (iv)
- Markov kernels as Kleisli category for the Giry monad: Example 5.2.10 (iv)

**Probability monads**

"Probability monad" is an informal term that includes the famous Giry monad , as well as related monads like the Radon monad and the Kantorovich monad. All these monads send measurable spaces to spaces of probability measures; they differ mainly in the topological restrictions placed on the spaces.

- Giry, 1982: A categorical approach to probability theory [doi, pdf]
- Gregoriou, 2001: A categorical approach to integral of measures and disintegration of measures [pdf]
- Avery, 2016: Codensity and the Giry monad [doi, arxiv]
- Sturtz, 2017: The factorization of the Giry monad [arxiv]
- Perrone, 2018, PhD thesis: Categorical probability and stochastic dominance in
metric spaces [pdf]
- Mostly about probability monads on various spaces
- Fritz & Perrone, 2017: A probability monad as the colimit of finite powers [pdf, arxiv, n-Cat Cafe ]
- Fritz & Perrone, 2018: Bimonoidal structure of probability monads [doi, arxiv]
- Fritz & Perrone, 2018: Stochastic order on metric spaces and the ordered Kantorovich monad [arxiv]

- Jacobs, 2018: From probability monads to commutative effectuses [doi, pdf]
- Sec. 3: Running monad examples (the discrete probability monad, the Giry monad, the expectation monad, the probabilistic power domain monad, the Radon monad, and the Kantorovich monad)

**Markov kernels**, aka stochastic relations

See also general literature on Markov chains and Markov kernels.

- Panangaden, 1998: Probabilistic relations [pdf]
- Abramsky, Blute, & Panangaden, 1999: Nuclear and trace ideals in tensored
\(*\)-categories [doi, arxiv]
- Sec. 7.2: "Stochastic kernels", aka Markov kernels
- Sec. 7.3: "Probabilistic relations",
*different*than in (Panangaden, 1998), being closer to couplings

- Doberkat, 2007:
*Stochastic Relations: Foundations for Markov Transition Systems*- Ch. 3 is similar to, but does not cite: Doberkat, 2006: Eilenberg-Moore algebras for stochastic relations [doi]
- Sec. 5.10: Bibliographic notes on "converse relations", citing (Abramsky, Blute, & Panangaden, 1999)

- Voevodsky, 2008: Notes on categorical probability [online , pdf]
- Unfinished notes published posthumously by Voevodsky's academic executor, Dan Grayson
- Cites Doberkat, Gregoriou, and Panangaden

- Fong, 2012, MS thesis: Causal theories: A categorical perspective on Bayesian
networks, Ch. 2: Categorical probability theory [arxiv]
- Nice overview of
**Meas**and**Stoch**as symmetric monoidal categories - Sec. 2.4: Claims that all isomorphisms in
**Stoch**are deterministic (actually, that they are 0-1 measures at every point, which is weaker without topological assumptions)

- Nice overview of
- Fritz, 2009: A presentation of the category of stochastic matrices [arxiv]
- Cho & Jacobs, 2019: Disintegration and Bayesian inversion via string diagrams
[doi, arxiv]
- Interprets integration, disintegration, and Bayesian inference in string diagrams

- Fritz, 2019: A synthetic approach to Markov kernels, conditional independence and theorems on sufficient statistics [arxiv]

**Other**