# Optimal transport

## Mathematics

**Books**

- Villani, 2003: *Topics in optimal transportation* [doi]
- Villani, 2009: *Optimal transport: Old and new* [doi, pdf]
  - Mathematically formidable and physically massive, but surprisingly readable
  - My preferred reference for the theory

- Santambrogio, 2015: *Optimal transport for applied mathematicians* [doi]
  - A fine book, but misnamed: it is mostly pure mathematics
  - An exception is Chapter 6: Numerical methods

- Rachev & Rüschendorf, 1998: *Mass transportation problems*, Volume I: Theory [doi] and Volume II: Applications [doi]
  - The standard reference until the publication of Villani's books

**Topical surveys**

- De Philippis and Figalli, 2014: The Monge-Ampère equation and its link to optimal transportation [doi]
  - Mentioned in a curious MathOverflow question about uses of higher-order derivatives
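For orientation, here is the link named in that title, in its textbook form. This assumes quadratic cost, probability densities $f$ and $g$ on the source and target, and enough smoothness for the equation to hold classically. Brenier's theorem writes the optimal map as $T = \nabla u$ with $u$ convex, and the mass-balance condition $T_\# (f\,dx) = g\,dy$ then becomes a Monge-Ampère equation:

$$
g\big(\nabla u(x)\big)\,\det D^2 u(x) = f(x), \qquad u \text{ convex}.
$$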

**Unbalanced optimal transport**

Requiring the marginal measures to be preserved is sometimes too much to ask. In particular, it forces them to have equal total mass. In *unbalanced optimal transport*, this marginal constraint is relaxed.

- Chizat et al., 2018: Unbalanced optimal transport: Dynamic and Kantorovich formulations [doi, arxiv]
- Figalli, 2009: The optimal partial transport problem [doi, pdf]
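One common way to relax the marginals is to replace the hard constraints by KL penalties of strength `lam` on top of entropic regularization. The Sinkhorn scaling updates then simply acquire an exponent `lam / (lam + eps)`. The sketch below assumes that formulation, which is the scaling-algorithm approach associated with Chizat and coauthors rather than anything taken verbatim from the two papers above. The function name, defaults, and toy data are hypothetical.

```python
import numpy as np


def unbalanced_sinkhorn(a, b, C, eps=0.1, lam=1.0, n_iter=1000):
    """Scaling iterations for entropic OT with KL-penalized marginals (sketch).

    Approximately minimizes, over nonnegative P,
        <C, P> + eps * sum(P * (log P - 1))
               + lam * KL(P @ 1 | a) + lam * KL(P.T @ 1 | b),
    where KL is the generalized (unnormalized) Kullback-Leibler divergence.
    The exponent t = lam / (lam + eps) < 1 lets the marginals of P deviate
    from a and b; as lam -> infinity, t -> 1 and plain Sinkhorn is recovered.
    """
    a, b, C = (np.asarray(z, dtype=float) for z in (a, b, C))
    K = np.exp(-C / eps)                 # Gibbs kernel
    u, v = np.ones(len(a)), np.ones(len(b))
    t = lam / (lam + eps)
    for _ in range(n_iter):
        u = (a / (K @ v)) ** t           # damped row rescaling
        v = (b / (K.T @ u)) ** t         # damped column rescaling
    return u[:, None] * K * v[None, :]


# Source has total mass 1, target total mass 2: no balanced plan exists.
x = np.array([0.0, 1.0])
C = (x[:, None] - x[None, :]) ** 2
P = unbalanced_sinkhorn([0.5, 0.5], [1.0, 1.0], C)
print(P.sum(axis=1), P.sum(axis=0))      # marginals need not equal a or b
```

As a sanity check on the exponent, consider a single point with masses a = 1, b = 2 and zero cost. Setting the derivative of the objective to zero gives a transported mass of 2^(lam / (2 lam + eps)), which lies strictly between the two input masses.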

## Computation

**Books and surveys**

- Peyré & Cuturi, 2019: *Computational optimal transport* [doi, arxiv]
  - The friendliest book on optimal transport, not just for computational issues

- Solomon, 2018: Optimal transport on discrete domains [arxiv, pdf]
  - From the 2018 AMS Short Course on Discrete Differential Geometry [pdf]
  - For a general audience: Solomon, 2017: Computational optimal transport [doi]

**Fast computation via entropic regularization**

Chapter 4 of Peyré & Cuturi's book is a good overview.
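Below is a minimal NumPy sketch of the basic Sinkhorn-Knopp iteration for entropy-regularized OT between two histograms `a` and `b` with cost matrix `C`. It alternately rescales the rows and columns of the Gibbs kernel `K = exp(-C / eps)` until both marginals match. The function name, default parameters, and toy data are illustrative assumptions, not code from Peyré & Cuturi or Cuturi (2013). The plain (non-log-domain) updates underflow when `eps` is small relative to the costs, which is why practical implementations usually work in the log domain.

```python
import numpy as np


def sinkhorn(a, b, C, eps=1.0, n_iter=1000):
    """Entropy-regularized OT plan via Sinkhorn-Knopp scaling (sketch).

    a: (n,) and b: (m,) positive histograms with equal total mass.
    C: (n, m) cost matrix; eps: regularization strength.
    Returns P = diag(u) K diag(v) with K = exp(-C / eps).
    """
    a, b, C = (np.asarray(z, dtype=float) for z in (a, b, C))
    K = np.exp(-C / eps)                # Gibbs kernel
    u, v = np.ones(len(a)), np.ones(len(b))
    for _ in range(n_iter):
        u = a / (K @ v)                 # rescale rows so P sums to a along rows
        v = b / (K.T @ u)               # rescale columns so P sums to b along columns
    return u[:, None] * K * v[None, :]


# Toy example: three points on a line, squared-distance cost.
x = np.array([0.0, 1.0, 2.0])
C = (x[:, None] - x[None, :]) ** 2
a = np.array([0.2, 0.3, 0.5])
b = np.array([0.4, 0.4, 0.2])
P = sinkhorn(a, b, C, eps=1.0)
print(P.sum(axis=1), P.sum(axis=0))     # approximately a and b
print("transport cost of the regularized plan:", np.sum(P * C))
```

Each iteration costs two matrix-vector products. That cost, and the ease of running many such problems in parallel, is what makes the entropic approach fast in practice.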

- Cuturi, 2013: Sinkhorn distances: Lightspeed computation of optimal transport [arxiv, pdf]
  - The influential paper that popularized the Sinkhorn-Knopp algorithm for solving entropy-regularized optimal transport
  - Summarized in: Peyré & Cuturi, 2019, Sec. 4.2: Sinkhorn’s algorithm and its convergence

- Altschuler, Weed, Rigollet, 2017: Near-linear time approximation algorithms for optimal transport via Sinkhorn iteration [arxiv, pdf]
- Blanchet et al., 2018: Towards optimal running times for optimal transport [arxiv]
- Lin, Ho, Jordan, 2019: On efficient optimal transport: An analysis of greedy and accelerated mirror descent algorithms [arxiv]
  - Convergence analysis of a greedy variant of the Sinkhorn algorithm, horribly named the "Greenkhorn algorithm"
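The greedy variant can be sketched in NumPy as follows. Instead of rescaling all rows and then all columns, each step rescales only the single row or column whose marginal is furthest off, scored by ρ(a, b) = b − a + a log(a/b), which as I understand it is the selection rule of Altschuler, Weed, Rigollet, 2017. Row and column sums are updated incrementally, so a step costs O(n + m) instead of O(nm). The names, defaults, stopping tolerance, and toy data are assumptions. The code also assumes strictly positive histograms of equal total mass.

```python
import numpy as np


def rho(target, current):
    """Selection score current - target + target * log(target / current); zero iff equal."""
    return current - target + target * np.log(target / current)


def greenkhorn(a, b, C, eps=1.0, max_iter=100_000, tol=1e-14):
    """Greedy Sinkhorn: rescale one row or one column per step (sketch)."""
    a, b, C = (np.asarray(z, dtype=float) for z in (a, b, C))
    K = np.exp(-C / eps)
    u, v = np.ones(len(a)), np.ones(len(b))
    r, c = K @ v, K.T @ u                    # row and column sums of diag(u) K diag(v)
    for _ in range(max_iter):
        row_gap, col_gap = rho(a, r), rho(b, c)
        i, j = int(np.argmax(row_gap)), int(np.argmax(col_gap))
        if max(row_gap[i], col_gap[j]) < tol:
            break                            # every marginal is (nearly) matched
        if row_gap[i] >= col_gap[j]:
            new_u = a[i] / (K[i] @ v)        # makes row i sum exactly to a[i]
            c += (new_u - u[i]) * K[i] * v   # only row i changed, so patch column sums
            u[i], r[i] = new_u, a[i]
        else:
            new_v = b[j] / (K[:, j] @ u)     # makes column j sum exactly to b[j]
            r += (new_v - v[j]) * K[:, j] * u
            v[j], c[j] = new_v, b[j]
    return u[:, None] * K * v[None, :]


x = np.array([0.0, 1.0, 2.0])
C = (x[:, None] - x[None, :]) ** 2
a = np.array([0.2, 0.3, 0.5])
b = np.array([0.4, 0.4, 0.2])
P = greenkhorn(a, b, C)
print(P.sum(axis=1), P.sum(axis=0))          # approximately a and b
```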

## Statistical inference

What are the statistical properties of optimal transport between random measures, such as empirical distributions? Such questions are just starting to be answered:

- Panaretos & Zemel, 2019: Statistical aspects of Wasserstein distances [doi, arxiv]
  - Review paper, citing and belonging to the same series as: Wang, Chiou, Müller, 2016: Functional data analysis [doi]
  - See especially Sec. 4: Optimal transport as the object of inference, which is mainly about Fréchet means in Wasserstein space
  - Zemel & Panaretos, 2019: Fréchet means and Procrustes analysis in Wasserstein space [doi, arxiv]
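To make optimal transport between empirical distributions concrete, consider the real line. There, the W_p distance between two empirical measures with the same number of equally weighted atoms reduces to matching sorted samples, because the monotone coupling is optimal for the cost |x − y|^p with p ≥ 1. The sketch below uses that fact. The function name and the equal-sample-size restriction are assumptions made to keep it short.

```python
import numpy as np


def wasserstein_1d(x, y, p=1):
    """W_p between two empirical measures on R with equally many atoms (sketch).

    The monotone coupling is optimal in 1-D for cost |x - y|**p, p >= 1,
    so the i-th smallest x is matched with the i-th smallest y.
    """
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    if x.shape != y.shape:
        raise ValueError("this sketch assumes equal sample sizes")
    return float(np.mean(np.abs(x - y) ** p) ** (1 / p))


# Two independent samples from the same N(0, 1) law: their empirical
# W1 distance is itself random and tends to shrink as n grows.
rng = np.random.default_rng(0)
for n in (10, 100, 1000, 10000):
    print(n, wasserstein_1d(rng.normal(size=n), rng.normal(size=n)))
```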

- Peyré & Cuturi, 2019: *Computational optimal transport*, Sec. 9.4: Minimum Kantorovich estimators

- Bassetti, Bodini, Regazzini, 2006: On minimum Kantorovich distance estimators [doi]
  - Studies existence, measurability, and consistency of "estimators defined as minimizers of Kantorovich distances between statistical models and empirical distributions"
- Bassetti & Regazzini, 2006: Asymptotic properties and robustness of minimum dissimilarity estimators of location-scale parameters [doi]

- Bernton, Jacob, Gerber, Robert, 2019: On parameter estimation with the Wasserstein distance [pdf, supplementary]
  - Extends results of Bassetti et al., 2006 to misspecified models and non-i.i.d. data
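As a toy instance of a minimum Kantorovich estimator, consider the location model X = θ + Z with Z standard normal, and estimate θ by minimizing the W1 distance between the empirical distribution and the model. In 1-D, W1 is the integral over t ∈ (0, 1) of |F_n^{-1}(t) − θ − F_Z^{-1}(t)|. The sketch evaluates F_Z^{-1} only at the midpoints (i + 1/2)/n, which is my own discretization and not the estimator analyzed in the papers above. With that simplification the minimizer is a median. The function name is hypothetical.

```python
import numpy as np
from statistics import NormalDist


def min_w1_location(x, base_quantile=NormalDist().inv_cdf):
    """Approximate minimum-W1 estimate of theta in X = theta + Z (sketch).

    Replacing F_Z^{-1}(t) by its value q_i at the midpoint of each interval
    turns the objective into (1/n) * sum_i |x_(i) - q_i - theta|,
    which is minimized by the median of x_(i) - q_i.
    """
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    q = np.array([base_quantile((i + 0.5) / n) for i in range(n)])
    return float(np.median(x - q))


rng = np.random.default_rng(1)
sample = 3.0 + rng.normal(size=500)
sample[:10] = 1e6                      # gross outliers in 2% of the data
print(min_w1_location(sample))         # stays near 3
print(sample.mean())                   # the sample mean does not
```

The contaminated sample hints at the robustness studied by Bassetti & Regazzini: a few wild observations shift only a few of the matched quantile differences.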

Philippe Rigollet and his students are doing a lot of interesting work on the statistics of optimal transport:

- Rigollet & Weed, 2018: Entropic optimal transport is maximum-likelihood deconvolution [doi, arxiv]
- Rigollet & Weed, 2019: Uncoupled isotonic regression via minimum Wasserstein deconvolution [doi, arxiv]
- Forrow et al., 2018: Statistical optimal transport via factored couplings [arxiv]
  - Previously titled "Statistical optimal transport via geodesic hubs"

Other applications in ML and statistics include:

- Bonneel, Peyré, Cuturi, 2016: Wasserstein barycentric coordinates: Histogram regression using optimal transport [doi, online]
- Genevay, Peyré, Cuturi: GAN and VAE from an optimal transport point of view [arxiv]
- Schmitz et al., 2018: Wasserstein dictionary learning: Optimal transport-based unsupervised nonlinear dictionary learning [doi, arxiv]
- Bernton, Jacob, Gerber, Robert, 2019: Approximate Bayesian computation with the Wasserstein distance [doi, arxiv]
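Several of these applications build on Wasserstein barycenters, the Fréchet means in Wasserstein space mentioned above. On the real line the W2 barycenter has an explicit form: its quantile function is the weighted average of the input quantile functions. The sketch below relies on that fact. The function name, the number of quantile levels, and the use of `np.quantile`'s default linear interpolation as a stand-in for the exact empirical quantile function are assumptions. In higher dimensions no such closed form exists, and iterative methods (for example entropic ones) are used instead.

```python
import numpy as np


def barycenter_1d(samples, weights=None, n_levels=200):
    """W2 barycenter of 1-D empirical measures by averaging quantiles (sketch).

    samples: list of 1-D arrays; weights: nonnegative, assumed to sum to 1.
    Returns n_levels atoms, each carrying mass 1 / n_levels.
    """
    k = len(samples)
    w = np.full(k, 1.0 / k) if weights is None else np.asarray(weights, dtype=float)
    levels = (np.arange(n_levels) + 0.5) / n_levels      # quantile levels in (0, 1)
    Q = np.stack([np.quantile(np.asarray(s, dtype=float), levels) for s in samples])
    return w @ Q                                          # weighted average of quantile functions


rng = np.random.default_rng(2)
left, right = rng.normal(-4, 1, 1000), rng.normal(4, 1, 1000)
bary = barycenter_1d([left, right])
# Unlike the 50/50 mixture, the barycenter is a single bump centered near 0.
print(bary.mean(), bary.std())
```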