# Optimal transport

## Mathematics

**Books**

- Villani, 2003: *Topics in optimal transportation* (doi)
- Villani, 2009: *Optimal transport: Old and new* (doi, pdf)
  - Mathematically formidable and physically massive, but surprisingly readable
  - My preferred reference for the theory

- Santambrogio, 2015: *Optimal transport for applied mathematicians* (doi)
  - A fine book, but misnamed: it is mostly pure mathematics
  - An exception is Chapter 6: Numerical methods

- Rachev & Rüschendorf, 1998: *Mass transportation problems*, Volume I: Theory (doi) and Volume II: Applications (doi)
  - The standard reference until the publication of Villani’s books

**Topical surveys**

- De Philippis & Figalli, 2014: The Monge-Ampère equation and its link to optimal transportation (doi)
  - Mentioned in a curious MO question about uses of higher-order derivatives

**Unbalanced optimal transport**

Sometimes it is too much to ask that the marginal measures be preserved, which
in particular assumes they have equal mass. In *unbalanced optimal transport*,
the measure preservation assumption is relaxed.

- Chizat et al, 2018: Unbalanced optimal transport: Dynamic and Kantorovich formulations (doi, arxiv)
- Figalli, 2009: The optimal partial transport problem (doi, pdf)
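
As a sketch of what relaxing measure preservation can look like, one commonly used formulation replaces the hard marginal constraints of the Kantorovich problem with divergence penalties. The notation below (cost $c$, marginals $\mu$ and $\nu$, penalty weight $\lambda$, and the Kullback-Leibler divergence as the penalty) is an assumption chosen for illustration; the references above treat more general formulations.

```latex
% Balanced Kantorovich problem: couplings must have marginals exactly \mu and \nu,
% which forces \mu(X) = \nu(Y).
\min_{\gamma \in \Pi(\mu, \nu)} \int_{X \times Y} c(x, y) \, d\gamma(x, y)

% A relaxed (unbalanced) variant: any nonnegative measure \gamma is allowed,
% and deviations of its marginals from \mu and \nu are penalized.
\min_{\gamma \geq 0} \int_{X \times Y} c(x, y) \, d\gamma(x, y)
  + \lambda \, \mathrm{KL}(\pi^X_{\#} \gamma \,\|\, \mu)
  + \lambda \, \mathrm{KL}(\pi^Y_{\#} \gamma \,\|\, \nu)
```

Here $\pi^X_{\#} \gamma$ and $\pi^Y_{\#} \gamma$ denote the two marginals of $\gamma$. As $\lambda \to \infty$ the penalties enforce the marginal constraints, so when the masses agree the balanced problem is recovered.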

## Computation

**Books and surveys**

- Peyré & Cuturi, 2019: *Computational optimal transport* (doi, arxiv)
  - The friendliest book on optimal transport, not just for computational issues

- Solomon, 2018: Optimal transport on discrete domains (arxiv, pdf)
  - From the 2018 AMS Short Course on Discrete Differential Geometry (pdf)
  - For a general audience: Solomon, 2017: Computational optimal transport (doi)

**Fast computation via entropic regularization**

Chapter 4 of Peyré & Cuturi’s book is a good overview.

- Cuturi, 2013: Sinkhorn distances: Lightspeed computation of optimal transport (arxiv, pdf)
  - The important paper that first applied the Sinkhorn-Knopp algorithm to solve regularized optimal transport
  - Summarized in: Peyré & Cuturi, 2019, Sec. 4.2: Sinkhorn’s algorithm and its convergence

- Altschuler, Weed, Rigollet, 2017: Near-linear time approximation algorithms for optimal transport via Sinkhorn iteration (arxiv, pdf)
- Blanchet et al, 2018: Towards optimal running times for optimal transport (arxiv)
- Lin, Ho, Jordan, 2019: On efficient optimal transport: An analysis of greedy and accelerated mirror descent algorithms (arxiv)
  - Convergence analysis of a greedy variant of the Sinkhorn algorithm, horribly named the “Greenkhorn algorithm”
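
For orientation, here is a minimal sketch of the Sinkhorn-Knopp iteration for entropically regularized optimal transport between two discrete histograms. It is written in plain Python for readability rather than speed. The function name `sinkhorn`, the parameters `eps` and `n_iters`, and the toy inputs are illustrative assumptions, not taken from the papers above.

```python
import math


def sinkhorn(a, b, cost, eps=0.5, n_iters=200):
    """Approximate the entropic OT plan between histograms a and b.

    a, b  : lists of positive weights, each summing to 1
    cost  : cost[i][j] is the cost of moving mass from bin i of a to bin j of b
    eps   : regularization strength (assumed name); smaller is closer to
            unregularized transport but converges more slowly
    """
    n, m = len(a), len(b)
    # Gibbs kernel K_ij = exp(-C_ij / eps)
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Rescale rows so that the row sums of diag(u) K diag(v) match a
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        # Rescale columns so that the column sums match b
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    # Transport plan P = diag(u) K diag(v)
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


# Toy example: two bins on each side, unit cost for moving between bins.
a = [0.5, 0.5]
b = [0.25, 0.75]
cost = [[0.0, 1.0], [1.0, 0.0]]
plan = sinkhorn(a, b, cost)
transport_cost = sum(plan[i][j] * cost[i][j] for i in range(2) for j in range(2))
```

After enough iterations the rows of `plan` sum to `a` and the columns to `b`. Practical implementations usually work in the log domain to avoid underflow when `eps` is small, which this sketch does not attempt.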

## Statistical inference

What are the statistical properties of optimal transport between random measures, such as empirical distributions? Such questions are just starting to be answered:

- Panaretos & Zemel, 2019: Statistical aspects of Wasserstein distances (doi, arxiv)
  - Review paper, citing and belonging to the same series as: Wang, Chiou, Müller, 2016: Functional data analysis (doi)
  - See especially Sec 4: Optimal transport as the object of inference, which is mainly about Fréchet means in Wasserstein space
  - Zemel & Panaretos, 2019: Fréchet means and Procrustes analysis in Wasserstein space (doi, arxiv)

- Peyré & Cuturi, 2019: *Computational optimal transport*, Sec 9.4: Minimum Kantorovich estimators
- Bassetti, Bodini, Regazzini, 2006: On minimum Kantorovich distance estimators (doi)
  - Studies existence, measurability, and consistency of “estimators defined as minimizers of Kantorovich distances between statistical models and empirical distributions”
  - Bassetti & Regazzini, 2006: Asymptotic properties and robustness of minimum dissimilarity estimators of location-scale parameters (doi)

- Bernton, Jacob, Gerber, Robert, 2019: On parameter estimation with the Wasserstein distance (pdf, supplementary)
  - Extends results of Bassetti et al, 2006 to misspecified models and non-i.i.d. data
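
To illustrate the idea of a minimum Kantorovich estimator, the sketch below fits a location parameter by minimizing the 1-Wasserstein distance between the data and a sample drawn from the model. Everything here is an assumption made for illustration: the Gaussian location model, the grid search, the use of a fixed model sample in place of the exact model distribution, and the function names. It is not the procedure of any of the papers above.

```python
import random


def wasserstein1_1d(xs, ys):
    """W1 between two equal-size empirical distributions on the real line."""
    assert len(xs) == len(ys)
    # In one dimension the optimal coupling matches order statistics.
    return sum(abs(x - y) for x, y in zip(sorted(xs), sorted(ys))) / len(xs)


def min_kantorovich_location(data, base_sample, grid):
    """Grid-search the shift theta minimizing W1(data, base_sample + theta)."""
    return min(
        grid,
        key=lambda theta: wasserstein1_1d(data, [z + theta for z in base_sample]),
    )


rng = random.Random(0)
true_theta = 2.0
# Observed data from the (hypothetical) location model N(theta, 1)
data = [true_theta + rng.gauss(0.0, 1.0) for _ in range(500)]
# Draws from the model at theta = 0, shifted inside the search
base_sample = [rng.gauss(0.0, 1.0) for _ in range(500)]
# Candidate shifts 0.00, 0.05, ..., 4.00
grid = [i / 20 for i in range(81)]
theta_hat = min_kantorovich_location(data, base_sample, grid)
```

With a few hundred points per sample, `theta_hat` should land near `true_theta`, up to sampling noise and the grid resolution.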

Philippe Rigollet and his students are doing a lot of interesting work on the statistics of optimal transport:

- Rigollet & Weed, 2018: Entropic optimal transport is maximum-likelihood deconvolution (doi, arxiv)
- Rigollet & Weed, 2019: Uncoupled isotonic regression via minimum Wasserstein deconvolution (doi, arxiv)
- Forrow et al, 2018: Statistical optimal transport via factored couplings (arxiv)
  - Previously called: “Statistical optimal transport via geodesic hubs”

Other applications in ML and statistics include:

- Bonneel, Peyré, Cuturi, 2016: Wasserstein barycentric coordinates: Histogram regression using optimal transport (doi, online)
- Genevay, Peyré, Cuturi: GAN and VAE from an optimal transport point of view (arxiv)
- Schmitz et al, 2018: Wasserstein dictionary learning: Optimal transport-based unsupervised nonlinear dictionary learning (doi, arxiv)
- Bernton, Jacob, Gerber, Robert, 2019: Approximate Bayesian computation with the Wasserstein distance (doi, arxiv)