Optimal transport



  • Villani, 2003: Topics in optimal transportation (doi)
  • Villani, 2009: Optimal transport: Old and new (doi, pdf)
    • Mathematically formidable and physically massive, but surprisingly readable
    • My preferred reference for the theory
  • Santambrogio, 2015: Optimal transport for applied mathematicians (doi)
    • A fine book, but misnamed: it is mostly pure mathematics
    • An exception is Chapter 6: Numerical methods
  • Rachev & Rüschendorf, 1998: Mass transportation problems, Volume I: Theory (doi) and Volume II: Applications (doi)
    • The standard reference until the publication of Villani’s books

Topical surveys

  • De Philippis and Figalli, 2014: The Monge-Ampère equation and its link to optimal transportation (doi)

Unbalanced optimal transport

Sometimes it is too much to ask that the marginal measures be preserved, which in particular assumes they have equal mass. In unbalanced optimal transport, the measure preservation assumption is relaxed.

  • Chizat et al, 2018: Unbalanced optimal transport: Dynamic and Kantorovich formulations (doi, arxiv)
    • Computational sequel: Chizat et al, 2018: Scaling algorithms for unbalanced optimal transport problems (doi, arxiv)
    • Exposited in: Peyré & Cuturi, 2019, Sec. 10.2: Unbalanced optimal transport
    • Proposed independently in:
      • Liero, Mielke, Savaré, 2018: Optimal entropy-transport problems and a new Hellinger–Kantorovich distance between positive measures (doi, arixv)
      • Liero, Mielke, Savaré, 2016: Optimal transport in competition with reaction: The Hellinger-Kantorovich distance and geodesic curves (doi, arxiv)
  • Figalli, 2009: The optimal partial transport problem (doi, pdf)
    • Generalizes: Caffarelli & McCann, 2010: Free boundaries in optimal transport and Monge-Ampère obstacle problems (doi, pdf)


Books and surveys

Fast computation via entropic regularization

Chapter 4 of Peyré & Cuturi’s book is a good overview.

  • Cuturi, 2013: Sinkhorn distances: Lightspeed computation of optimal transport (arxiv, pdf)
    • The important paper that first applied the Sinkhorn-Knopp algorithm to solve regularized optimal transport
    • Summarized in: Peyré & Cuturi, 2019, Sec. 4.2: Sinkhorn’s algorithm and its convergence
  • Altschuler, Weed, Rigollet, 2017: Near-linear time approximation algorithms for optimal transport via Sinkhorn iteration (arxiv, pdf)
    • First theoretical guarantees for Cuturi’s algorithm
    • Summarized in: Peyré & Cuturi, 2019, Remark 4.6
    • Improved by: Dvurechensky et al, 2018: Computational optimal transport: Complexity by accelerated gradient descent is better than by Sinkhorn’s algorithm (arxiv, pdf)
  • Blanchet et al, 2018: Towards optimal running times for optimal transport (arxiv)
  • Lin, Ho, Jordan, 2019: On efficient optimal transport: An analysis of greedy and accelerated mirror descent algorithms (arxiv)
    • Convergence analysis of greedy variant of Sinkhorn algorithms, horribly named the “Greenkhorn algorithm”

Statistical inference

What are the statistical properties of optimal transport between random measures, such as empirical distributions? Such questions are just starting to be answered:

  • Panaretos & Zemel, 2019: Statistical aspects of Wasserstein distances (doi, arxiv)
    • Review paper, citing and belonging to the same series as: Wang, Chiou, Müller, 2016: Functional data analysis (doi)
    • See especially Sec 4: Optimal transport as the object of inference, which is mainly about Fréchet means in Wasserstein space
    • Zemel & Panaretos, 2019: Fréchet means and Procrustes analysis in Wasserstein space (doi, arxiv)
  • Peyré & Cuturi, 2019: Computational optimal transport, Sec 9.4: Minimum Kantorovich estimators
    • Genevay, Peyré, Cuturi, 2018: Learning generative models with Sinkhorn divergences (arxiv, pdf)
  • Bassetti, Bodini, Regazzini, 2006: On minimum Kantorovich distance estimators (doi)
    • Studies existence, measurability, and consistency of “estimators defined as minimizers of Kantorovich distances between statistical models and empirical distributions”
    • Bassetti & Regazzini, 2006: Asymptotic properties and robustness of minimum dissimilarity estimators of location-scale parameters (doi)
  • Bernton, Jacob, Gerber, Robert, 2019: On parameter estimation with the Wasserstein distance (pdf, supplementary )
    • Extends results of Bassetti et al, 2006 to misspecified models and non-i.i.d. data

Phillipe Rigollet and his students are doing much interesting work on the statistics of optimal transport:

  • Rigollet & Weed, 2018: Entropic optimal transport is maximum-likelihood deconvolution (doi, arxiv)
  • Rigollet & Weed, 2019: Uncoupled isotonic regression via minimum Wasserstein deconvolution (doi, arxiv)
  • Forrow et al, 2018: Statistical optimal transport via factored couplings (arxiv)
    • Previously called: “Statistical optimal transport via geodesic hubs”

Other applications in ML and statistics include:

  • Bonneel, Peyré, Cuturi, 2016: Wasserstein barycentric coordinates: Histogram regression using optimal transport (doi, online )
  • Genevay, Peyré, Cuturi: GAN and VAE from an optimal transport point of view (arxiv)
    • Perspective on recent, very popular papers from deep learning community
    • Arjovsky, Chintala, Bottou, 2017: Wasserstein generative adversarial networks (arxiv, pdf)
    • Tolstikhin, Bousquet, Gelly, Schoelkopf, 2018: Wasserstein auto-encoders (arxiv, reviews )
  • Schmitz et al, 2018: Wasserstein dictionary learning: Optimal transport-based unsupervised nonlinear dictionary learning (doi, arxiv)
  • Bernton, Jacob, Gerber, Robert, 2019: Approximate Bayesian computation with the Wasserstein distance (doi, arxiv)