Statistical distances

How do we measure the distance between probability distributions?

Methods

Metrics

  • Wasserstein distance (see the dedicated section below)
  • Lévy-Prokhorov metric

Divergences (non-metric)

  • KL divergence (aka relative entropy)
  • Rényi divergence (aka \(\alpha\)-divergence) (notes)
  • \(\chi^2\)-distance
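
For concreteness, a quick reminder of the standard definitions for discrete distributions \(P = (p_i)\) and \(Q = (q_i)\):

\[
D_{\mathrm{KL}}(P \,\|\, Q) = \sum_i p_i \log \frac{p_i}{q_i},
\qquad
D_\alpha(P \,\|\, Q) = \frac{1}{\alpha - 1} \log \sum_i p_i^\alpha q_i^{1-\alpha},
\qquad
\chi^2(P \,\|\, Q) = \sum_i \frac{(p_i - q_i)^2}{q_i},
\]

with the Rényi divergence \(D_\alpha\) recovering \(D_{\mathrm{KL}}\) as \(\alpha \to 1\). A minimal NumPy sketch of the three divergences (the distributions below are arbitrary toy values):

    import numpy as np

    def kl_divergence(p, q):
        # D_KL(P || Q) = sum_i p_i log(p_i / q_i); terms with p_i = 0 contribute 0
        p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
        mask = p > 0
        return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

    def renyi_divergence(p, q, alpha):
        # D_alpha(P || Q) = log(sum_i p_i^alpha q_i^(1 - alpha)) / (alpha - 1), alpha != 1
        p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
        return float(np.log(np.sum(p**alpha * q**(1 - alpha))) / (alpha - 1))

    def chi2_distance(p, q):
        # chi^2(P || Q) = sum_i (p_i - q_i)^2 / q_i
        p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
        return float(np.sum((p - q) ** 2 / q))

    p = [0.5, 0.3, 0.2]  # toy distributions on three points
    q = [0.4, 0.4, 0.2]
    print(kl_divergence(p, q))        # ~0.0253
    print(renyi_divergence(p, q, 2))  # ~0.0488
    print(chi2_distance(p, q))        # = 0.05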

Literature

General

  • Rachev, 1991: Probability metrics and the stability of stochastic models
    • Villani: “For the taxonomy of probability metrics and their history, the unavoidable reference is the monograph by Rachev, which lists dozens and dozens of metrics together with their main properties and applications. (Many of them are variants… of the Wasserstein and Lévy-Prokhorov metrics.)”
  • Basu, Shioya, Park, 2011: Statistical inference: The minimum distance approach (doi), Chapter 2: Statistical distances
  • Gibbs & Su, 2002: On choosing and bounding probability metrics (doi, arxiv)
    • A good place to start
    • The relationship diagram in Figure 1 is especially helpful
  • Cha, 2007: Comprehensive survey on distance/similarity measures between probability density functions (pdf)
  • Basseville, 1989: Distance measures for signal processing and pattern recognition (doi)
  • Basseville, 2013: Divergence measures for statistical data processing (doi)

Wasserstein distance

The literature on Wasserstein distances is simply enormous, spanning probability, statistics, optimal transport, and other branches of pure and applied mathematics. Here are a few choice references:

  • Villani, 2003: Topics in optimal transportation, Chapter 7: The metric side of optimal transportation
  • Villani, 2009: Optimal transport: Old and new, Chapter 6: The Wasserstein distances

For more references, see optimal transport.
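
As a reminder, for probability measures \(\mu, \nu\) on a metric space \((X, d)\), the \(p\)-Wasserstein distance is

\[
W_p(\mu, \nu) = \left( \inf_{\gamma \in \Gamma(\mu, \nu)} \int_{X \times X} d(x, y)^p \, \mathrm{d}\gamma(x, y) \right)^{1/p},
\]

where \(\Gamma(\mu, \nu)\) is the set of couplings of \(\mu\) and \(\nu\). As a small numerical sketch, SciPy's scipy.stats.wasserstein_distance computes \(W_1\) between one-dimensional empirical distributions (the Gaussian samples below are just a toy example):

    import numpy as np
    from scipy.stats import wasserstein_distance

    rng = np.random.default_rng(0)
    x = rng.normal(loc=0.0, scale=1.0, size=5000)  # samples from N(0, 1)
    y = rng.normal(loc=1.0, scale=1.0, size=5000)  # samples from N(1, 1)

    # In one dimension, W_1 equals the L^1 distance between the two quantile
    # functions, which SciPy estimates from the empirical samples.
    print(wasserstein_distance(x, y))  # close to the exact value |0 - 1| = 1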