Differential privacy

Literature

See also literature on stability.

Surveys

  • Dwork, 2008: Differential privacy: a survey of results (pdf)
  • Dwork, 2011: A firm foundation for private data analysis (doi, pdf)
  • Dwork & Roth, 2014: Algorithmic Foundations of Differential Privacy (pdf)

Adaptive data analysis

  • Dwork, Feldman, Hardt, Pitassi, Reingold, Roth, 2015: Preserving statistical validity in adaptive data analysis (arxiv)
  • Same authors, 2015: Generalization in adaptive data analysis and holdout reuse (arxiv)
    • Uses theory developed in first paper and other differential privacy literature
    • Baby version in Science, 2015: The reusable holdout: preserving validity in adaptive data analysis (doi)
  • Blum & Hardt, 2015: The Ladder: A reliable leaderboard for ML competitions (arxiv)
    • Similar in spirit to the reusable holdout
  • Bassily, Nissim, Smith, Steinke, Stemmer, Ullman, 2016: Algorithmic stability for adaptive data analysis (arxiv)
    • Strengthens and simplifies analysis of Dwork et al.
    • Subsumes two previous papers (1, 2)
  • Russo & Zou, 2016: Controlling bias in adaptive data analysis using information theory (arxiv, JMLR)
  • Bassily & Freund, 2016: Typical stability (arxiv)
  • Smith, 2017: Information, privacy, and stability in adaptive data analysis (arxiv)

Local sensitivity

  • Dwork & Roth, 2014, Chapter 7: When worst-case sensitivity is atypical
  • Nissim, Raskhodnikova, Smith, 2007: Smooth sensitivity and sampling in private data analysis (doi, pdf)
  • Thakurta & Smith, 2013: Differentially private feature selection via stability arguments, and the robustness of the lasso (pdf) (see also stability)

Algorithms

  • Nikolov, Talwar, Zhang, 2013: The geometry of differential privacy: the sparse and approximate cases (doi, arxiv)
  • Hsu, Roth, Roughgarden, Ullman, 2014: Privately solving linear programs (arxiv)
  • Dimitrakakis et al., 2016: Bayesian differential privacy through posterior sampling (arxiv)

Software

  • Thresholdout
    • Source code for experiments in Science paper
    • Win-Vector blog post including R code and some interesting commentary:
      • “Nina and I… came to the conclusion that reserving some test data for step variable scoring is usually worse than the standard method of using all our training data to run stepwise regression”
  • Ladder.jl: Julia implementation of (Blum & Hardt, 2015)