Word embeddings

Word embeddings map words to dense, real-valued vectors of relatively low dimension, in contrast to one-hot encodings, whose dimension equals the vocabulary size.
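
A minimal sketch of the difference, with a toy vocabulary and a random (untrained) embedding matrix standing in for vectors that would normally come from word2vec, GloVe, or similar:

    import numpy as np

    vocab = ["cat", "dog", "car"]                    # toy vocabulary
    index = {w: i for i, w in enumerate(vocab)}

    # One-hot encoding: dimension equals the vocabulary size, a single 1 per word.
    one_hot_cat = np.eye(len(vocab))[index["cat"]]   # [1., 0., 0.]

    # Embedding lookup: a dense, low-dimensional vector per word.
    # Here E is random for illustration; normally it is learned from a corpus.
    embedding_dim = 4
    E = np.random.default_rng(0).normal(size=(len(vocab), embedding_dim))
    cat_vec = E[index["cat"]]                        # dense 4-d vector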

Blog posts:

Stanford resources:

Software

Pre-trained word embeddings

  • word2vec (Google Code, GitHub) (Mikolov et al., 2013; Levy & Goldberg, 2014)
  • GloVe: Global vectors (Pennington et al., 2014)
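
Both can be loaded with standard tooling; a minimal sketch using gensim's downloader (the model names are gensim-data identifiers, and any word-vector loader would do):

    import gensim.downloader as api

    # First call downloads and caches the vectors (gensim-data identifiers).
    glove = api.load("glove-wiki-gigaword-50")   # 50-d GloVe, Wikipedia + Gigaword

    # Nearest neighbours in embedding space.
    print(glove.most_similar("frog", topn=5))

    # The classic analogy test: king - man + woman ≈ queen.
    print(glove.most_similar(positive=["king", "woman"], negative=["man"], topn=1))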

Literature

Influential papers

  • Mikolov, Sutskever, Chen, Corrado, Dean, 2013: Distributed representations of words and phrases and their compositionality (pdf, arxiv)
  • Pennington, Socher, Manning, 2014: GloVe: Global vectors for word representation (pdf)
    • Simple but effective method: a weighted least-squares fit of word-vector dot products to log co-occurrence counts (objective sketched below)
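
For reference, the objective from the GloVe paper: with X_ij the number of times word j occurs in the context of word i, it fits dot products of word and context vectors to the log counts, down-weighting rare pairs:

    J = \sum_{i,j} f(X_{ij}) \left( w_i^\top \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij} \right)^2,
    \qquad
    f(x) = \begin{cases} (x / x_{\max})^{\alpha} & \text{if } x < x_{\max} \\ 1 & \text{otherwise} \end{cases}

The paper uses alpha = 3/4 and x_max = 100.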

Probabilistic embeddings (probability distributions, not points):

  • Vilnis & McCallum, 2014: Word representations via Gaussian embedding (arxiv, GitHub)
    • Points replaced by Gaussian distributions, with variance capturing word specificity
    • Containment of constant-density ellipsoids models entailment (see the KL-divergence sketch after this list)
  • Athiwaratkun & Wilson, 2017: Multimodal word distributions (pdf, arxiv)
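
Vilnis & McCallum score the directed relation with an asymmetric KL-divergence energy between the two Gaussians. A minimal sketch, assuming diagonal covariances and made-up 2-d embeddings for the specific word "dog" and the broader word "animal":

    import numpy as np

    def kl_diag(mu0, var0, mu1, var1):
        """KL( N(mu0, diag(var0)) || N(mu1, diag(var1)) ), diagonal covariances."""
        return 0.5 * np.sum(var0 / var1 + (mu1 - mu0) ** 2 / var1 - 1.0
                            + np.log(var1 / var0))

    # Hypothetical embeddings: the broader word gets larger variances.
    dog    = (np.array([1.0, 0.2]), np.array([0.1, 0.1]))
    animal = (np.array([0.9, 0.3]), np.array([0.8, 0.9]))

    # KL is asymmetric: it is smaller when the specific word's ellipsoid sits
    # inside the general word's, which is read as "dog entails animal".
    print(kl_diag(*dog, *animal))    # smaller
    print(kl_diag(*animal, *dog))    # larger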

Theory

  • Levy & Goldberg, 2014: Neural word embedding as implicit matrix factorization (pdf, arxiv)
    • Shows that skip-gram with negative sampling implicitly factorizes a shifted PMI matrix (key identity below)
  • Arora et al, 2016: A latent variable model approach to PMI-based word embeddings (pdf, arxiv)
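
The key identity from Levy & Goldberg: at the optimum, skip-gram with negative sampling (SGNS) with k negative samples assigns word and context vectors satisfying

    w_i^\top c_j = \mathrm{PMI}(w_i, c_j) - \log k
                 = \log \frac{P(w_i, c_j)}{P(w_i)\, P(c_j)} - \log k,

which motivates the explicit shifted positive-PMI (SPPMI) matrix, optionally factorized by SVD, that the paper evaluates as an alternative to SGNS.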