Word embeddings

Word embeddings map words to dense, real-valued vectors of relatively low dimension, in contrast to one-hot encodings, whose dimension equals the vocabulary size.
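
A minimal sketch of the difference, with a toy vocabulary and a random (untrained) embedding matrix standing in for vectors that would normally come from word2vec, GloVe, or similar:

    import numpy as np

    vocab = ["cat", "dog", "car"]                    # toy vocabulary
    index = {w: i for i, w in enumerate(vocab)}

    # One-hot encoding: dimension equals the vocabulary size, a single 1 per word.
    one_hot_cat = np.eye(len(vocab))[index["cat"]]   # [1., 0., 0.]

    # Embedding lookup: a dense, low-dimensional vector per word.
    # Here E is random for illustration; normally it is learned from a corpus.
    embedding_dim = 4
    E = np.random.default_rng(0).normal(size=(len(vocab), embedding_dim))
    cat_vec = E[index["cat"]]                        # dense 4-d vector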

Blog posts:

Stanford resources:

Software

Pre-trained word embeddings

  • word2vec (Google Code, GitHub) (Mikolov et al., 2013; Levy & Goldberg, 2014)
  • GloVe: Global vectors (Pennington et al., 2014)
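
Both can be loaded with standard tooling; a minimal sketch using gensim's downloader (the model names are gensim-data identifiers, and any word-vector loader would do):

    import gensim.downloader as api

    # First call downloads and caches the vectors (gensim-data identifiers).
    glove = api.load("glove-wiki-gigaword-50")   # 50-d GloVe, Wikipedia + Gigaword

    # Nearest neighbours in embedding space.
    print(glove.most_similar("frog", topn=5))

    # The classic analogy test: king - man + woman ≈ queen.
    print(glove.most_similar(positive=["king", "woman"], negative=["man"], topn=1))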

Literature

Influential papers

  • Mikolov, Sutskever, Chen, Corrado, Dean, 2013: Distributed representations of words and phrases and their compositionality (pdf, arxiv)
  • Pennington, Socher, Manning, 2014: GloVe: Global vectors for word representation (pdf)
    • Simple but effective method: a weighted least-squares fit of word-vector dot products to log co-occurrence counts (objective sketched below)
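
For reference, the objective from the GloVe paper: with X_ij the number of times word j occurs in the context of word i, it fits dot products of word and context vectors to the log counts, down-weighting rare pairs:

    J = \sum_{i,j} f(X_{ij}) \left( w_i^\top \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij} \right)^2,
    \qquad
    f(x) = \begin{cases} (x / x_{\max})^{\alpha} & \text{if } x < x_{\max} \\ 1 & \text{otherwise} \end{cases}

The paper uses alpha = 3/4 and x_max = 100.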

Probabilistic embeddings (probability distributions, not points):

  • Vilnis & McCallum, 2014: Word representations via Gaussian embedding (arxiv, GitHub)
    • Points replaced by Gaussian distributions, with variance capturing word specificity
    • Containment of constant-density ellipsoids models entailment (see the KL-divergence sketch after this list)
  • Athiwaratkun & Wilson, 2017: Multimodal word distributions (pdf, arxiv)
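
Vilnis & McCallum score the directed relation with an asymmetric KL-divergence energy between the two Gaussians. A minimal sketch, assuming diagonal covariances and made-up 2-d embeddings for the specific word "dog" and the broader word "animal":

    import numpy as np

    def kl_diag(mu0, var0, mu1, var1):
        """KL( N(mu0, diag(var0)) || N(mu1, diag(var1)) ), diagonal covariances."""
        return 0.5 * np.sum(var0 / var1 + (mu1 - mu0) ** 2 / var1 - 1.0
                            + np.log(var1 / var0))

    # Hypothetical embeddings: the broader word gets larger variances.
    dog    = (np.array([1.0, 0.2]), np.array([0.1, 0.1]))
    animal = (np.array([0.9, 0.3]), np.array([0.8, 0.9]))

    # KL is asymmetric: it is smaller when the specific word's ellipsoid sits
    # inside the general word's, which is read as "dog entails animal".
    print(kl_diag(*dog, *animal))    # smaller
    print(kl_diag(*animal, *dog))    # larger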

Theory

  • Levy & Goldberg, 2014: Neural word embedding as implicit matrix factorization (pdf, arxiv)
    • Shows that skip-gram with negative sampling implicitly factorizes a shifted PMI matrix (key identity below)
  • Arora et al, 2016: A latent variable model approach to PMI-based word embeddings (pdf, arxiv)
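
The key identity from Levy & Goldberg: at the optimum, skip-gram with negative sampling (SGNS) with k negative samples assigns word and context vectors satisfying

    w_i^\top c_j = \mathrm{PMI}(w_i, c_j) - \log k
                 = \log \frac{P(w_i, c_j)}{P(w_i)\, P(c_j)} - \log k,

which motivates the explicit shifted positive-PMI (SPPMI) matrix, optionally factorized by SVD, that the paper evaluates as an alternative to SGNS.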