Mathematical expressions dataset

Data collection for the symbolic regression project. For related academic literature, see the page on semantic representation of math.

Data sources

See my question on OpenData.SE.

Software

Selected LaTeX parsers:

  • MathJax [JavaScript]
    • The de facto standard, with wide support for LaTeX packages
  • KaTeX [JavaScript]
    • Faster alternative to MathJax, with less package support
  • LaTeXML [Perl]
    • Created by NIST to support DLMF
    • Experimental support for Content MathML
  • itex2MML [C with Ruby bindings]
  • SnuggleTeX [Java]
    • Not maintained
    • Partial support for Content MathML and Maxima
    • Cited by Chien & Cheng, 2015 (doi, pdf)
  • plasTeX [Python]
    • Relatively new, but works well enough to be used by the Stacks project

See TeX.SE for more LaTeX parsers.
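
For orientation, here is a toy sketch of the first stage any of these parsers has to perform: splitting a LaTeX math string into tokens. It is not taken from any of the tools listed above. The names TOKEN_RE and tokenize are hypothetical, and real TeX tokenization depends on category codes and macro expansion, which this ignores.

```python
import re

# Toy token pattern (an illustrative assumption, not any listed parser's grammar):
#   \\[A-Za-z]+  control words such as \frac
#   \\.          control symbols such as \{ or \,
#   \S           any other single non-whitespace character
TOKEN_RE = re.compile(r"\\[A-Za-z]+|\\.|\S")


def tokenize(latex: str) -> list[str]:
    """Split a LaTeX math string into a flat list of tokens, dropping whitespace."""
    return TOKEN_RE.findall(latex)


print(tokenize(r"\frac{a+b}{2}"))
# ['\\frac', '{', 'a', '+', 'b', '}', '{', '2', '}']
```

A flat token list like this carries no structure. Turning it into presentation or content markup is the hard part, and that is what the parsers above are for.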