Relational databases

Relational (SQL) databases are the predominant model of databases, grounded in mathematical theory and enjoying massive industrial use. Other database models, collectively known as “NoSQL,” include graph databases, triple stores, and document databases.

Relation database theory

The theory of relational databases is traditionally based on the theory of relations from mathematical logic. For an introducion, see the lectures on logic and databases by Phokion Kolaitis, via the Simons Institute Logical Structures in Computation Boot Camp .

Standard theory of relational databases

  • Maier, 1983: The theory of relational databases (pdf)
    • The first textbook on relational database theory
  • Abiteboul, Hull, Vianu, 1995: Foundations of databases
    • Aka, the “Alice book”
    • The standard reference, readable but pedestrian

Relation and cylindric algebras

Connections between relational logic and relational DBs:

  • Imielinski & Lipski, 1984: The relational model of data and cylindric algebras (doi)
  • Van den Bussche, 2001: Applications of Alfred Tarski’s ideas in database theory (doi, pdf)
  • Feferman, 2006: Tarski’s influence on computer science (arxiv)

Category theory

Categorical databases program by Spivak and collaborators:

  • Spivak, 2012: Functorial data migration (doi, arxiv)
  • Spivak, 2012: Kleisli database instances (arxiv)
  • Spivak, 2013: Database queries and constraints via lifting problems (doi, arxiv)
  • Spivak & Wisnesky, 2015: Relational foundations for functorial data migration (doi, arxiv)
  • Schultz, Spivak, Vasilakopoulou, Wisnesky, 2017: Algebraic databases (pdf, arxiv)
  • Schultz & Wisnesky, 2017: Algebraic data integration (doi, arxiv)
  • Spivak & Wisnesky, 2019: Fast left Kan extensions using the Chase (pdf, slides)
    • Warning: this algorithm is encumbered by a patent

Enriched databases

The relational database model can be extended from sets to partial orders, metric spaces, or other sets with extra structure. Categorical speaking, this amounts to generalizing from set-valued functors \(C \to \mathbf{Set}\) to functors \(C \to S\) into another concrete category \(S\). The functor category \([C,S]\) is often then an enriched category. Given how natural this idea is, the literature on it is surprisingly sparse and rarely phrased in categorical language.

  • Ng, 2001: An extension of the relational data model to incorporate ordered domains (doi)
    • Ng, 1997, PhD thesis: An extension of the relational data model to incorporate ordered domains (pdf)
    • Raymond, 1996, PhD thesis: Partial-order databases (ACM )
  • Hajdinjak & Bauer, 2009: Similarity measures for relational databases (pdf)
    • Hajdinjak & Bierman, 2012: Extending relational algebra with similarities (doi)

Algorithms

Conjuctive query planning

Conjuctive queries are a restrictive but useful and important class of relational database queries, corresponding syntactically to undirected wiring diagrams and algebraically to morphisms in a bicategory of relations.

  • Abiteboul, Hull, Vianu, 1995: Foundations of databases, Sec 6.4: Computing with acyclic joins
  • Gottlob et al, 2016: Hypertree decompositions: Questions and answers (doi, slides)
    • Reviews a long line of work by the authors on tree decompositions of the hypergraph formed by a conjunctive query
    • Early reference is: Gottlob et al, 2002: Hypertree decompositions and tractable queries (doi, slides)