Machine Representation of Data Analyses

Towards a platform for collaborative data science

Evan Patterson
Statistics Department, Stanford University

Ioana Baldini, Aleksandra Mojsilovic, Kush R. Varshney
IBM T.J. Watson Research Center

AAAI 2017 Spring Symposium: AI for the Social Good

Challenges of data-intensive social good

Applying AI and data science to social good is exciting,
but there are special challenges:

  • Not a "closed world"
  • Collaboration is essential:
    • AI researchers
    • Data scientists
    • Subject-matter experts
    • Policymakers
    • Philanthropists
    • ...

Case study: Accelerated Cure Project

  • Mission: Stimulate MS research through an open-access data repository
  • Challenge: How to share and summarize data analyses?

Similar challenges arise in any social good enterprise involving complex, data-driven questions

A new kind of data science platform?

Our motivation: create a cloud platform for collaborative data science

Equipped with AI features such as:

  • Recommend relevant data analyses and datasets
  • Identify similar work on a given dataset
  • Organize analyses in a given domain
  • Evaluate analyses using appropriate metrics

Machine representation of data analysis

These features require a rich, machine-interpretable representation of platform content

Our contribution in one line:

Automatically extract a machine-interpretable dataflow representation of a data analysis

Example: Exploratory data analysis for ACP

Example: Dataflow graph


Methodology

Our system is based on dynamic program analysis. At the highest level, it has three steps:

  1. Execute and trace program, getting a DAG of function calls
  2. Annotate function calls and objects using an annotation database for statistical software
  3. Align graph with knowledge base of data analysis concepts
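
The first two steps can be illustrated with a minimal sketch. This is not the system described in the talk: the `Tracer` class, the toy analysis functions, and the `ANNOTATIONS` table are all hypothetical names introduced here, and the sketch assumes that dataflow can be approximated by tracking which traced call returned each object.

```python
import functools


class Tracer:
    """Hypothetical tracer: records calls as nodes and links them via shared objects."""

    def __init__(self):
        self.calls = []      # (call_id, function name), in execution order
        self.edges = []      # (producer call_id, consumer call_id)
        self._producer = {}  # id(object) -> call_id of the call that returned it
        self._keep = []      # hold returned objects so their ids are not reused

    def trace(self, func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            call_id = len(self.calls)
            self.calls.append((call_id, func.__name__))
            # An argument returned by an earlier traced call becomes an edge.
            for arg in list(args) + list(kwargs.values()):
                src = self._producer.get(id(arg))
                if src is not None:
                    self.edges.append((src, call_id))
            result = func(*args, **kwargs)
            self._producer[id(result)] = call_id
            self._keep.append(result)
            return result
        return wrapper


tracer = Tracer()


@tracer.trace
def load_data():
    return [[1.0, 2.0], [1.5, 1.8], [8.0, 8.0]]


@tracer.trace
def scale(rows):
    top = max(max(r) for r in rows)
    return [[x / top for x in r] for r in rows]


@tracer.trace
def cluster(rows):
    # Toy threshold rule standing in for a real clustering method.
    return [0 if r[0] < 0.5 else 1 for r in rows]


# Step 1: execute and trace the analysis.
labels = cluster(scale(load_data()))

# Step 2: annotate calls using a (made-up) annotation table.
ANNOTATIONS = {
    "load_data": "read-data",
    "scale": "transform",
    "cluster": "clustering-model",
}
annotated = [(cid, name, ANNOTATIONS.get(name, "unknown"))
             for cid, name in tracer.calls]

print(tracer.edges)
print(annotated)
```

Edges always point from an earlier call to a later one, so the recorded graph is acyclic. Step 3, aligning the annotated graph with a knowledge base of data analysis concepts, is not attempted in this sketch.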

Excerpt of clustering methods in knowledge base

Machine-assisted social good

A dream: a knowledge ecosystem that is fully

  • open
  • online
  • ontologically integrated

If realized, this could initiate a paradigm shift for social good, fundamentally changing how stakeholders collaborate and share knowledge.