Book review: Dynamical Biostatistical Models by Commenges and Jacqmin-Gadda

Published

December 29, 2020

Most statistical models are purely phenomenological, which is a polite way of saying that they are made up. While such models may draw on theoretical and commonsensical ideas, their mathematical forms are not derived from existing principles in any precise way, but are merely posited on the basis of goodness of fit to data, ease of interpretation, analytical and computational tractability, and other practical considerations. The statistical paradigm of scientific modeling has the advantage of being widely applicable, since it does not depend on a deep understanding of the system under study. In many fields of social science, whose systems defy neat theoretical explanation, statistical modeling is the dominant form of scientific modeling.

Especially in the physical and life sciences, there is another tradition, much older than the statistical paradigm,¹ that puts mechanistic models at the fore. Mechanistic models attempt to identify the unobserved physical processes, or mechanisms, that drive and explain observed phenomena. The prototypical form of a mechanistic model is a system of ordinary or partial differential equations, whose dynamical laws are derived exactly or approximately from general physical principles. The advantage of this paradigm is that, at least when it works, mechanistic models can be expected to have greater generalizability than statistical models, since valid physical laws latch onto fundamental and persistent regularities in nature. A purely phenomenological model, by contrast, usually cannot be expected to generalize beyond the range of the data to which it is fit.

There seems to be fairly little interaction between the statistical and mechanistic cultures of scientific modeling.² This is surprising because many mechanistic models have free parameters that are unknown and must be estimated from noisy data, giving rise to all the usual problems of statistical inference. The book under review, Dynamical Biostatistical Models (Commenges and Jacqmin-Gadda 2015), is a rare example of a work that tries to bring together both viewpoints. One can imagine at least two different approaches to writing such a book: starting from statistics and showing how familiar statistical concepts relate and apply to mechanistic models, or starting from dynamical systems and showing how statistical inference may be incorporated. The present book takes the former path. The authors, Daniel Commenges and Hélène Jacqmin-Gadda, are biostatisticians and they present a good deal of conventional statistics before making contact with dynamical systems in the final chapters of the book.

The book has three main themes, to be explored in subsequent sections: statistical models for longitudinal data, using random effects to capture within-subject correlation; survival analysis, or models for time-to-event data; and extensions of these ideas to multistate, ODE, and SDE models, including aspects of causal inference. All of the models treated are “dynamical” in that they model phenomena which evolve over time. They are illustrated with data from biomedical studies, especially epidemiology, but are applicable to other domains. With the exception of one example, nearly the last in the book, the statistical methodology is thoroughly frequentist, favoring the likelihood school of inference.³

Models of longitudinal data

Longitudinal data, or panel data, consists of repeated measurements over time on a set of subjects. According to the authors, longitudinal data is distinguished from time series data by having a relatively large number of subjects or observational units but relatively few measurements per subject, which may be taken at irregular times (Commenges and Jacqmin-Gadda 2015, 63). To illustrate, the response to treatment over time of patients with a chronic disease is longitudinal data, whereas the daily number of new COVID-19 cases in the United States is time series data.

In this text, the workhorse models for longitudinal data are mixed effects models, where the fixed effects capture population-level trends and the random effects capture individual variation. The simplest version of this model is the linear mixed model

$Y_{i j} = ⟨ X_{i j}, β ⟩ + ⟨ Z_{i j}, b_{i} ⟩ + ε_{i j}, i = 1, \dots, N, j = 1, \dots, n_{i}$

where

$N$ is the number of subjects
$n_{i}$ is the number of measurements taken on subject $i$
$Y_{i j} \in R$ is the response at measurement $j$ on subject $i$
$X_{i j} \in R^{p}$ are predictors corresponding to global parameters $β \in R^{p}$
$Z_{i j} \in R^{q}$ are predictors corresponding to individual effects $b_{i} \sim N (0, B)$
$ε_{i j} \sim N (0, σ^{2})$ is the measurement error.

The parameters of this model are $β$ , $B$ , and $σ^{2}$ . All random variables $b_{i}$ and $ε_{i j}$ are assumed independent of each other. As a special case, we might consider a random intercept and slope model

$Y_{i j} = ⟨ X_{i j}, β ⟩ + b_{0 i} + b_{1 i} t_{i j} + ε_{i j},$

where $t_{i j}$ is the time at which the $j$ th measurement on subject $i$ was taken and $X_{i j}$ is a vector of covariates, usually including include the constant $1$ and the time $t_{i j}$ (Commenges and Jacqmin-Gadda 2015, chap. 4, Example 8).

Despite being a standard topic in statistics, mixed effects are a notorious source of confusion for beginners and professionals alike. Commenges and Jacqmin-Gadda do an admirable job of explaining why the model takes this form. First, in a standard linear model, all the variables $Y_{i j}$ would be independent, but we expect the measurements $Y_{i j}$ , $j = 1, \dots, n_{i}$ , for a single subject $i$ to be correlated. The mixed model does indeed yield a within-subject correlation. For instance, if the random intercept and slope model has diagonal covariance matrix $B = diag (σ_{0}^{2}, σ_{1}^{2})$ , then the within-subject covariance is

$Cov (Y_{i j}, Y_{i j^{'}}) = σ_{0}^{2} + σ_{1}^{2} t_{i j} t_{i j^{'}},$

which is nonzero and increasing with time. Moreover, making the individual effects $b_{i}$ random instead of fixed not only reduces the number of parameters that must be estimated, but accounts for the fact that the individuals in the study do not constitute the whole relevant population. In practice, the subjects always belong to some larger population of interest.

Chapters 4 and 5 present a wide array of extensions to the linear mixed model for longitudinal data, namely

multivariate linear mixed models, for correlated multivariate responses;
generalized linear mixed models (GLMMs),⁴ such as the mixed logistic model for binary responses and the mixed Poisson model for count data;
non-linear⁵ and curvilinear mixed models;
mixed models with latent classes, a type of mixture model; and
handling of missing data.

Concepts like generalized linear models (GLMs), mixture models, and missing data are mainstays of statistics, but it is still helpful to see how they manifest for mixed models of longitudinal data.

Among the less familiar ideas is the “curvilinear model” drawn from work by the authors and their collaborators.⁶ A curvilinear mixed model has the form

$h (Y_{i j}; η) = ⟨ X_{i j}, β ⟩ + ⟨ Z_{i j}, b_{i} ⟩ + ε_{i j},$

where $b_{i} \sim N (0, B)$ and $ε_{i j} \sim N (0, σ^{2})$ are independent, as before; $h$ is a fixed, real-valued function; and $η$ are further parameters to be estimated. Curvilinear models are useful when the response exhibits floor or ceiling effects or other departures from Gaussianity, such as in quality-of-life evaluations on a 50- or 100-point scale (Commenges and Jacqmin-Gadda 2015, sec. 5.1.6). For the transformation $h$ , the authors suggest using a basis of splines to obtain a flexible model without too many parameters.

Curvilinear mixed models should be compared with GLMMs, which have the form

$Y_{i j} \sim P (g^{- 1} (η_{i j}), ϕ), η_{i j} = ⟨ X_{i j}, β ⟩ + ⟨ Z_{i j}, b_{i} ⟩,$

where $g$ is the link function and $P (μ, ϕ)$ is an exponential dispersion family parameterized by mean $μ$ and dispersion $ϕ$ . Although there is some overlap between the two classes of models—for example, linear mixed models and probit mixed models belong to both—they are not the same. Most importantly, curvilinear models transform the response, which includes measurement error, while generalized linear models transform the response’s expected value. In addition, the transformation in a curvilinear model may have free parameters and the error distribution in a generalized linear model may be non-Gaussian.

Survival analysis

Another common type of temporal data is time-to-event data: the time, for each subject, at which an event of interest occurs. The analysis of time-to-event data is called survival analysis because the event is prototypically death or something equally morbid. But happier events are also allowed. For example, the event could be recovery from a nonlethal infectious disease. This interpretation is the source of an analogy between survival analysis and compartmental models in epidemiology, as we will see later. Although survival analysis is very important in certain application areas, such as medical trials and industrial reliability experiments, it is often not treated as a core topic in statistics curricula.

Modeling in survival analysis is centered around two crucial functions. Letting the time of death (or other event of interest) be a continuous random variable $T \geq 0$ with CDF $F$ , the survival function $S : R_{+} \to [0, 1]$ is

$S (t) := P (T > t) = 1 - F (t)$

and the hazard function $α : R_{+} \to R_{+}$ is

$α (t) := lim_{h \to 0} \frac{P (t < T \leq t + h | T > t)}{h} = \frac{F^{'} (t)}{S (t)} = - \frac{d}{d t} [\log S (t)] .$

Thus, $S (t)$ is the probability of surviving beyond time $t$ and $α (t) d t$ is the infinitesimal probability of dying in the interval $(t, t + d t]$ , conditional on surviving beyond time $t$ . The cumulative hazard function

$A (t) := \int_{0}^{t} α (u) d u = - \log S (t)$

is another useful quantity.

The hazard function plays the role in survival analysis that the probability density function plays in other parts of statistics. The hazard function is some ways more convenient for specifying models, since any function $α : R_{+} \to R_{+}$ with infinite mass ( $\int_{0}^{\infty} α (u) d u = \infty$ ) defines a valid hazard function and thereby the probability distribution of a nonnegative random variable, whereas a probability density function must be normalized to have total mass 1. The distribution with the simplest hazard function is the exponential distribution $Exp (λ)$ , which has constant hazard $α (t) \equiv λ$ . Two other common parametric families for event times, both generalizing the exponential, are the Weibull family, whose hazard functions are power laws, and the gamma family, whose hazard function has no closed form.

Chapters 3 and 6 present methods for survival analysis, beginning with the Kaplan-Meier estimator, a nonparametric estimator of the survival function, and the Cox proportional hazards model, a semiparametric model defined through the hazard function. They are among the most widely used statistical methods in medicine.⁷ We focus on the Cox model since it is more strongly connected to other themes of the book.

The Cox proportional hazards model is used to study how the hazard depends upon covariates of interest. It postulates that the hazard functions for $N$ independent subjects have the form⁸

$α_{i} (t) = α_{0} (t) e^{⟨ X_{i}, β ⟩}, i = 1, \dots, N,$

where both the baseline hazard function $α_{0}$ and the regression coefficients $β \in R^{p}$ are unknown. The model is semiparametric because the baseline hazard belongs to an infinite-dimensional parameter space while the regression parameters belong to a finite-dimensional one. In fitting the model, the baseline hazard is regarded as a nuisance parameter and only the regression coefficients are estimated, typically by maximizing a partial likelihood. The name of model derives from the fact that the hazard functions of any two subjects $i$ and $j$ are proportional, with constant of proportionality independent of time: $α_{i} (t) / α_{j} (t) = e^{⟨ X_{i} - X_{j}, β ⟩}$ . As the authors stress in their examples, this is a substantial assumption that should be checked.

Chapter 6 surveys extensions to survival analysis and the proportional hazards model, such as

competing risks models, having not a single event of interest but a set of mutually exclusive alternatives;
frailty models, accounting for grouped data, such as subjects belonging to the same family, or recurrent events, such as successive asthma attacks, through multiplicative random effects in the hazard function;
cure models, for when a nonnegligible portion of the population is not at risk, perhaps because of immunity to a disease.

In addition, Chapter 8 on “joint models” connects survival analysis with the previous theme, explaining how to accommodate both time-to-event data and longitudinal data within a single model.

Of these many topics we discuss only the competing risks model. As a typical scenario, elderly patients with cancer have nonnegligible risk of dying for unrelated reasons, so when estimating the risk due to cancer, the event “death from cancer” should be distinguished from “death from other causes” (Commenges and Jacqmin-Gadda 2015, sec. 6.2.5). Here there are two competing risks or causes. In general, a competing risks model has $K \geq 2$ causes with corresponding cause-specific hazard functions

$α_{k} (t) := lim_{h \to 0} \frac{P (t < T \leq t + h, D = k | T > t)}{h}, k = 1, \dots, K,$

where $T$ is the time of occurrence of any event and $D \in {1, \dots, K}$ is the type of event that occurred. What was previously called the “hazard function,” now the overall hazard function, is the sum of the cause-specific hazards: $α = \sum_{k = 1}^{K} α_{k}$ . To study whether the covariates are more strongly associated with some risks than others, parameters may be shared across event types. In one of two formulations discussed, the cause-specific hazard functions are

$α_{k, i} (t) = α_{k, 0} (t) \exp (⟨ β_{k}, X_{i} ⟩ + ⟨ γ, X_{i} ⟩), k = 1, \dots, K, i = 1, \dots, N,$

where the constraint $\sum_{k = 1}^{K} β_{k} = 0$ is imposed for identifiability.

Multistate and mechanistic models

Multistate models, the topic of Chapter 7, make a link between survival analysis and the sprawling field of dynamical stochastic processes. Survival analysis models are multistate models of a particularly simple kind, having two states, “alive” and “dead,” and a single transition, “death.”

Generalizing this, competing risks models with $K$ causes are multistate models with an initial state that can transition to one of $K$ absorbing states:

Another simple but important multistate model is the irreversible illness-death model, inserting a new state “ill” between “alive” and “dead” (Commenges and Jacqmin-Gadda 2015, sec. 7.2.6 & 7.5.6).

In this model, it is also possible to transition directly from “healthy” to “dead,” accounting for the possibility of death by causes other than the illness under study.

As mathematical entities, multistate models are continuous-time Markov chains, which may be characterized by their transition probabilities or transition intensities. The transition probabilities of a multistate process $M (t)$ , $t \geq 0$ , with state space ${0, 1, \dots, K}$ are the probabilities

$P_{h, j} (s, t) := P (M (t) = j | M (s) = h), 0 \leq s \leq t, 0 \leq h, j \leq K .$

These assemble into a stochastic matrix $P (s, t)$ whose $(h, j)$ -th entry is the probability of being in state $j$ at time $t$ , conditional on being in state $h$ at time $s$ . The transition probability matrices famously satisfy the Chapman-Kolmogorov equation

$P (s, t) = P (s, u) P (u, t), 0 \leq s \leq u \leq t .$

Given the transition probabilities, the transition intensities are defined by

$α_{h, j} (t) := \partial_{2} P_{h, j} (t, t) = lim_{h \to 0} \frac{P_{h, j} (t, t + h)}{h} .$

The quantity $α_{h, j} (t) d t$ , for $h \neq j$ , is the infinitesimal probability of transitioning to state $j$ during the interval $(t, t + d t]$ , conditional on being in state $h$ at time $t$ . The transition intensities assemble into an infinitesimal stochastic matrix $α (t)$ . As a consequence of the Chapman-Kolmogorov equation, the transition intensities also determine the transition probabilities via the forward Kolmogorov equation,

$\frac{\partial}{\partial t} P (s, t) = P (s, t) α (t), P (s, s) = I, 0 \leq s \leq t .$

For given $α$ and unknown $P$ , this is a system of ordinary differential equations in $t$ for every fixed $s$ .

For modeling purposes, a multistate model is most conveniently specified through its transition intensities. The transition intensity $α_{h, j}$ , for $h \neq j$ , can be interpreted as the $j$ -th cause-specific hazard function in state $h$ . The sum $\sum_{j \neq h} α_{h, j}$ is then the overall hazard function in state $h$ . The connection with survival analysis is now obvious. Many of the ideas and models of survival analysis carry over straightforwardly. For example, the proportional hazards model generalizes to the proportional intensities model

$α_{h, j}^{i} (t) = α_{h, j}^{0} (t) \exp (⟨ β_{h, j}, X_{h, j}^{i} ⟩), h \neq j, i = 1, \dots, N,$

where the covariates $X_{h, j}^{i}$ may depend on the transition. Note that this multistate model will not be homogeneous as a Markov chain unless the baseline hazards $α_{h, j}^{0}$ are all constant.

Multistate models are reminiscent of compartmental models in epidemiology. If, in the illness-death model, you replace “healthy” with $S$ (“susceptible”), “ill” with $I$ (“infected”), and “dead” with $R$ (“recovered/removed”), the basic structure is similar to an SIR model.⁹ Yet there are fundamental differences between the two classes of models. The most basic conceptual distinction is that multistate statistical models are models of individuals whereas compartmental models are models of populations—indeed, populations so large that it makes sense to pass to the continuous limit. Thus, a typical application of multistate models would be to personalized medicine, while compartmental models are applicable to epidemics and cell populations in organisms.

However, statistics is relevant to mechanistic models such as the SIR model when, as often happens, they are only partially and noisily observed. Chapter 9, the last in the book, contains a brief introduction to statistical inference for mechanistic models based on ordinary or stochastic differential equations. An ODE model starts from a model of the system

$\frac{d x}{d t} = F (x (t); ξ (t)),$

where $ξ (t)$ is a vector of unknown parameters that may depend on time. This gives rise to a model for the observations¹⁰ at discrete times, namely

$Y_{j} = g_{j} (x (t_{j})) + ε_{j}, j = 1, \dots, n,$

where the $g_{j}$ ’s are known functions and the $ε_{j}$ ’s are measurement errors, often assumed to be Gaussian. When there are $N$ systems having similar dynamics but possibly different parameters, such as in a study of the cell populations of multiple subjects, the model equations generalize to

$\begin{aligned} \frac{d x^{i}}{d t} & = F (x^{i} (t); ξ^{i} (t)), i = 1, \dots, N \\ Y_{i j} & = g_{j} (x^{i} (t_{i j})) + ε_{i j}, i = 1, \dots, N, j = 1, \dots, n_{i} . \end{aligned}$

Just as for other kinds of longitudinal data, the within-subject correlation is captured by a mixed model, such as

$ξ^{i} (t) = ⟨ X_{i} (t), β ⟩ + ⟨ Z_{i} (t), b_{i} ⟩,$

where $β \in R^{p}$ are fixed parameters and $b_{i} \sim N (0, B)$ are random effects. We have now made a full circle, returning to the first theme of the book.

Another topic of Chapter 9 is stated in its title: “The dynamic approach to causality.” I will not discuss this here, intriguing though it is, because the presentation is somewhat sketchy and I am not competent to compare the dynamic approach with other approaches to causality, such as potential outcomes. The material on causality is drawn from research by the first author, Commenges, in collaboration with Gégout-Petit (Commenges and Gégout-Petit 2009).

Summary

As suggested by the breadth of this review, Dynamical Biostatistical Models covers an extraordinary amount of material for a book that is, excluding appendices, less than three hundred pages long. Inevitably, this means that the book is more a survey than an encyclopedia, but it serves that purpose well. The prose, like the exposition, is succinct and unadorned. The text is translated from the original French and at times this shows, most obviously in the several places where the authors forget to translate “et” as “and.” However, the occasional infelicity never inhibits the readability of the text. I would recommend the book to those with basic training in statistics who wish learn about dynamical statistical methods, particularly as used in medicine and epidemiology.

References

Breiman, Leo. 2001. “Statistical Modeling: The Two Cultures.” Statistical Science 16 (3): 199–231. DOI:10.1214/ss/1009213726.

Commenges, Daniel, and Anne Gégout-Petit. 2009. “A General Dynamical Statistical Model with Causal Interpretation.” Journal of the Royal Statistical Society Series B: Statistical Methodology 71 (3): 719–36. DOI:10.1111/j.1467-9868.2009.00703.x.

Commenges, Daniel, and Hélène Jacqmin-Gadda. 2015. Dynamical Biostatistical Models. CRC Press. DOI:10.1201/b19109.

Lindstrom, Mary J., and Douglas M. Bates. 1990. “Nonlinear Mixed Effects Models for Repeated Measures Data.” Biometrics 46 (3): 673–87. DOI:10.2307/2532087.

Pawitan, Yudi. 2001. In All Likelihood: Statistical Modelling and Inference Using Likelihood. Oxford University Press.

Proust, Cécile, Hélène Jacqmin-Gadda, Jeremy M. G. Taylor, Julien Ganiayre, and Daniel Commenges. 2006. “A Nonlinear Model with Latent Process for Cognitive Evolution Using Multivariate Longitudinal Data.” Biometrics 62 (4): 1014–24. DOI:10.1111/j.1541-0420.2006.00573.x.

Proust-Lima, Cécile, Luc Letenneur, and Hélène Jacqmin-Gadda. 2007. “A Nonlinear Latent Class Model for Joint Analysis of Multivariate Longitudinal Data and a Binary Outcome.” Statistics in Medicine 26 (10): 2229–45. DOI:10.1002/sim.2659.

Reluga, Timothy C., and Jan Medlock. 2007. “Resistance Mechanisms Matter in SIR Models.” Mathematical Biosciences and Engineering 4 (3): 553–63. DOI:10.3934/mbe.2007.4.553.

Stroup, Walter W. 2012. Generalized Linear Mixed Models: Modern Concepts, Methods and Applications. CRC Press. DOI:10.1201/b13151.

Sundberg, Rolf. 2019. Statistical Modelling by Exponential Families. Cambridge University Press. DOI:10.1017/9781108604574.

Footnotes

While rudimentary statistical analysis was already used by the astronomers of the late 18^th century, the statistical paradigm in its modern form is barely a hundred years old.↩︎
In a famous essay (Breiman 2001), Leo Breiman draws a different cultural distinction, between the statistical mainstream (the “data modeling culture”) and machine learning (the “algorithmic modeling culture”). From the viewpoint of this review, the culture of statistical modeling is the intermediate position, taking a view of models that is more realistic than the instrumentalism of machine learning but less realistic than in the mechanistic tradition.↩︎
According to the likelihood school, statistical inference is performed by maximizing the likelihood or one of its innumerable variants: marginal likelihood, conditional likelihood, partial likelihood, profile likelihood, penalized likelihood, and so on. Chapter 2 of Dynamical Biostatistical Models briskly reviews these ideas but for a fuller account the reader should consult another text, such as (Pawitan 2001).↩︎
See, for example, (Stroup 2012).↩︎
See (Lindstrom and Bates 1990).↩︎
Be warned that the term “curvilinear” has no definite technical meaning in statistics and that the usage here is idiosyncratic. Versions of the curvilinear mixed model are featured in research papers by the authors (Proust et al. 2006; Proust-Lima, Letenneur, and Jacqmin-Gadda 2007).↩︎
According to Google Scholar, the original papers on the Kaplan-Meier estimator and the Cox proportional hazards model have each been cited over 50,000 times.↩︎
As the form of the hazard function suggests, the proportional hazards model is related to exponential families and GLMs (Sundberg 2019, sec. 9.7.1).↩︎
The standard SIR model has no direct transition $S \to R$ , but it may be added to model control measures or vaccination, such as in (Reluga and Medlock 2007).↩︎
The distinction between models for the system and for the observations is emphasized by the authors, who remark that it is commonplace in stochastic processes and control theory but not in biostatistics (Commenges and Jacqmin-Gadda 2015, sec. 9.1.3.1). It also figures prominently in the work of Patrick Suppes, as noted in my review of Representation and Invariance.↩︎