Fitting odin2 models to data

With a dynamical model we can simulate forwards in time and see how a system might change over time, given a set of parameters. If we have a time series data set though, we can go a step further and find parameters consistent with the data. This vignette gives an introduction to the approaches to fitting to data available for odin2 models. This support largely derives from the monty and dust2 packages and we will refer the reader to their documentation where further detail is on offer.

library(odin2)
library(dust2)
library(monty)

Setting the scene

We’ll start with a simple data set of daily cases of some disease over time

data <- read.csv("incidence.csv")
head(data)
#>   cases time
#> 1     3    1
#> 2     2    2
#> 3     2    3
#> 4     2    4
#> 5     1    5
#> 6     3    6
plot(cases ~ time, data, pch = 19, las = 1,
     xlab = "Time (days)", ylab = "Cases")

The data here shows a classic epidemic, with cases rising up to some peak and falling. We will try fitting this with a simple compartmental SIR (Susceptible-Infected-Recovered) model, which we will write here in odin2. There are a number of possible ways of writing this, but here we’ll go for a stochastic discrete-time version, mostly because it will allow us to demonstrate a number of features of odin2, dust2 and monty (and because the ODE version is not yet written).

Before fitting the data, we’ll write out a model that captures the core ideas (this is replicated from vignette("odin2")), but with an equation for incidence added (the number of new infections over one time unit).

sir <- odin({
  initial(S) <- N - I0
  initial(I) <- I0
  initial(R) <- 0
  initial(incidence, zero_every = 1) <- 0
  update(S) <- S - n_SI
  update(I) <- I + n_SI - n_IR
  update(R) <- R + n_IR
  update(incidence) <- incidence + n_SI
  n_SI <- Binomial(S, p_SI)
  n_IR <- Binomial(I, p_IR)
  p_SI <- 1 - exp(-beta * I / N * dt)
  p_IR <- 1 - exp(-gamma * dt)
  beta <- parameter()
  gamma <- parameter()
  I0 <- parameter()
  N <- 1000
}, quiet = TRUE)

We can initialise this system and simulate it out over this time series and plot the results against the data:

pars <- list(beta = 0.3, gamma = 0.1, I0 = 5)
sys <- dust_system_create(sir(), pars, n_particles = 20, dt = 0.25)
dust_system_set_state_initial(sys)
time <- 0:100
y <- dust_system_simulate(sys, time)

The dust_system_simulate() function returns an n_state by n_particle by n_time matrix (here, 4 x 20 x 101). We’re interested in incidence, and extracting that gives us a 20 x 101 matrix, which we’ll transpose in order to plot it:

incidence <- dust_unpack_state(sys, y)$incidence
matplot(time, t(incidence), type = "l", lty = 1, col = "#00000055",
        xlab = "Time (days)", ylab = "Cases", las = 1)
points(cases ~ time, data, pch = 19, col = "red")

The modelled trajectories are in grey, with the data points overlaid in red – we’re not doing a great job here of capturing the data.

Comparing to data

We’re interested in fitting this model to data, and the first thing we need is a measure of goodness of fit, which we can also code into the odin model, but first we’ll explain the idea.

Our system moves forward in time until it finds a new data point; at this point in time we will have one or several particles present. We then ask for each particle how likely this data point is. This means that these calculations are per-particle and per-data-point.

Here, we’ll use a Poisson to ask “what is the probability of observing this many cases with a mean equal to our modelled number of daily cases”.

The syntax for this looks a bit different to the odin code above:

sir <- odin({
  initial(S) <- N - I0
  initial(I) <- I0
  initial(R) <- 0
  initial(incidence, zero_every = 1) <- 0
  update(S) <- S - n_SI
  update(I) <- I + n_SI - n_IR
  update(R) <- R + n_IR
  update(incidence) <- incidence + n_SI
  n_SI <- Binomial(S, p_SI)
  n_IR <- Binomial(I, p_IR)
  p_SI <- 1 - exp(-beta * I / N * dt)
  p_IR <- 1 - exp(-gamma * dt)
  beta <- parameter()
  gamma <- parameter()
  I0 <- parameter()
  N <- 1000

  # Comparison to data
  cases <- data()
  cases ~ Poisson(incidence)
}, quiet = TRUE)

These last two lines are the new addition to the odin code. The first says that cases will be found in the data. The second restates our aim from the previous paragraph, comparing the observed cases against modelled incidence. The syntax here is designed to echo that of the monty DSL.

With this version of the model we can compute likelihoods with dust2’s machinery.

Stochastic likelihood with a particle filter

Our system is stochastic; each particle will produce a different trajectory and from that a different likelihood. Each time we run the system we get a different combination of likelihoods. We can use a particle filter to generate an estimate of the marginal likelihood, averaging over this stochasticity. This works by resampling the particles at each point along the time series, according to how likely they are.

filter <- dust_filter_create(sir(), 0, data, n_particles = 200)

Each time we run this filter the likelihood will be slightly (or very) different:

dust_likelihood_run(filter, pars)
#> [1] -281.9321
dust_likelihood_run(filter, pars)
#> [1] -278.4973

If you run the filter enough times a distribution will emerge of course. Let’s compare two points in parameter space, varying the beta parameter and running the filter 100 times each:

pars1 <- modifyList(pars, list(beta = 0.25))
pars2 <- modifyList(pars, list(beta = 0.23))
ll1 <- replicate(100, dust_likelihood_run(filter, pars1))
ll2 <- replicate(100, dust_likelihood_run(filter, pars2))

xb <- seq(floor(min(ll1, ll2)), ceiling(max(ll1, ll2)), by = 1)
hist(ll2, breaks = xb, col = "#0000ff99", freq = FALSE,
     xlab = "Log likelihood", ylab = "Density", main = "")
hist(ll1, breaks = xb, add = TRUE, freq = FALSE, col = "#ff000099")
abline(v = c(mean(ll1), mean(ll2)), col = c("red", "blue"), lwd = 2)

So even a relatively small difference in a parameter leads to a difference in the log-likelihood that is easily detectable in only 100 runs of the filter, even when the distributions overlap. However, it does make optimisation-based approaches to inference, such as maximum likelihood, tricky because it’s hard to know which way “up” is if each time you try a point it might return a different height.

If you run a particle filter with the argument save_trajectories = TRUE then we save the trajectories of particles over time:

dust_likelihood_run(filter, list(beta = 0.2, gamma = 0.1),
                    save_trajectories = TRUE)
#> [1] -246.25

You can access these with dust_likelihood_last_trajectories():

h <- dust_likelihood_last_trajectories(filter)

The result here is a 4 x 100 x 200 array:

dim(h)
#> [1]   4 200 100

The dimensions represent, in turn:

  1. 4 state variables
  2. 200 particles
  3. 100 time steps (corresponding to the data)

Considering just incidence, and plotting over time, you may be able to make out the tree structure of the trajectories, with fewer distinct traces at the start of the time series, and some traces more heavily represented in the final sample than others:

incidence <- dust_unpack_state(sys, h)$incidence
matplot(t(incidence), type = "l", lty = 1, col = "#00000022")
points(cases ~ time, data, pch = 19, col = "red")

Inference with particle MCMC (pMCMC)

We can use MCMC to explore this model, but to do this we will need a prior. We’ll use monty’s DSL to create one; this looks similar to the odin code above:

prior <- monty_dsl({
  beta ~ Exponential(mean = 0.3)
  gamma ~ Exponential(mean = 0.1)
})

Here we define a prior that covers beta and gamma, two of the three input parameters to our odin model. This prior is a monty_model object, which we can use to sample from, compute log densities with (to compute the prior), etc.

We also need to adapt our dust2 filter object above for use with monty. All we need to do here is to describe how a vector of statistical parameters (here beta and gamma) will be converted into the inputs that the sir system needs to run (here a list with elements beta, gamma and I0). We do this with a monty_packer object:

sir_packer <- monty_packer(c("beta", "gamma"), fixed = list(I0 = 5))

With this packer we can convert from a list of name-value pairs suitable for initialising a dust2 system into a vector of parameters suitable for use with monty:

sir_packer$pack(pars)
#> [1] 0.3 0.1

and we can carry out the inverse:

sir_packer$unpack(c(0.3, 0.1))
#> $beta
#> [1] 0.3
#> 
#> $gamma
#> [1] 0.1
#> 
#> $I0
#> [1] 5

Combining the filter and packer we create a monty model, which we’ll call likelihood, as that’s what it represents:

likelihood <- dust_likelihood_monty(filter, sir_packer)

This likelihood is now also a monty_model object:

likelihood
#> ── <monty_model> ───────────────────────────────────────────────────────────────
#> ℹ Model has 2 parameters: 'beta' and 'gamma'
#> ℹ This model:
#> • is stochastic
#> ℹ See `?monty_model()` for more information

The monty package provides a high-level interface for working with these objects. For example, to compute the likelihood we could now use monty_model_density():

monty_model_density(likelihood, c(0.2, 0.1))
#> [1] -245.6315

The difference to using dust_likelihood_run here is now we provide a parameter vector from our statistical model, rather than the inputs to the odin/dust model. This conforms to the interface that monty uses and lets us run things like MCMC.

We can combine the prior and the likelihood to create a posterior:

posterior <- prior + likelihood

The last ingredient required for running an MCMC is a sampler. We don’t have much choice with a model where the likelihood is stochastic, we’ll need to run a simple random walk. However, for this we still need a proposal matrix (the variance covariance matrix that is the parameter for a multivariate Gaussian - we’ll draw new positions from this). In an ideal world, this distribution will have a similar shape to the target distribution (the posterior) as this will help with mixing. To get started, we’ll use an uncorrelated random walk with each parameter having a fairly wide variance of 0.02

sampler <- monty_sampler_random_walk(diag(2) * 0.02)

We can now run an MCMC for 100 samples

samples <- monty_sample(posterior, sampler, 100,
                        initial = sir_packer$pack(pars))
#> 🎄 Sampling  ■                                |   1% ETA:  3s
#> 🌲 Sampling  ■■■■■■■■■■■■■■■■■■■■■            |  65% ETA:  0s
#> ✔ Sampled 100 steps across 1 chain in 1.4s
#> 

We need to develop nice tools for working with the outputs of the sampler, but for now bear with some grubby base R manipulation.

The likelihood here is very “sticky”

plot(samples$density, type = "l")

It’s hard to say a great deal about the parameters beta (per-contact transmission rate) and gamma (recovery rate) from this few samples, especially as we have very few effective samples:

plot(t(drop(samples$pars)), pch = 19, col = "#00000055")

Effective sampling

There are several things we can do here to improve how this chain mixes

  • We can try and find a better proposal kernel.
  • We can increase the number of particles used in the filter. This will reduce the variance in the estimate of the marginal likelihood, which means that the random walk will be less confused by fluctuations in the surface it’s moving over. This comes at a computational cost though.
  • We can increase the number of threads (effectively CPU cores) that we are using while computing the likelihood. This will scale fairly efficiently through to at least 10 cores, with the likelihood calculations being almost embarrassingly parallel. This will help to offset some of the costs incurred above.
  • We can run multiple chains at once. We don’t yet have a good parallel runner implemented in monty but it is coming soon. This will reduce wall time (because each chain runs at the same time) and also allows us to compute convergence diagnostics which will reveal how well (or badly) we are doing.
  • We can try a deterministic model (see below) to get a sense of the general region of high probability space.

Here, we apply most of these suggestions at once, using a variance-covariance matrix that I prepared earlier:

filter <- dust_unfilter_create(sir(), 0, data, n_particles = 1000)
likelihood <- dust_likelihood_monty(filter, sir_packer)
vcv <- matrix(c(0.0005, 0.0003, 0.0003, 0.0003), 2, 2)
sampler <- monty_sampler_random_walk(vcv)
samples <- monty_sample(posterior, sampler, 1000,
                               initial = sir_packer$pack(pars))
#> 🎄 Sampling  ■■■■■■                           |  16% ETA: 12s
#> 🌲 Sampling  ■■■■■■■■■■■■                     |  37% ETA:  9s
#> 🎄 Sampling  ■■■■■■■■■■■■■■■■■■               |  57% ETA:  6s
#> 🌲 Sampling  ■■■■■■■■■■■■■■■■■■■■■■■■■        |  78% ETA:  3s
#> 🎄 Sampling  ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■  |  99% ETA:  0s
#> ✔ Sampled 1000 steps across 1 chain in 14.4s
#> 

The likelihood now quickly rises up to a stable range and is clearly mixing:

plot(samples$density, type = "l")

The parameters beta (per-contact transmission rate) and gamma (recovery rate) are strongly correlated

plot(t(drop(samples$pars)), pch = 19, col = "#00000055")