mcstate
sits at the
top of a stack of packages, which together allow inference of a
stochastic model’s parameters by fitting to a time-series of
observational data. To use mcstate
, we require the
following pieces:
dust
. This will
generate (stochastic) updates of the state at each time step.Below we describe a typical use case for infectious disease: given daily case counts during an epidemic, how can we estimate key features of the transmission process such as the transmission rate, recovery rate and from these the basic reproductive number R0.
A simple definition of the SIR model, as given in the odin documentation is: $$\begin{align*} \frac{dS}{dt} &= -\beta \frac{SI}{N} \\ \frac{dI}{dt} &= \beta \frac{SI}{N} - \gamma I \\ \frac{dR}{dt} &= \gamma I \\ \end{align*}$$ S is the number of susceptibles, I is the number of infected and R is the number recovered; the total population size N = S + I + R is constant. β is the infection rate, γ is the recovery rate.
Discretising this model in time steps of width dt gives the following update equations for each time step:
$$\begin{align*} S_{t+1} &= S_t - n_{SI} \\ I_{t+1} &= I_t + n_{SI} - n_{IR} \\ R_{t+1} &= R_t + n_{IR} \end{align*}$$
where $$\begin{align*} n_{SI} &\sim B(S, 1 - e^{-\beta \frac{I}{N} \cdot dt}) \\ n_{IR} &\sim B(I, 1 - e^{-\gamma \cdot dt}) \end{align*}$$
This example is included with the dust
package, and can
be loaded with:
To learn how to implement this yourself, see the vignette over at odin.dust
.
Typically these models will be used in conjunction with observed data
to infer the highest posterior density values for model parameters, most
likely trajectories up to the present (nowcast), and make forecasts
using these parameters. The mcstate
package implements
particle filter and pMCMC methods
to make all of this this possible from dust
models.
A comparison function is needed to define a likelihood, i.e. the probability of observing this trajectory, given the data. This is combined with a prior to generate a posterior distribution for each parameter, which in the SIR model gives the distribution of likely values for β and γ.
This is straightforward for a deterministic model, but for a
stochastic model over a long time-series, trajectories can quickly
become inconsistent with the data, and the massive space of potential
outcomes makes downstream sampling highly inefficient.
mcstate
therefore implements a particle filter, also known
as sequential Monte Carlo (SMC), which runs multiple trajectories, and
at each stage where data is available resamples them in proportion to
their likelihood, keeping only those trajectories close to the data, and
with likelihoods that are plausibly important parts of the posterior.
Importantly, this technique produces an unbiased estimate of the
likelihood for a given set of parameters.
The anatomy of an mcstate particle filter, as noted above, consists of three main components:
mcstate::particle_filter_data()
.dust::dust()
or odin.dust::odin_dust()
.To demonstrate this, we will attempt to infer the parameters β and γ using daily case counts. In
reality these counts would be measured, but here we have generated them
from a single run of the model with seed = 2L
, and added
some constant noise proportional to the epidemic size:
This can then be processed with
mcstate::particle_filter_data()
as follows, taking four
update steps between every observation so that rate = 4
.
Generally increasing the rate will improve resolution, especially if
rates of change between partitions are high, though at the expense of
extra computation time. We also define a time interval between each
update $dt =
\frac{1}{\mathrm(rate)}$:
dt <- 0.25
sir_data <- mcstate::particle_filter_data(data = incidence,
time = "day",
rate = 1 / dt)
#> Warning in mcstate::particle_filter_data(data = incidence, time = "day", :
#> 'initial_time' should be provided. I'm assuming '0' which is one time unit
#> before the first time in your data (1), but this might not be appropriate. This
#> will become an error in a future version of mcstate
rmarkdown::paged_table(sir_data)
More complex models may have additional data streams added as further columns (e.g. deaths, hospital admissions). This will be fed as observed data to the compare function, which will combine and compute the evidence for the current model state.
The comparison function will typically be a likelihood, the
probability of the simulation run given the data. This can be any
function in R which takes the current simulated state
, a
data
row for the current time point and a list of any
parameters pars
the function needs.
For this model the daily number of cases is calculated within update function and returned as the 5th state element. In this comparison function, the number of cases within each time interval is modelled as being Poisson distributed. A small amount of noise is added to prevent zero expectations.
case_compare <- function(state, observed, pars = NULL) {
exp_noise <- 1e6
incidence_modelled <- state[5, , drop = TRUE]
incidence_observed <- observed$cases
lambda <- incidence_modelled +
rexp(n = length(incidence_modelled), rate = exp_noise)
dpois(x = incidence_observed, lambda = lambda, log = TRUE)
}
We can use the info()
method on a dust model to inspect
what order the state variables will come in:
gen_sir$new(pars = list(), time = 0, n_particles = 1L)$info()
#> $vars
#> [1] "S" "I" "R" "cases_cumul" "cases_inc"
#>
#> $pars
#> $pars$beta
#> [1] 0.2
#>
#> $pars$gamma
#> [1] 0.1
The SIR example actually outputs incidence directly as part of the state, so the comparison function can be written without needing to recalculate this:
incidence_compare <- function(state, prev_state, observed, pars = NULL) {
exp_noise <- 1e6
lambda <- state[4, , drop = TRUE] +
rexp(n = length(incidence_modelled), rate = exp_noise)
dpois(x = observed$cases, lambda = lambda, log = TRUE)
}
Just some of the state can be extracted for the comparison function, which is demonstrated below.
Using these pieces, we can set up a particle filter as follows:
n_particles <- 100
filter <- mcstate::particle_filter$new(data = sir_data,
model = gen_sir,
n_particles = n_particles,
compare = case_compare,
seed = 1L)
We can now run the particle filter forward, which will run and
resample 100 trajectories, and return the final likelihood. It is
important to set the correct time-step dt
here, as we are
using 0.25, rather than the default value of 1 defined in the odin
model. We will also save the history, which allows us to plot the
particle trajectories, as well as use the likelihood:
If we plot these along with the data, we can see compared to above
that only trajectories consistent with the data are kept, and the
variance between particles has been reduced compared to the simulations
shown in the odin.dust
vignette:
plot_particle_filter <- function(history, true_history, times, obs_end = NULL) {
if (is.null(obs_end)) {
obs_end <- max(times)
}
par(mar = c(4.1, 5.1, 0.5, 0.5), las = 1)
cols <- c(S = "#8c8cd9", I = "#cc0044", R = "#999966")
matplot(times, t(history[1, , -1]), type = "l",
xlab = "Time", ylab = "Number of individuals",
col = cols[["S"]], lty = 1, ylim = range(history))
matlines(times, t(history[2, , -1]), col = cols[["I"]], lty = 1)
matlines(times, t(history[3, , -1]), col = cols[["R"]], lty = 1)
matpoints(times[1:obs_end], t(true_history[1:3, , -1]), pch = 19,
col = cols)
legend("left", lwd = 1, col = cols, legend = names(cols), bty = "n")
}
true_history <- readRDS("sir_true_history.rds")
plot_particle_filter(filter$history(), true_history, incidence$day)
Doing the same with a β and γ which are further from the truth yields a lower likelihood:
filter$run(save_history = TRUE, pars = list(dt = dt, beta = 0.4, gamma = 0.2))
#> [1] -260.0149
plot_particle_filter(filter$history(), true_history, incidence$day)
You can use the run()
function of the particle filter
directly, but for parameter inference, or forecasting after parameter
inference, you should usually use the functions below to infer
parameters.
Using this, we can do a MCMC using the Metropolis-Hastings algorithm to infer the values of β and γ from daily case counts. Let’s first describe the parameters we wish to infer, giving a minimum, maximum. For γ we will also apply a gamma prior with shape 1 and rate 0.2: Γ(1, 0.2):
beta <- mcstate::pmcmc_parameter("beta", 0.2, min = 0)
gamma <- mcstate::pmcmc_parameter("gamma", 0.1, min = 0, prior = function(p)
dgamma(p, shape = 1, scale = 0.2, log = TRUE))
proposal_matrix <- diag(0.1, 2)
mcmc_pars <- mcstate::pmcmc_parameters$new(list(beta = beta, gamma = gamma),
proposal_matrix)
The proposals for each MCMC step are also set here. Proposals are drawn from a multi-variate normal distribution, with the variance-covariance matrix set here. In this case we have simply set βn + 1 ∼ N(βn, 0.01) and γn + 1 ∼ N(γn, 0.01) so there is no covariance. We show how to improve this below.
Note that in this very simple case, the parameters that our model
takes as input (alpha
and beta
) happen to be
exactly the same as the ones that we are interested in inferring. For
nontrivial models this is very unlikely to the case, see the section
“Parameter transformations” in ?mcstate::pmcmc_parameter
for details on how to cope with this.
The sampler can now be run using these parameters, and the particle filter defined above.
Taking 50 steps on a single chain:
control <- mcstate::pmcmc_control(
n_steps,
save_state = TRUE,
save_trajectories = TRUE,
progress = TRUE)
pmcmc_run <- mcstate::pmcmc(mcmc_pars, filter, control = control)
#> Running chain 1 / 1
#> Finished 50 steps in 0.8 secs
plot_particle_filter(pmcmc_run$trajectories$state, true_history, incidence$day)
We also provide some basic tools for dealing with this output, before running prediction (see below). We can remove burn-in and thin the four chains as follows:
processed_chains <- mcstate::pmcmc_thin(pmcmc_run, burnin = n_burnin, thin = 2)
parameter_mean_hpd <- apply(processed_chains$pars, 2, mean)
parameter_mean_hpd
#> beta gamma
#> 0.2 0.1
The resulting objects can also be analysed in a typical MCMC analysis
package such as coda
.
We will need to combine the two parts of the object which contain the
probabilities and parameter values across the iterations:
mcmc1 <- coda::as.mcmc(cbind(pmcmc_run$probabilities, pmcmc_run$pars))
summary(mcmc1)
#>
#> Iterations = 1:50
#> Thinning interval = 1
#> Number of chains = 1
#> Sample size per chain = 50
#>
#> 1. Empirical mean and standard deviation for each variable,
#> plus standard error of the mean:
#>
#> Mean SD Naive SE Time-series SE
#> log_prior 1.109 0 0 0
#> log_likelihood -246.861 0 0 0
#> log_posterior -245.752 0 0 0
#> beta 0.200 0 0 0
#> gamma 0.100 0 0 0
#>
#> 2. Quantiles for each variable:
#>
#> 2.5% 25% 50% 75% 97.5%
#> log_prior 1.109 1.109 1.109 1.109 1.109
#> log_likelihood -246.861 -246.861 -246.861 -246.861 -246.861
#> log_posterior -245.752 -245.752 -245.752 -245.752 -245.752
#> beta 0.200 0.200 0.200 0.200 0.200
#> gamma 0.100 0.100 0.100 0.100 0.100
plot(mcmc1)
The MCMC above is not particularly efficient in terms of effective sample size per iteration:
coda::effectiveSize(mcmc1)
#> log_prior log_likelihood log_posterior beta gamma
#> 0 0 0 0 0
1 - coda::rejectionRate(mcmc1)
#> log_prior log_likelihood log_posterior beta gamma
#> 0 0 0 0 0
Clearly the acceptance rate is too low. Generally a target of 0.5, reducing to 0.234 as the number of parameters increases, is thought to be optimal. The proposal distribution needs to be modified, in this case reducing the variance of the jump sizes. One option to automate this process is to run a chain for a bit, as above, then use the covariance of the state as the proposal matrix:
proposal_matrix <- cov(pmcmc_run$pars)
mcmc_pars <- mcstate::pmcmc_parameters$new(
list(beta = beta, gamma = gamma),
proposal_matrix)
#> Error in eigen(vcv, symmetric = TRUE): 0 x 0 matrix
proposal_matrix
#> beta gamma
#> beta 0 0
#> gamma 0 0
Let’s now run four independent chains with these proposals:
control <- mcstate::pmcmc_control(
n_steps,
save_state = TRUE,
save_trajectories = TRUE,
progress = TRUE,
n_chains = 4)
pmcmc_tuned_run <- mcstate::pmcmc(mcmc_pars, filter, control = control)
#> Running chain 1 / 4
#> Finished 50 steps in 0.8 secs
#> Running chain 2 / 4
#> Finished 50 steps in 0.9 secs
#> Running chain 3 / 4
#> Finished 50 steps in 0.8 secs
#> Running chain 4 / 4
#> Finished 50 steps in 0.8 secs
mcmc2 <- coda::as.mcmc(cbind(
pmcmc_tuned_run$probabilities, pmcmc_tuned_run$pars))
summary(mcmc2)
#>
#> Iterations = 1:200
#> Thinning interval = 1
#> Number of chains = 1
#> Sample size per chain = 200
#>
#> 1. Empirical mean and standard deviation for each variable,
#> plus standard error of the mean:
#>
#> Mean SD Naive SE Time-series SE
#> log_prior 1.12890 0.036207 0.002560 0.020755
#> log_likelihood -245.79688 0.856365 0.060554 0.483496
#> log_posterior -244.66799 0.877579 0.062054 0.503925
#> beta 0.19711 0.005374 0.000380 0.003081
#> gamma 0.09611 0.007241 0.000512 0.004151
#>
#> 2. Quantiles for each variable:
#>
#> 2.5% 25% 50% 75% 97.5%
#> log_prior 1.1094 1.109 1.109 1.109 1.196
#> log_likelihood -247.1913 -246.065 -245.551 -245.293 -244.890
#> log_posterior -246.0819 -244.956 -244.441 -244.183 -243.694
#> beta 0.1872 0.200 0.200 0.200 0.200
#> gamma 0.0827 0.100 0.100 0.100 0.100
plot(mcmc2)
Looking again at the effective sample size and rejection rate, these chains were a lot more efficient, and have a much better effective sample size thanks for the four chains:
Going back to the simple SIR model above, we can easily continue
running the fitted model forward in time using the posterior estimated
by pmcmc()
to create a forecast past the end of the data,
sampling 10 particles:
## TODO: the use of times here is confusing, which is it?
forecast <- predict(processed_chains,
times = seq(400, 800, 4),
prepend_trajectories = TRUE,
seed = processed_chains$predict$seed)
keep <- sample.int(ncol(forecast$state), size = 10)
mini_forecast <- forecast$state[, keep, ]
plot_particle_filter(mini_forecast,
true_history,
seq_len(200),
obs_end = max(incidence$day))
Up to t = 100 is the nowcast, from t = 100 to t = 200 is the forecast.
The set up of the particle filter and observation function makes it easy to fit to multiple datastreams within a Bayesian framework. Let us model a disease where a proportion of infected cases pdeath = 0.05 die rather than recover by adding the following lines to the odin code:
## Definition of the time-step and output as "time"
steps_per_day <- user(1)
dt <- 1 / steps_per_day
initial(time) <- 0
update(time) <- (step + 1) * dt
## Core equations for transitions between compartments:
update(S) <- S - n_SI
update(I) <- I + n_SI - n_IR - n_ID
update(R) <- R + n_IR
update(D) <- D + n_ID
## Individual probabilities of transition:
p_SI <- 1 - exp(-beta * I / N) # S to I
p_IR <- 1 - exp(-gamma) # I to R
## Draws from binomial distributions for numbers changing between
## compartments, dealing competing hazards for I -> R and I -> D
n_Ix <- rbinom(I, p_IR * dt)
n_IR <- rbinom(n_Ix, 1 - p_death)
n_ID <- n_Ix - n_IR
n_SI <- rbinom(S, p_SI * dt)
## Total population size
N <- S + I + R
## Initial states:
initial(S) <- S_ini
initial(I) <- I_ini
initial(R) <- 0
initial(D) <- 0
## Convert some cumulative values into incidence:
initial(S_inc) <- 0
initial(D_inc) <- 0
update(S_inc) <- if (step %% steps_per_day == 0) n_SI else S_inc + n_SI
update(D_inc) <- if (step %% steps_per_day == 0) n_ID else D_inc + n_ID
## User defined parameters - default in parentheses:
S_ini <- user(1000)
I_ini <- user(10)
beta <- user(0.2)
gamma <- user(0.1)
p_death <- user(0.05)
Creating a model from this code:
We can simulate some data to fit to, as before. We have also made the case data patchy, so it is missing on some days:
dt <- 0.25
death_data <- read.table("death_data.csv", header = TRUE, sep = ",")
true_history_deaths <- readRDS("deaths_true_history.rds")
combined_data <- mcstate::particle_filter_data(
death_data,
time = "day",
rate = 1 / dt
)
#> Warning in mcstate::particle_filter_data(death_data, time = "day", rate =
#> 1/dt): 'initial_time' should be provided. I'm assuming '0' which is one time
#> unit before the first time in your data (1), but this might not be appropriate.
#> This will become an error in a future version of mcstate
rmarkdown::paged_table(combined_data)
We then need to add this into our comparison function, and in doing
so we will also demonstrate the use of the index
parameter
for the particle filter.
For models with a very large state space returning only those states
required by the comparison function can be important for efficiency. The
only model outputs needed from the comparison function are Sinc
and Dinc,
i.e. the number of new cases and deaths. So, an index function to
extract just this part of the state can be defined, which operates on
dust generators info()
:
index <- function(info) {
list(run = c(cases = info$index$S_inc,
deaths = info$index$D_inc),
state = c(S = info$index$S,
I = info$index$I,
R = info$index$R,
D = info$index$D))
}
death_model <- gen_death$new(pars = list(), time = 0, n_particles = 1L)
index(death_model$info())
#> $run
#> cases deaths
#> 6 7
#>
#> $state
#> S I R D
#> 2 3 4 5
The index function sends the comparison function the indices listed
under run
, in this case the number of new cases and deaths,
and saves any compartments listed under state
if the
history is kept. If the run
and state
vectors
have named entries, these names will be passed to the particle filter,
making it simple to refer to the model states in the comparison function
and in the state history. In the comparison function, missing data must
be checked for and must contribute 0 to the likelihood if found:
# log-likelihood of Poisson count
ll_pois <- function(obs, model) {
exp_noise <- 1e6
if (is.na(obs)) {
# Creates vector of zeros in ll with same length, if no data
ll_obs <- numeric(length(model))
} else {
lambda <- model +
rexp(n = length(model), rate = exp_noise)
ll_obs <- dpois(x = obs, lambda = lambda, log = TRUE)
}
ll_obs
}
# Sum log likelihoods from each datastream
combined_compare <- function(state, observed, pars = NULL) {
ll_cases <- ll_pois(observed$cases, state["cases", , drop = TRUE])
ll_deaths <- ll_pois(observed$deaths, state["deaths", , drop = TRUE])
ll_cases + ll_deaths
}
The comparison and index()
functions can be used to set
up a particle filter as before, which now has a named history:
n_particles <- 100L
filter <- mcstate::particle_filter$new(data = combined_data,
model = gen_death,
n_particles = n_particles,
compare = combined_compare,
index = index,
seed = 1L)
filter$run(save_history = TRUE, pars = list(dt = dt))
#> [1] -1286.394
dimnames(filter$history())
#> [[1]]
#> [1] "S" "I" "R" "D"
#>
#> [[2]]
#> NULL
#>
#> [[3]]
#> NULL
This filter can be used to run the pmcmc()
method, as
above.