---
title: "Fitting a continuous-time model"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Fitting a continuous-time model}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7,
  fig.height = 5)
lang_output <- function(x, lang) {
  cat(c(sprintf("```%s", lang), x, "```"), sep = "\n")
}
r_output <- function(x) lang_output(x, "r")
in_pkgdown <- identical(Sys.getenv("IN_PKGDOWN"), "true")
```

While mcstate is primarily designed to work with discrete-time stochastic models written in odin.dust, it is often possible to interact continuous-time deterministic models written in odin (subject to a few restrictions detailed in the [Porting from odin vignette](https://mrc-ide.github.io/odin.dust/articles/porting.html)).

Considering a simple SIR model, as defined in the [odin documentation](https://mrc-ide.github.io/odin/articles/discrete.html):
$$\begin{align*}
\frac{dS}{dt} &= -\beta \frac{SI}{N} \\
\frac{dI}{dt} &= \beta \frac{SI}{N} - \gamma I \\
\frac{dR}{dt} &= \gamma I \\
\end{align*}$$
Where $S$ is the number of susceptibles, $I$ is the number of infected and $R$ is the number recovered; the total population size $N = S + I + R$ is constant. $\beta$ is the infection rate, $\gamma$ is the recovery rate.

We will attempt to infer parameters $\beta$ and $\gamma$ using weekly measures of population prevalence, for example taken from household prevalence surveys. These have been generated using $\beta = 0.3$ and $\gamma = 0.1$, assuming around 100 people are sampled each week. The data contain weekly entries `n_tested` denoting the number of people sampled, and `n_positive` recording the number of those sampled who tested positive.

```{r}
data <- read.csv(system.file("sir_prevalence.csv", package = "mcstate"))

par(bty = "n", mar = c(3, 3, 1, 1), mgp = c(1.5, 0.5, 0))
test_cols <- c(n_tested = "darkorange2", n_positive = "green4")
plot(data$day, data$n_tested, type = "b",
     xlab = "Day", ylab = "Number of individuals",
     ylim = c(0, max(data$n_tested)), col = test_cols[1], pch = 20)
lines(data$day, data$n_positive, type = "b", col = test_cols[2], pch = 20)
legend("left", names(test_cols), fill = test_cols, bty = "n")
```

If we write our continuous-time model in the `odin` DSL, but compile it with `odin.dust`, we can use the calibration functions in mcstate to fit our model to data. In addition to the model, we include a section of code defining a likelihood function which we will use to compare model simulations to the observed data. We use a binomial likelihood:

$$ \mathrm{n_{positive}(t)} \sim \mathrm{Bin}\bigg(\mathrm{n_{tested}}(t), \frac{I(t)}{N(t)}\bigg)$$
Where the probability of testing positive at time $t$ is given by the modelled population prevalence: $\frac{I(t)}{N(t)}$.

```{r}
sir <-  odin.dust::odin_dust({
  ## model
  initial(S) <- N_init - I_init
  initial(I) <- I_init
  initial(R) <- 0
  initial(N) <- N_init
  
  deriv(S) <- -beta * S * I / N
  deriv(I) <- beta * S * I / N - gamma * I
  deriv(R) <- gamma * I
  deriv(N) <- 0
  
  beta <- user(0.4)
  gamma <- user(0.2)
  N_init <- user(1e5)
  I_init <- user(1)
  
  ## likelihood
  n_positive <- data()
  n_tested <- data()
  compare(n_positive) ~ binomial(n_tested, I / N) 
})
```

To calibrate the model to the data, we first need to process the data into the `mcstate` format. As the `rate` parameter is not meaningful in the context of a continuous time model, we must supply a `NULL` value.

```{r}
sir_data <- mcstate::particle_filter_data(data = data, time = "day",
                                          initial_time = 0, rate = NULL)

rmarkdown::paged_table(sir_data)
```

We define a simple index function for our model, to provide meaningful parameter names: 

```{r}
index <- function(info) {
  idx <- unlist(info$index)
  list(run = idx, state = idx)
}
```

Although we are fitting a deterministic model, to use `mcstate` functionality we still need to define a particle filter object. We set `compare = NULL` as we have defined a compiled comparison function in our model code; alternatively you can supply a compare function here if you wish (for example, if you want to use distributions not yet supported in the `odin` DSL), by proceeding as described in the [SIR models with odin, dust and mcstate vignette](https://mrc-ide.github.io/mcstate/articles/sir_models.html).


```{r}

filter <- mcstate::particle_deterministic$new(data = sir_data, model = sir,
                                       index = index, compare = NULL)
```

Running this filter will give us the same output as running the deterministic model:
```{r}
filter$run(save_history = TRUE, pars = list())

times <- c(0, data$day)
mod <- sir$new(list(), time = 0, n_particles = 1)
mod_history <- vapply(times, mod$simulate, numeric(length(mod$info()$dim)))

par(mfrow = c(1, 2), mar = c(3, 3, 1, 1), mgp = c(1.5, 0.5, 0), bty = "n")
cols <- c(S = "#8c8cd9", I = "#cc0044", R = "#999966", N = "black")
matplot(times, t(mod_history), type = "l",
        xlab = "Day", ylab = "Number of individuals", col = cols, lty = 1,
        main = "Model history")
legend("left", lwd = 1, col = cols, legend = names(cols), bty = "n")
matplot(times, t(drop(filter$history())), type = "l",
        xlab = "Day", ylab = "", col = cols, lty = 1,
        main = "Filter history")


```

### Using MCMC to infer parameters

We can now perform MCMC using the Metropolis-Hastings algorithm to infer the values of $\beta$ and $\gamma$ from prevalence data. Let's first describe the parameters we wish to infer, giving a minimum, maximum. For $\gamma$ we will also apply a gamma prior with shape 1 and rate 0.2: $\Gamma(1, 0.2)$:

```{r}
beta <- mcstate::pmcmc_parameter("beta", 0.2, min = 0)
gamma <- mcstate::pmcmc_parameter("gamma", 0.1, min = 0, prior = function(p) {
  dgamma(p, shape = 1, scale = 0.2, log = TRUE)
})

proposal_matrix <- diag(1e-2, 2)
mcmc_pars <- mcstate::pmcmc_parameters$new(list(beta = beta, gamma = gamma),
                                           proposal_matrix)
```

We can now run our MCMC, opting for 4 chains run in parallel. To speed up chain convergence, we take advantage of the adaptive proposal functionality that is available for deterministic models.

```{r mcmc_combined}
control <- mcstate::pmcmc_control(n_steps = 1e4,
                                  n_workers = 4,
                                  n_threads_total = 4,
                                  n_burnin = 500,
                                  n_chains = 4,
                                  adaptive_proposal = TRUE,
                                  save_state = TRUE,
                                  save_trajectories = TRUE,
                                  progress = TRUE, )
mcmc_run <- mcstate::pmcmc(mcmc_pars, filter, control = control)
mcmc_sample <- mcstate::pmcmc_sample(mcmc_run, n_sample = 500)

```
We can visually check the chains have converged and assess the accuracy of our parameter estimates using a sample from the posterior distribution.

```{r}
par(mfrow = c(2, 2), mar = c(3, 3, 1, 1), mgp = c(1.5, 0.5, 0), bty = "n")
plot(mcmc_run$pars[, "beta"], type = "l", xlab = "Iteration",
     ylab = expression(beta))
plot(mcmc_run$pars[, "gamma"], type = "l", xlab = "Iteration",
     ylab = expression(gamma))
hist(mcmc_sample$pars[, "beta"], main = "", xlab = expression(beta),
     freq = FALSE)
abline(v = 0.3, lty = 2, col = "darkred", lwd = 3)
hist(mcmc_sample$pars[, "gamma"], main = "", xlab = expression(gamma),
     freq = FALSE)
abline(v = 0.1, lty = 2, col = "darkred", lwd = 3)
legend("topright", legend = "True value", col = "darkred", lty = 2, bty = "n")
```

We can compare our fitted trajectories to the prevalence data by sampling from the our comparison distribution to obtain an estimate of the number of positive tests under our model.

```{r}

state <- mcmc_sample$trajectories$state

prevalence <-  t(state["I", , -1] / state["N", , -1])
modelled_positives <- apply(prevalence, 2, rbinom, n = nrow(data),
                            size = data$n_tested)

par(mfrow = c(1, 2), mar = c(3, 3, 1, 1), mgp = c(1.5, 0.5, 0), bty = "n")
matplot(times, t(state["S", , ]), type = "l", lty = 1, col = cols["S"],
        ylim = c(0, max(state)), xlab = "Day", ylab = "Number of individuals")
matlines(times, t(state["R", , ]), lty = 1, col = cols["R"])
matlines(times, t(state["I", , ]), lty = 1, col = cols["I"])
legend("left", lwd = 1, col = cols[-4], legend = names(cols)[-4], bty = "n")

matplot(data$day, modelled_positives, type = "l", lty = 1, col = grey(0.7, 0.2),
        xlab = "Day", ylab = "Number of positive tests")
points(data$day, data$n_positive, pch = 20)
```