---
title: "Nested SIR Models"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Nested SIR Models}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7,
  fig.height = 5)
in_pkgdown <- identical(Sys.getenv("IN_PKGDOWN"), "true")
```

This vignette extends the SIR model in `vignette("sir_models")` to a "nested" model over multiple populations. 

A simple definition of $M$ SIR models is extended from the [odin documentation](https://mrc-ide.github.io/odin/articles/discrete.html) as:
$$\begin{align*}
\frac{dS_i}{dt} &= -\beta_i \frac{S_iI_i}{N_i} \\
\frac{dI_i}{dt} &= \beta_i \frac{S_iI_i}{N_i} - \gamma_i I_i \\
\frac{dR_i}{dt} &= \gamma_i I_i \\
\end{align*}$$
for $i = 1,...,M$ populations.

We can model the simple case in which all $\beta_i, \gamma_i, i = 1,...,M$ are independent, by simply implementing multiple SIR models. However the nested (or "shared") case allows inference to be drawn on the *same* between populations, for example by setting $\beta_i = \beta, \forall i = 1,...,M$. 

Less abstractly, say a disease spreads in countries A and B with the same rate of infection but that healthcare is superior in country A and therefore expects a faster rate of recovery. Then the model parameters would be given as

$$\begin{align*}
\frac{dS_A}{dt} &= -\beta \frac{S_AI_A}{N_A}; &&
\frac{dS_B}{dt} = -\beta \frac{S_BI_B}{N_B} \\
\frac{dI_A}{dt} &= \beta \frac{S_AI_A}{N_A} - \gamma_A I_A; &&
\frac{dI_B}{dt} = \beta \frac{S_BI_B}{N_B} - \gamma_B I_B\\
\frac{dR_A}{dt} &= \gamma_A I_A; &&
\frac{dR_B}{dt} = \gamma_B I_B \\
\end{align*}$$

We will model this below by extending the example from the `vignette("sir_models")` vignette.

## Model data

As with the `vignette("sir_models")` vignette, the daily counts have been generated from a single run of the model.

```{r incidence}
incidence <- read.csv(system.file("nested_sir_incidence.csv",
                                  package = "mcstate"),
                      stringsAsFactors = TRUE)
```

Note that in contrast to regular SIR models, data for nested models should be in a long format and should include a factor column for populations. This is passed to `mcstate::particle_filter_data()`, which includes an argument, `population`, for specifying which column includes the population identifier. 

```{r filter, paged.print=TRUE, layout="l-body-outset", fig.height=5, fig.width=7}
dt <- 0.25
sir_data <- mcstate::particle_filter_data(data = incidence,
                                          time = "day",
                                          rate = 1 / dt,
                                          population = "population")
rmarkdown::paged_table(sir_data)
plot(cases ~ day, incidence[incidence$population == "B", ],
     type = "o", xlab = "Day", ylab = "New cases", pch = 19)
lines(cases ~ day, incidence[incidence$population == "A", ],
      type = "o", xlab = "Day", ylab = "New cases", pch = 19, col = 2)
legend("topright", col = 2:1, legend = c("A", "B"), lwd = 1)
```

## Comparison, model and particle filter

The comparison function, model and particle filter are all identical to the un-nested case in the `vignette("sir_models")` vignette:

```{r compare}
case_compare <- function(state, observed, pars = NULL) {
  if (is.na(observed$cases)) {
    return(NULL)
  }
  exp_noise <- 1e6
  incidence_modelled <- state[5, , drop = TRUE]
  incidence_observed <- observed$cases
  lambda <- incidence_modelled +
    rexp(n = length(incidence_modelled), rate = exp_noise)
  dpois(x = incidence_observed, lambda = lambda, log = TRUE)
}

gen_sir <- dust::dust_example("sir")

n_particles <- 100
filter <- mcstate::particle_filter$new(data = sir_data,
                                       model = gen_sir,
                                       n_particles = n_particles,
                                       compare = case_compare,
                                       seed = 42L)
```


## Nested parameters

The biggest difference between the two cases lies in the `mcstate::pmcmc_parameters_nested` object and associated `mcstate::pmcmc_varied_parameter()` function. We will separately look at the new function and then the object.

But first, in the usual case where a parameter is *fixed* (does not vary between populations), then the usual `mcstate::pmcmc_parameter()` function can be used:

```{r beta}
beta <- mcstate::pmcmc_parameter("beta", 0.2, min = 0)
```

For the gamma parameter, which varies between populations, we instead use `mcstate::pmcmc_varied_parameter()`:

```{r gamma}
gamma <- mcstate::pmcmc_varied_parameter(
  name = "gamma",
  populations = c("a", "b"),
  initial = c(0.1, 0.05),
  prior = list(
    function(p) dgamma(p, shape = 1, scale = 0.2, log = TRUE),
    function(p) dgamma(p, shape = 1, scale = 0.15, log = TRUE)
  ),
  min = 0)
```

In this example, the two populations have different initial values and priors, but the same range of values the parameter can take.
Note that this function also takes the new argument `populations` to specify the names of the populations.

Now we can create a `mcstate::pmcmc_parameters_nested` object.
As with the un-nested case we require a proposal for parameters, however for this object we have one matrix for fixed parameters and an array for varied parameters. Either a 3d array can be provided for varied proposals, in which case each layer (the third dimension) corresponds to a square matrix for a population, or a 2d matrix can be provided which is applied for all populations.
Currently fixed/varied parameters are dependent on other fixed/varied parameters only and are independent of varied/fixed parameters.

```{r nested_pars}
proposal_fixed <- matrix(0.00017)
row.names(proposal_fixed) <- colnames(proposal_fixed) <- "beta"
proposal_varied <- array(c(0.0001, 0.000095), c(1, 1, 2),
                         dimnames = list("gamma", "gamma", c("a", "b")))
row.names(proposal_varied) <- colnames(proposal_varied) <- "gamma"
mcmc_pars <- mcstate::pmcmc_parameters_nested$new(
  parameters = list(beta = beta, gamma = gamma),
  proposal_varied = proposal_varied,
  proposal_fixed = proposal_fixed,
  populations = c("a", "b"))
```

Note that this final `populations` parameter is only strictly required if no varied parameters are included in the object. Now we are ready to run the pMCMC.


## pMCMC and Visualisations


We can now continue as in the un-nested case, just be careful when analysing results to include the additional dimension of population.

```{r, include = in_pkgdown, eval = in_pkgdown}
n_steps <- 500
n_burnin <- 200
```

```{r, include = !in_pkgdown, eval = !in_pkgdown}
n_steps <- 50
n_burnin <- 10
```

```{r run}
control <- mcstate::pmcmc_control(n_steps, save_trajectories = TRUE,
                                  save_state = TRUE, save_restart = 40)
set.seed(42L)
res <- mcstate::pmcmc(mcmc_pars, filter, control = control)
```


```{r plot_infected}
t <- 0:100
par(mfrow = c(1, 2))
matplot(t, t(res$trajectories$state[2, , 1, ]), type = "l", lty = 1,
        col = "#00000011", xlab =  "Day", ylab = "Infected",
        main = "Daily Infected in Population A")
matplot(t, t(res$trajectories$state[2, , 2, ]),
        col = "#00000011", xlab =  "Day", ylab = "Infected", type = "l",
        lty = 1, main = "Daily Infected in Population B")
par(mfrow = c(1, 1))
```

And the daily incidence

```{r plot_inc}
par(mfrow = c(1, 2))
matplot(t[-1], diff(t(res$trajectories$state[4, , 1, ])), type = "l",
        lty = 1, col = "#00000005", xlab = "Day", ylab = "Daily incidence",
        main = "Daily Incidence in Population A")
points(cases ~ day, incidence[incidence$population == "A", ], col = "blue",
       pch = 19)
matplot(t[-1], diff(t(res$trajectories$state[4, , 2, ])), type = "l",
        lty = 1, col = "#00000005", xlab = "Day", ylab = "Daily incidence",
        main = "Daily Incidence in Population B")
points(cases ~ day, incidence[incidence$population == "B", ], col = "blue",
       pch = 19)
par(mfrow = c(1, 1))
```

```{r plot_sir}
plot_particle_filter <- function(history, true_history, times, population,
                                 main) {
  times <- times[seq(length(times) / 2)]
  obs_end <- max(times)

  par(mar = c(4.1, 5.1, 2.5, 0.5), las = 1)
  cols <- c(S = "#8c8cd9", I = "#cc0044", R = "#999966")
  matplot(times, t(history[1, , population,  -1]), type = "l",
          xlab = "Time", ylab = "Number of individuals",
          col = cols[["S"]], lty = 1, ylim = range(history), main = main)
  matlines(times, t(history[2, , population, -1]), col = cols[["I"]], lty = 1)
  matlines(times, t(history[3, , population, -1]), col = cols[["R"]], lty = 1)
  matpoints(times[1:obs_end], t(true_history[1:3, population, -1]), pch = 19,
            col = cols)
  legend("left", lwd = 1, col = cols, legend = names(cols), bty = "n")
}

true_history <- readRDS("nested_sir_true_history.rds")
plot_particle_filter(filter$history(), true_history, incidence$day, 1L,
                     "Population A")
plot_particle_filter(filter$history(), true_history, incidence$day, 2L,
                     "Population B")
```