| Title: | Estimate Time Varying Reproduction Numbers from Epidemic Curves |
|---|---|
| Description: | Tools to quantify transmissibility throughout an epidemic from the analysis of time series of incidence as described in Cori et al. (2013) <doi:10.1093/aje/kwt133> and Wallinga and Teunis (2004) <doi:10.1093/aje/kwh255>. |
| Authors: | Anne Cori [aut, cre] (ORCID: <https://orcid.org/0000-0002-8443-9162>), Simon Cauchemez [ctb], Neil M. Ferguson [ctb] (ORCID: <https://orcid.org/0000-0002-1154-8093>), Christophe Fraser [ctb] (ORCID: <https://orcid.org/0000-0003-2399-9657>), Elisabeth Dahlqwist [ctb] (ORCID: <https://orcid.org/0000-0001-5797-6803>), P. Alex Demarsh [ctb], Thibaut Jombart [ctb] (ORCID: <https://orcid.org/0000-0003-2226-8692>), Zhian N. Kamvar [ctb] (ORCID: <https://orcid.org/0000-0003-1458-7108>), Justin Lessler [ctb] (ORCID: <https://orcid.org/0000-0002-9741-8109>), Shikun Li [ctb], Jonathan A. Polonsky [ctb] (ORCID: <https://orcid.org/0000-0002-8634-4255>), Jake Stockwin [ctb], Robin Thompson [ctb] (ORCID: <https://orcid.org/0000-0001-8545-5212>), Rolina van Gaalen [ctb], Rebecca Nash [ctb] (ORCID: <https://orcid.org/0000-0002-5213-4364>), Sangeeta Bhatia [ctb] (ORCID: <https://orcid.org/0000-0001-6525-101X>), Jack Wardle [ctb] (ORCID: <https://orcid.org/0000-0002-5363-1653>), Andrea Brizzi [ctb] (ORCID: <https://orcid.org/0000-0002-5639-8260>) |
| Maintainer: | Anne Cori <[email protected]> |
| License: | GPL (>=2) |
| Version: | 3.0.0 |
| Built: | 2026-05-15 12:02:10 UTC |
| Source: | https://github.com/mrc-ide/EpiEstim |
Aggregating daily incidence to longer time windows
aggregate_inc(incid, dt = 7L)aggregate_inc(incid, dt = 7L)
incid |
a vector of daily incidence values |
dt |
a positive integer, or vector thereof, indicating the length(s) of
the desired aggregation window(s). If a vector, this will be recycled. For
example, |
a vector of incidence values, aggregated to dt
## Constant aggregation e.g. weekly reporting data("SARS2003") incid <- SARS2003$incidence dt <- 7L aggregate_inc(incid, dt) ## Non-constant aggregation e.g. reporting 3x week data("SARS2003") incid <- SARS2003$incidence dt <- c(2L,2L,3L) aggregate_inc(incid, dt)## Constant aggregation e.g. weekly reporting data("SARS2003") incid <- SARS2003$incidence dt <- 7L aggregate_inc(incid, dt) ## Non-constant aggregation e.g. reporting 3x week data("SARS2003") incid <- SARS2003$incidence dt <- c(2L,2L,3L) aggregate_inc(incid, dt)
This function imputes incidence prior the first date of reported cases to
address early bias in R_t estimates.
A simple linear model is fitted on shifted, logged-incidence cases, based on
an initial observation window.
Log-incidence is computed as log(local + 0.5) to avoid -infinite logs.
Currently, no cases are assumed to be imported.
backimpute_I(incid, window_b)backimpute_I(incid, window_b)
incid |
the raw, reported incidence cases. |
window_b |
length of the observation window to fit the exponential growth model for back-imputation |
an incidence data.frame, combining back-imputed cases for a maximum
of 100 time points (with rows indexed by a negative integer rowname) and
cases (with rows indexed by a non-negative integer)
incid_all <- ceiling(exp(0.3 * 0:20)) incid_trunc <- tail(incid_all, 10) x <- backimpute_I(incid = incid_trunc, window_b = 6) idx <- as.integer(rownames(x)) > -10 x[idx, ]$local incid_allincid_all <- ceiling(exp(0.3 * 0:20)) incid_trunc <- tail(incid_all, 10) x <- backimpute_I(incid = incid_trunc, window_b = 6) idx <- as.integer(rownames(x)) > -10 x[idx, ]$local incid_all
This function splits an MCMC chain in two halves and uses the Gelman-Rubin algorithm to assess convergence of the chain by comparing its two halves.
check_cdt_samples_convergence(cdt_samples)check_cdt_samples_convergence(cdt_samples)
cdt_samples |
the |
TRUE if the Gelman Rubin test for convergence was successful, FALSE otherwise
Anne Cori
estimate_R()
## Not run: ## Note the following examples use an MCMC routine ## to estimate the serial interval distribution from data, ## so they may take a few minutes to run ## load data on rotavirus data("MockRotavirus") ## estimate the serial interval from data SI_fit <- coarseDataTools::dic.fit.mcmc(dat = MockRotavirus$si_data, dist = "G", init_pars = init_mcmc_params(MockRotavirus$si_data, "G"), burnin = 1000, n.samples = 5000) ## use check_cdt_samples_convergence to check convergence converg_diag <- check_cdt_samples_convergence(SI_fit@samples) converg_diag ## End(Not run)## Not run: ## Note the following examples use an MCMC routine ## to estimate the serial interval distribution from data, ## so they may take a few minutes to run ## load data on rotavirus data("MockRotavirus") ## estimate the serial interval from data SI_fit <- coarseDataTools::dic.fit.mcmc(dat = MockRotavirus$si_data, dist = "G", init_pars = init_mcmc_params(MockRotavirus$si_data, "G"), burnin = 1000, n.samples = 5000) ## use check_cdt_samples_convergence to check convergence converg_diag <- check_cdt_samples_convergence(SI_fit@samples) converg_diag ## End(Not run)
coarse2estim Transforms outputs of
dic.fit.mcmc to right format for input into
estimate_R
coarse2estim(x = NULL, dist = x@dist, samples = x@samples, thin = 10)coarse2estim(x = NULL, dist = x@dist, samples = x@samples, thin = 10)
x |
An object generated by function
|
dist |
The parametric distribution used when estimating the serial
interval. Should be one of "gamma", "weibull", "lognormal", "gamma_offset_1",
"weibull_offset_1", or "lognormal_offset_1". Note the different naming
convention compared to |
samples |
A dataframe containing the posterior samples of serial
interval parameters corresponding to the parametric choice specified in
|
thin |
A positive integer corresponding to thinning parameter; of the
posterior sample of serial interval distributions in x, only 1 in |
A list with two elements:
si_sample: a matrix where each column gives one distribution of the
serial interval to be explored, obtained from x by thinning the MCMC chain.
si_parametric_distr: the parametric distribution used when estimating
the serial interval stored in x.
The Hackout3 Parameter Estimation team.
## Not run: ## Note the following examples use an MCMC routine ## to estimate the serial interval distribution from data, ## so they may take a few minutes to run ## load data on rotavirus data("MockRotavirus") ## estimate the serial interval from data SI.fit <- coarseDataTools::dic.fit.mcmc( dat = MockRotavirus$si_data, dist = "G", init.pars = init_mcmc_params(MockRotavirus$si_data, "G"), burnin = 1000, n.samples = 5000 ) ## use coarse2estim to turn this in the right format for estimate_R si_sample <- coarse2estim(SI.fit, thin = 10)$si_sample ## use estimate_R to estimate the reproduction number ## based on these estimates of the serial interval R_si_from_sample <- estimate_R( MockRotavirus$incidence, method = "si_from_sample", si_sample = si_sample, config = make_config(list(n2 = 50)) ) plot(R_si_from_sample) ## End(Not run)## Not run: ## Note the following examples use an MCMC routine ## to estimate the serial interval distribution from data, ## so they may take a few minutes to run ## load data on rotavirus data("MockRotavirus") ## estimate the serial interval from data SI.fit <- coarseDataTools::dic.fit.mcmc( dat = MockRotavirus$si_data, dist = "G", init.pars = init_mcmc_params(MockRotavirus$si_data, "G"), burnin = 1000, n.samples = 5000 ) ## use coarse2estim to turn this in the right format for estimate_R si_sample <- coarse2estim(SI.fit, thin = 10)$si_sample ## use estimate_R to estimate the reproduction number ## based on these estimates of the serial interval R_si_from_sample <- estimate_R( MockRotavirus$incidence, method = "si_from_sample", si_sample = si_sample, config = make_config(list(n2 = 50)) ) plot(R_si_from_sample) ## End(Not run)
Compute the overall infectivity
compute_lambda(incid, si_distr)compute_lambda(incid, si_distr)
incid |
a list (as obtained from function |
si_distr |
a matrix where each column contains the probability mass function for the discrete serial interval for each of the pathogen/strain/variants, starting with the probability mass function for day 0 in the first row, which should be 0. Each column in the matrix should sum to 1 |
a multidimensional array containing values of the overall infectivity for each time step (1st dimension), location (2nd dimension) and pathogen/strain/variant (3rd dimension). The overall infectivity for a given location and pathogen/strain/variant represents the sum of the incidence for that location and that pathogen/strain/variant at all previous time steps, weighted by the current infectivity of those past incident cases. Pre-calculating the overall infectivity makes the algorithm much faster.
n_v <- 2 n_loc <- 3 # 3 locations T <- 100 # 100 time steps priors <- default_priors() # constant incidence 10 per day everywhere incid <- array(10, dim = c(T, n_loc, n_v)) incid <- process_I_multivariant(incid) # arbitrary serial interval, same for both variants w_v <- c(0, 0.2, 0.5, 0.3) si_distr <- cbind(w_v, w_v) lambda <- compute_lambda(incid, si_distr)n_v <- 2 n_loc <- 3 # 3 locations T <- 100 # 100 time steps priors <- default_priors() # constant incidence 10 per day everywhere incid <- array(10, dim = c(T, n_loc, n_v)) incid <- process_I_multivariant(incid) # arbitrary serial interval, same for both variants w_v <- c(0, 0.2, 0.5, 0.3) si_distr <- cbind(w_v, w_v) lambda <- compute_lambda(incid, si_distr)
Across a matrix of discretised probability distributions
(see estimate_advantage()) this function returns the largest index
(across all columns) such that the cumulative probability mass before index is
1 - miss_at_most.
compute_si_cutoff(si_distr, miss_at_most = 0.05)compute_si_cutoff(si_distr, miss_at_most = 0.05)
si_distr |
a matrix with two columns, each containing the probability mass function for the discrete serial interval for each of the two pathogen/strain/variants, starting with the probability mass function for day 0 in the first row, which should be 0. Each column in the matrix should sum to 1. |
miss_at_most |
numeric. Probability mass in the tail of the SI distribution |
integer
Sangeeta Bhatia
Unless specified by the user, t_min in estimate_advantage() is computed
as the sum of two indices:
compute_t_min(incid, si_distr, miss_at_most)compute_t_min(incid, si_distr, miss_at_most)
incid |
a multidimensional array containing values of the incidence for each time step (1st dimension), location (2nd dimension) and pathogen/strain/variant (3rd dimension) |
si_distr |
a matrix with two columns, each containing the probability mass function for the discrete serial interval for each of the two pathogen/strain/variants, starting with the probability mass function for day 0 in the first row, which should be 0. Each column in the matrix should sum to 1. |
miss_at_most |
numeric. Probability mass in the tail of the SI distribution |
the first day of non-zero incidence across all locations, computed using
first_nonzero_incid()
the 95th percentile of the probability mass function of the SI distribution
across all variants computed using compute_si_cutoff()
integer
Sangeeta Bhatia
This data set gives:
the UK daily incidence of deaths within 28 days of a positive COVID test (see source and references),
the discrete daily distribution of the serial interval for SARS-CoV-2, assuming a shifted Gamma distribution with mean 6.1 days, standard deviation 4.2 days and shift 1 day (see references).
A list of two elements:
incidence: a data.frame containing 672 days of observation,
si_distr: a vector containing a set of 30 probabilities.
Bi Q, Wu Y, Mei S, Ye C, Zou X, Zhang Z, et al. Epidemiology and transmission of COVID-19 in 391 cases and 1286 of their close contacts in Shenzhen, China: a retrospective cohort study. The Lancet Infectious Diseases. 2020 Aug 1;20(8):911–9.
https://coronavirus.data.gov.uk/details/cases
Nash RK, Cori A, Nouvellet P. Estimating the epidemic reproduction number from temporally aggregated incidence data: a statistical modelling approach and software tool. (medRxiv pre-print)
Bi Q, Wu Y, Mei S, Ye C, Zou X, Zhang Z, et al. Epidemiology and transmission of COVID-19 in 391 cases and 1286 of their close contacts in Shenzhen, China: a retrospective cohort study. The Lancet Infectious Diseases. 2020 Aug 1;20(8):911–9.
data("covid_deaths_2020_uk") ## estimate the reproduction number (method "non_parametric_si") res <- estimate_R( covid_deaths_2020_uk$incidence$Incidence, method = "non_parametric_si", config = make_config(list(si_distr = covid_deaths_2020_uk$si_distr)) ) plot(res) ## the second plot produced shows, at each each day, ## the estimate of the reproduction number ## over the 7-day window finishing on that day.data("covid_deaths_2020_uk") ## estimate the reproduction number (method "non_parametric_si") res <- estimate_R( covid_deaths_2020_uk$incidence$Incidence, method = "non_parametric_si", config = make_config(list(si_distr = covid_deaths_2020_uk$si_distr)) ) plot(res) ## the second plot produced shows, at each each day, ## the estimate of the reproduction number ## over the 7-day window finishing on that day.
Set default for MCMC control
default_mcmc_controls()default_mcmc_controls()
a list of default MCMC control parameters, containing:
n_iter: the number of iterations for the MCMC to perform
burnin: the burnin to use; MCMC iterations will only be recorded after
the burnin
thin: MCMC iterations will only be recorded after the burnin and every
thin iteration
Values can then be manually edited as in the examples below.
mcmc_control <- default_mcmc_controls() # change to run for 10 times longer mcmc_control$n_iter <- mcmc_control$n_iter * 10mcmc_control <- default_mcmc_controls() # change to run for 10 times longer mcmc_control$n_iter <- mcmc_control$n_iter * 10
Set default for Gamma priors
default_priors()default_priors()
a list of default parameters for the priors.
Values can then be manually edited as in the examples below.
Users could use functions epitrix::gamma_shapescale2mucv() and
epitrix::gamma_mucv2shapescale() to set the shape and scale corresponding
to the desired prior mean and coefficient of variation.
priors <- default_priors() # change the prior for R to have a mean of 3 priors$R$shape <- 3priors <- default_priors() # change the prior for R to have a mean of 3 priors$R$shape <- 3
Compute the discrete distribution of the serial interval, assuming that the serial interval is a shifted Gamma distributed, with shift 1.
discr_si(k, mu, sigma)discr_si(k, mu, sigma)
k |
Positive integer, or vector of positive integers for which the discrete distribution is desired. |
mu |
A positive real giving the mean of the Gamma distribution. |
sigma |
A non-negative real giving the standard deviation of the Gamma distribution. |
Assuming that the serial interval is a shifted Gamma distributed with mean
, standard deviation and shift ,
the discrete probability that the serial interval is equal to
is (see supplement of Cori et al. AJE 2013):
where is the cumulative density function of a Gamma
distribution with mean and standard deviation .
Gives the discrete probability that the serial interval is
equal to .
Anne Cori
Cori, A. et al. A new framework and software to estimate time-varying reproduction numbers during epidemics (AJE 2013).
overall_infectivity(), estimate_R()
## Computing the discrete serial interval of influenza mean_flu_si <- 2.6 sd_flu_si <- 1.5 dicrete_si_distr <- discr_si(seq(0, 20), mean_flu_si, sd_flu_si) plot(seq(0, 20), dicrete_si_distr, type = "h", lwd = 10, lend = 1, xlab = "time (days)", ylab = "frequency") title(main = "Discrete distribution of the serial interval of influenza")## Computing the discrete serial interval of influenza mean_flu_si <- 2.6 sd_flu_si <- 1.5 dicrete_si_distr <- discr_si(seq(0, 20), mean_flu_si, sd_flu_si) plot(seq(0, 20), dicrete_si_distr, type = "h", lwd = 10, lend = 1, xlab = "time (days)", ylab = "frequency") title(main = "Discrete distribution of the serial interval of influenza")
Please only use for compatibility;
Prefer the new discr_si() function instead
DiscrSI(k, mu, sigma)DiscrSI(k, mu, sigma)
k |
Positive integer, or vector of positive integers for which the discrete distribution is desired. |
mu |
A positive real giving the mean of the Gamma distribution. |
sigma |
A non-negative real giving the standard deviation of the Gamma distribution. |
Draw epsilon from marginal posterior distribution
draw_epsilon( R, incid, lambda, priors, shape_epsilon = NULL, t_min = 2L, t_max = nrow(incid), seed = NULL )draw_epsilon( R, incid, lambda, priors, shape_epsilon = NULL, t_min = 2L, t_max = nrow(incid), seed = NULL )
R |
a matrix with dimensions containing values of the instantaneous reproduction number for each time step (row) and location (column), for the reference pathogen/strain/variant |
incid |
a multidimensional array containing values of the (local) incidence for each time step (1st dimension), location (2nd dimension) and pathogen/strain/variant (3rd dimension) |
lambda |
a multidimensional array containing values of the overall
infectivity for each time step (1st dimension), location (2nd dimension)
and pathogen/strain/variant (3rd dimension). The overall infectivity for
a given location and pathogen/strain/variant represents the sum of
the incidence for that location and that pathogen/strain/variant at all
previous time steps, weighted by the current infectivity of those
past incident cases. It can be calculated from the incidence |
priors |
a list of prior parameters (shape and scale of a gamma
distribution) for epsilon and R; can be obtained from the function
|
shape_epsilon |
a value or vector of values of the shape of the posterior
distribution of epsilon for each of the non-reference variants, as returned
by function |
t_min |
an integer > 1 giving the minimum time step to consider in the estimation. Default value is 2 (as the estimation is conditional on observations at time step 1 and can therefore only start at time step 2). |
t_max |
an integer > |
seed |
a numeric value used to fix the random seed |
A value or vector of values for epsilon for each non reference pathogen/strain/variant, drawn from the marginal posterior distribution
n_loc <- 4 # 4 locations n_v <- 3 # 3 strains T <- 100 # 100 time steps priors <- default_priors() # constant incidence 10 per day everywhere incid <- array(10, dim = c(T, n_loc, n_v)) incid <- process_I_multivariant(incid) # arbitrary serial interval, same for both variants w_v <- c(0, 0.2, 0.5, 0.3) si_distr <- cbind(w_v, w_v, w_v) lambda <- compute_lambda(incid, si_distr) # Constant reproduction number of 1 R <- matrix(1, nrow = T, ncol = n_loc) R[1, ] <- NA # no estimates of R on first time step draw_epsilon(R, incid$local, lambda, priors, seed = 1)n_loc <- 4 # 4 locations n_v <- 3 # 3 strains T <- 100 # 100 time steps priors <- default_priors() # constant incidence 10 per day everywhere incid <- array(10, dim = c(T, n_loc, n_v)) incid <- process_I_multivariant(incid) # arbitrary serial interval, same for both variants w_v <- c(0, 0.2, 0.5, 0.3) si_distr <- cbind(w_v, w_v, w_v) lambda <- compute_lambda(incid, si_distr) # Constant reproduction number of 1 R <- matrix(1, nrow = T, ncol = n_loc) R[1, ] <- NA # no estimates of R on first time step draw_epsilon(R, incid$local, lambda, priors, seed = 1)
Draw R from marginal posterior distribution
draw_R( epsilon, incid, lambda, priors, shape_R_flat = NULL, t_min = NULL, t_max = nrow(incid), seed = NULL )draw_R( epsilon, incid, lambda, priors, shape_R_flat = NULL, t_min = NULL, t_max = nrow(incid), seed = NULL )
epsilon |
a value or vector of values for the relative transmissibility of the "new" pathogen/strain/variant(s) compared to the reference pathogen/strain/variant |
incid |
a multidimensional array containing values of the (local) incidence for each time step (1st dimension), location (2nd dimension) and pathogen/strain/variant (3rd dimension) |
lambda |
a multidimensional array containing values of the overall
infectivity for each time step (1st dimension), location (2nd dimension)
and pathogen/strain/variant (3rd dimension). The overall infectivity for
a given location and pathogen/strain/variant represents the sum of
the incidence for that location and that pathogen/strain/variant at all
previous time steps, weighted by the current infectivity of those
past incident cases. It can be calculated from the incidence |
priors |
a list of prior parameters (shape and scale of a gamma
distribution) for epsilon and R; can be obtained from the function
|
shape_R_flat |
a vector of the shape of the posterior distribution of R
for each time step t and each location l
(stored in element |
t_min |
an integer > 1 giving the minimum time step to consider in the estimation. Default value is 2 (as the estimation is conditional on observations at time step 1 and can therefore only start at time step 2). |
t_max |
an integer > |
seed |
a numeric value used to fix the random seed |
a matrix of the instantaneous reproduction number R for the reference pathogen/strain/variant for each time step (row) and each location (column) drawn from the marginal posterior distribution
n_v <- 2 n_loc <- 3 # 3 locations T <- 100 # 100 time steps priors <- default_priors() # constant incidence 10 per day everywhere incid <- array(10, dim = c(T, n_loc, n_v)) incid <- process_I_multivariant(incid) # arbitrary serial interval, same for both variants w_v <- c(0, 0.2, 0.5, 0.3) si_distr <- cbind(w_v, w_v) lambda <- compute_lambda(incid, si_distr) # Epsilon = 1 i.e. no transmission advantage epsilon <- 1 draw_R(epsilon, incid$local, lambda, priors, seed = 1, t_min = 2L)n_v <- 2 n_loc <- 3 # 3 locations T <- 100 # 100 time steps priors <- default_priors() # constant incidence 10 per day everywhere incid <- array(10, dim = c(T, n_loc, n_v)) incid <- process_I_multivariant(incid) # arbitrary serial interval, same for both variants w_v <- c(0, 0.2, 0.5, 0.3) si_distr <- cbind(w_v, w_v) lambda <- compute_lambda(incid, si_distr) # Epsilon = 1 i.e. no transmission advantage epsilon <- 1 draw_R(epsilon, incid$local, lambda, priors, seed = 1, t_min = 2L)
Jointly estimate the instantaneous reproduction number for a reference pathogen/strain/variant and the relative transmissibility of a "new" pathogen/strain/variant.
estimate_advantage( incid, si_distr, priors = default_priors(), mcmc_control = default_mcmc_controls(), t_min = NULL, t_max = nrow(incid), seed = NULL, incid_imported = NULL, precompute = TRUE, reorder_incid = TRUE )estimate_advantage( incid, si_distr, priors = default_priors(), mcmc_control = default_mcmc_controls(), t_min = NULL, t_max = nrow(incid), seed = NULL, incid_imported = NULL, precompute = TRUE, reorder_incid = TRUE )
incid |
a multidimensional array containing values of the incidence for each time step (1st dimension), location (2nd dimension) and pathogen/strain/variant (3rd dimension) |
si_distr |
a matrix with two columns, each containing the probability mass function for the discrete serial interval for each of the two pathogen/strain/variants, starting with the probability mass function for day 0 in the first row, which should be 0. Each column in the matrix should sum to 1. |
priors |
a list of prior parameters (shape and scale of a gamma
distribution) for epsilon and R; can be obtained from the function
|
mcmc_control |
a list of default MCMC control parameters, obtained by
default from |
t_min |
an integer > 1 giving the minimum time step to consider in the
estimation.
If |
t_max |
an integer > |
seed |
a numeric value used to fix the random seed |
incid_imported |
an optional multidimensional array containing values
of the incidence of imported cases for each time step (1st dimension),
location (2nd dimension) and pathogen/strain/variant (3rd dimension).
|
precompute |
a boolean (defaulting to |
reorder_incid |
a boolean (defaulting to |
A list with the following elements:
epsilon: a matrix containing the MCMC chain (thinned and after burnin)
for the relative transmissibility of the "new" pathogen/strain/variant(s)
compared to the reference. Each row corresponds to a "new"
pathogen/strain/variant and each column to an MCMC iteration.
R: an array containing the MCMC chain (thinned and after burnin) for the
reproduction number for the reference pathogen/strain/variant. Dimensions
of the array are time, location, and iteration of the MCMC.
convergence: a logical vector based on the Gelman-Rubin convergence
diagnostic. Each element in convergence is TRUE when the MCMC for the
corresponding epsilon has converged number of iterations specified
(otherwise FALSE).
diag: a nested list of the point estimate and upper confidence limits of
the Gelman-Rubin convergence diagnostics (as implemented in coda). The
length of diag is equal to the number of rows in epsilon. Each element od
diag is a list of length 2 where the first element is called psrf and
is a named list of the point estimate and upper confidence limits. The
second element is NULL and can be ignored.
n_v <- 2 n_loc <- 3 # 3 locations T <- 100 # 100 time steps priors <- default_priors() # constant incidence 10 per day everywhere incid <- array(10, dim = c(T, n_loc, n_v)) # arbitrary serial interval, same for both variants w_v <- c(0, 0.2, 0.5, 0.3) si_distr <- cbind(w_v, w_v) # Dummy initial values for the MCMC R_init <- matrix(5, nrow = T, ncol = n_loc) R_init[1, ] <- NA # no estimates of R on first time step epsilon_init <- 5 x <- estimate_advantage(incid, si_distr, priors) # Plotting to check outputs par(mfrow = c(2, 2)) plot(x$epsilon, type = "l", xlab = "Iteration", ylab = "epsilon") # Compare with what we expect with constant incidence in all locations abline(h = 1, col = "red") plot(x$R[10, 1, ], type = "l", xlab = "Iteration", ylab = "R time 10 location 1") abline(h = 1, col = "red") plot(x$R[20, 2, ], type = "l", xlab = "Iteration", ylab = "R time 20 location 2") abline(h = 1, col = "red") plot(x$R[30, 3, ], type = "l", xlab = "Iteration", ylab = "R time 30 location 3")n_v <- 2 n_loc <- 3 # 3 locations T <- 100 # 100 time steps priors <- default_priors() # constant incidence 10 per day everywhere incid <- array(10, dim = c(T, n_loc, n_v)) # arbitrary serial interval, same for both variants w_v <- c(0, 0.2, 0.5, 0.3) si_distr <- cbind(w_v, w_v) # Dummy initial values for the MCMC R_init <- matrix(5, nrow = T, ncol = n_loc) R_init[1, ] <- NA # no estimates of R on first time step epsilon_init <- 5 x <- estimate_advantage(incid, si_distr, priors) # Plotting to check outputs par(mfrow = c(2, 2)) plot(x$epsilon, type = "l", xlab = "Iteration", ylab = "epsilon") # Compare with what we expect with constant incidence in all locations abline(h = 1, col = "red") plot(x$R[10, 1, ], type = "l", xlab = "Iteration", ylab = "R time 10 location 1") abline(h = 1, col = "red") plot(x$R[20, 2, ], type = "l", xlab = "Iteration", ylab = "R time 20 location 2") abline(h = 1, col = "red") plot(x$R[30, 3, ], type = "l", xlab = "Iteration", ylab = "R time 30 location 3")
Estimate the instantaneous reproduction number of an epidemic, given the incidence time series and the serial interval distribution.
estimate_R( incid, method = c("non_parametric_si", "parametric_si", "uncertain_si", "si_from_data", "si_from_sample"), si_data = NULL, si_sample = NULL, config = make_config(incid = incid), dt = 1L, dt_out = 7L, recon_opt = "naive", iter = 10L, tol = 1e-06, grid = list(precision = 0.001, min = -1, max = 1), backimputation_window = 0, date_convention = NULL )estimate_R( incid, method = c("non_parametric_si", "parametric_si", "uncertain_si", "si_from_data", "si_from_sample"), si_data = NULL, si_sample = NULL, config = make_config(incid = incid), dt = 1L, dt_out = 7L, recon_opt = "naive", iter = 10L, tol = 1e-06, grid = list(precision = 0.001, min = -1, max = 1), backimputation_window = 0, date_convention = NULL )
incid |
One of the following
Note that the cases from the first time step are always all assumed to be imported cases. |
method |
One of "non_parametric_si", "parametric_si", "uncertain_si", "si_from_data" or "si_from_sample" (see details). |
si_data |
For method "si_from_data"; the data on dates of symptoms of pairs of infector/infected individuals to be used to estimate the serial interval distribution should be a dataframe with 5 columns:
|
si_sample |
For method "si_from_sample"; a matrix where each column gives one distribution of the serial interval to be explored (see details). |
config |
An object of class |
dt |
length of temporal aggregations of the incidence data. This should
be an integer or vector of integers. If a vector, this can either match the
length of the incidence data supplied, or it will be recycled. For
example, |
dt_out |
length of the sliding windows used for R estimates (integer,
7 time units (typically days) by default).
Only used if |
recon_opt |
one of "naive" or "match", see |
iter |
number of iterations of the EM algorithm used to reconstruct
incidence at 1-time-unit intervals(integer, 10 by default).
Only used if |
tol |
tolerance used in the convergence check (numeric, 1e-6 by default),
see |
grid |
named list containing "precision", "min", and "max" which are
used to define a grid of growth rate parameters that are used inside the EM
algorithm used to reconstruct incidence at 1-time-unit intervals.
Only used if |
backimputation_window |
Length of the window used to impute incidence
prior to the first reported cases. The default value is 0, meaning that no
back-imputation is performed. If a positive integer is provided, the
incidence is imputed for the first |
date_convention |
One of |
Analytical estimates of the reproduction number for an epidemic over predefined time windows can be obtained within a Bayesian framework, for a given discrete distribution of the serial interval (see references).
Several methods are available to specify the serial interval distribution.
In short there are five methods to specify the serial interval distribution
(see help for function make_config() for more detail on each method).
In the first two methods, a unique serial interval distribution is
considered, whereas in the last three, a range of serial interval
distributions are integrated over:
In method "non_parametric_si" the user specifies the discrete distribution of the serial interval
In method "parametric_si" the user specifies the mean and sd of the serial interval
In method "uncertain_si" the mean and sd of the serial interval are each drawn from truncated normal distributions, with parameters specified by the user
In method "si_from_data", the serial interval distribution is directly estimated, using MCMC, from interval censored exposure data, with data provided by the user together with a choice of parametric distribution for the serial interval
In method "si_from_sample", the user directly provides the sample of serial interval distribution to use for estimation of R. This can be a useful alternative to the previous method, where the MCMC estimation of the serial interval distribution could be run once, and the same estimated SI distribution then used in estimate_R in different contexts, e.g. with different time windows, hence avoiding to rerun the MCMC every time estimate_R is called.
R is estimated within a Bayesian framework, using a Gamma distributed prior,
with mean and standard deviation which can be set using the mean_prior
and std_prior arguments within the make_config() function, which can then
be used to specify config in the estimate_R() function.
Default values are a mean prior of 5 and standard deviation of 5.
This was set to a high prior value with large uncertainty so that if one
estimates R to be below 1, the result is strongly data-driven.
R is estimated on time windows specified through the config argument.
These can be overlapping or not (see make_config() function and vignette
for examples).
an object of class estimate_R, with components:
R: a dataframe containing:
the times of start and end of each time window considered ;
the posterior mean, std, and 0.025, 0.05, 0.25, 0.5, 0.75, 0.95, 0.975
quantiles of the reproduction number for each time window.
method: the method used to estimate R, one of "non_parametric_si",
"parametric_si", "uncertain_si", "si_from_data" or "si_from_sample"
si_distr: a vector or dataframe (depending on the method) containing
the discrete serial interval distribution(s) used for estimation
SI.Moments: a vector or dataframe (depending on the method)
containing the mean and std of the discrete serial interval distribution(s)
used for estimation
I: the time series of total incidence
I_local: the time series of incidence of local cases (so that
I_local + I_imported = I)
I_imported: the time series of incidence of imported cases (so that
I_local + I_imported = I)
I_imputed: the time series of incidence of imputed cases
dates: a vector of dates corresponding to the incidence time series
MCMC_converged (only for method si_from_data): a boolean showing
whether the Gelman-Rubin MCMC convergence diagnostic was successful
(TRUE) or not (FALSE)
Anne Cori
Cori, A. et al. A new framework and software to estimate time-varying reproduction numbers during epidemics (AJE 2013).
Wallinga, J. and P. Teunis. Different epidemic curves for severe acute respiratory syndrome reveal similar impacts of control measures (AJE 2004). Reich, N.G. et al. Estimating incubation period distributions with coarse data (Statis. Med. 2009)
make_config() for general settings of the estimation
discr_si() to build serial interval distributions
sample_posterior_R() to draw samples of R values from the posterior distribution
from the output of estimate_R()
## load data on pandemic flu in a school in 2009 data("Flu2009") ## estimate the reproduction number (method "non_parametric_si") ## when not specifying t_start and t_end in config, they are set to estimate ## the reproduction number on sliding weekly windows res <- estimate_R(incid = Flu2009$incidence, method = "non_parametric_si", config = make_config(list(si_distr = Flu2009$si_distr))) plot(res) ## the second plot produced shows, at each each day, ## the estimate of the reproduction number over the 7-day window ## finishing on that day. ## to specify t_start and t_end in config, e.g. to have biweekly sliding ## windows t_start <- seq(2, nrow(Flu2009$incidence) - 13) t_end <- t_start + 13 res <- estimate_R(incid = Flu2009$incidence, method = "non_parametric_si", config = make_config(list( si_distr = Flu2009$si_distr, t_start = t_start, t_end = t_end))) plot(res) ## the second plot produced shows, at each each day, ## the estimate of the reproduction number over the 14-day window ## finishing on that day. ## example with an incidence object ## create fake data library(incidence) data <- c(0, 1, 1, 2, 1, 3, 4, 5, 5, 5, 5, 4, 4, 26, 6, 7, 9) location <- sample(c("local","imported"), length(data), replace = TRUE) location[1] <- "imported" # forcing the first case to be imported ## get incidence per group (location) incid <- incidence(data, groups = location) ## Estimate R with assumptions on serial interval res <- estimate_R(incid, method = "parametric_si", config = make_config(list( mean_si = 2.6, std_si = 1.5))) plot(res) ## the second plot produced shows, at each each day, ## the estimate of the reproduction number over the 7-day window ## finishing on that day. ## estimate the reproduction number (method "parametric_si") res <- estimate_R(Flu2009$incidence, method = "parametric_si", config = make_config(list(mean_si = 2.6, std_si = 1.5))) plot(res) ## the second plot produced shows, at each each day, ## the estimate of the reproduction number over the 7-day window ## finishing on that day. ## estimate the reproduction number (method "uncertain_si") res <- estimate_R(Flu2009$incidence, method = "uncertain_si", config = make_config(list( mean_si = 2.6, std_mean_si = 1, min_mean_si = 1, max_mean_si = 4.2, std_si = 1.5, std_std_si = 0.5, min_std_si = 0.5, max_std_si = 2.5, n1 = 100, n2 = 100))) plot(res) ## the bottom left plot produced shows, at each each day, ## the estimate of the reproduction number over the 7-day window ## finishing on that day. ## Example with back-imputation: ## here we use the first 6 days of incidence to impute cases that preceded ## the first reported cases: res_bi <- estimate_R( incid = Flu2009$incidence, method = "parametric_si", backimputation_window = 6, config = make_config(list( mean_si = 2.6, std_si = 1, t_start = t_start, t_end = t_end )) ) plot(res_bi, "R") ## We can see that early estimates of R are lower when back-imputation is ## used, even though the difference is marginal in this case. ## Not run: ## Note the following examples use an MCMC routine ## to estimate the serial interval distribution from data, ## so they may take a few minutes to run ## load data on rotavirus data("MockRotavirus") mcmc_control <- make_mcmc_control( burnin = 1000, # first 1000 iterations discarded as burn-in thin = 10, # every 10th iteration will be kept, the rest discarded seed = 1 # set the seed to make the process reproducible ) R_si_from_data <- estimate_R( incid = MockRotavirus$incidence, method = "si_from_data", si_data = MockRotavirus$si_data, # symptom onset data config = make_config( si_parametric_distr = "G", # gamma dist. for SI mcmc_control = mcmc_control, n1 = 500, # number of posterior samples of SI dist. n2 = 50, # number of posterior samples of Rt dist. seed = 2 ) # set seed for reproducibility ) ## compare with version with no uncertainty R_Parametric <- estimate_R(MockRotavirus$incidence, method = "parametric_si", config = make_config(list( mean_si = mean(R_si_from_data$SI.Moments$Mean), std_si = mean(R_si_from_data$SI.Moments$Std)))) ## generate plots library(patchwork) # To arrange side-by-side p_uncertainty <- plot(R_si_from_data, "R", options_R=list(ylim=c(0, 1.5))) p_no_uncertainty <- plot(R_Parametric, "R", options_R=list(ylim=c(0, 1.5))) p_uncertainty + p_no_uncertainty ## the left hand side graph is with uncertainty in the SI distribution, the ## right hand side without. ## The credible intervals are wider when accounting for uncertainty in the SI ## distribution. ## estimate the reproduction number (method "si_from_sample") MCMC_seed <- 1 overall_seed <- 2 SI.fit <- coarseDataTools::dic.fit.mcmc(dat = MockRotavirus$si_data, dist = "G", init.pars = init_mcmc_params(MockRotavirus$si_data, "G"), burnin = 1000, n.samples = 5000, seed = MCMC_seed) si_sample <- coarse2estim(SI.fit, thin = 10)$si_sample R_si_from_sample <- estimate_R(MockRotavirus$incidence, method = "si_from_sample", si_sample = si_sample, config = make_config(list(n2 = 50, seed = overall_seed))) plot(R_si_from_sample) ## check that R_si_from_sample is the same as R_si_from_data ## since they were generated using the same MCMC algorithm to generate the SI ## sample (either internally to EpiEstim or externally) all(R_si_from_sample$R$`Mean(R)` == R_si_from_data$R$`Mean(R)`) ## End(Not run)## load data on pandemic flu in a school in 2009 data("Flu2009") ## estimate the reproduction number (method "non_parametric_si") ## when not specifying t_start and t_end in config, they are set to estimate ## the reproduction number on sliding weekly windows res <- estimate_R(incid = Flu2009$incidence, method = "non_parametric_si", config = make_config(list(si_distr = Flu2009$si_distr))) plot(res) ## the second plot produced shows, at each each day, ## the estimate of the reproduction number over the 7-day window ## finishing on that day. ## to specify t_start and t_end in config, e.g. to have biweekly sliding ## windows t_start <- seq(2, nrow(Flu2009$incidence) - 13) t_end <- t_start + 13 res <- estimate_R(incid = Flu2009$incidence, method = "non_parametric_si", config = make_config(list( si_distr = Flu2009$si_distr, t_start = t_start, t_end = t_end))) plot(res) ## the second plot produced shows, at each each day, ## the estimate of the reproduction number over the 14-day window ## finishing on that day. ## example with an incidence object ## create fake data library(incidence) data <- c(0, 1, 1, 2, 1, 3, 4, 5, 5, 5, 5, 4, 4, 26, 6, 7, 9) location <- sample(c("local","imported"), length(data), replace = TRUE) location[1] <- "imported" # forcing the first case to be imported ## get incidence per group (location) incid <- incidence(data, groups = location) ## Estimate R with assumptions on serial interval res <- estimate_R(incid, method = "parametric_si", config = make_config(list( mean_si = 2.6, std_si = 1.5))) plot(res) ## the second plot produced shows, at each each day, ## the estimate of the reproduction number over the 7-day window ## finishing on that day. ## estimate the reproduction number (method "parametric_si") res <- estimate_R(Flu2009$incidence, method = "parametric_si", config = make_config(list(mean_si = 2.6, std_si = 1.5))) plot(res) ## the second plot produced shows, at each each day, ## the estimate of the reproduction number over the 7-day window ## finishing on that day. ## estimate the reproduction number (method "uncertain_si") res <- estimate_R(Flu2009$incidence, method = "uncertain_si", config = make_config(list( mean_si = 2.6, std_mean_si = 1, min_mean_si = 1, max_mean_si = 4.2, std_si = 1.5, std_std_si = 0.5, min_std_si = 0.5, max_std_si = 2.5, n1 = 100, n2 = 100))) plot(res) ## the bottom left plot produced shows, at each each day, ## the estimate of the reproduction number over the 7-day window ## finishing on that day. ## Example with back-imputation: ## here we use the first 6 days of incidence to impute cases that preceded ## the first reported cases: res_bi <- estimate_R( incid = Flu2009$incidence, method = "parametric_si", backimputation_window = 6, config = make_config(list( mean_si = 2.6, std_si = 1, t_start = t_start, t_end = t_end )) ) plot(res_bi, "R") ## We can see that early estimates of R are lower when back-imputation is ## used, even though the difference is marginal in this case. ## Not run: ## Note the following examples use an MCMC routine ## to estimate the serial interval distribution from data, ## so they may take a few minutes to run ## load data on rotavirus data("MockRotavirus") mcmc_control <- make_mcmc_control( burnin = 1000, # first 1000 iterations discarded as burn-in thin = 10, # every 10th iteration will be kept, the rest discarded seed = 1 # set the seed to make the process reproducible ) R_si_from_data <- estimate_R( incid = MockRotavirus$incidence, method = "si_from_data", si_data = MockRotavirus$si_data, # symptom onset data config = make_config( si_parametric_distr = "G", # gamma dist. for SI mcmc_control = mcmc_control, n1 = 500, # number of posterior samples of SI dist. n2 = 50, # number of posterior samples of Rt dist. seed = 2 ) # set seed for reproducibility ) ## compare with version with no uncertainty R_Parametric <- estimate_R(MockRotavirus$incidence, method = "parametric_si", config = make_config(list( mean_si = mean(R_si_from_data$SI.Moments$Mean), std_si = mean(R_si_from_data$SI.Moments$Std)))) ## generate plots library(patchwork) # To arrange side-by-side p_uncertainty <- plot(R_si_from_data, "R", options_R=list(ylim=c(0, 1.5))) p_no_uncertainty <- plot(R_Parametric, "R", options_R=list(ylim=c(0, 1.5))) p_uncertainty + p_no_uncertainty ## the left hand side graph is with uncertainty in the SI distribution, the ## right hand side without. ## The credible intervals are wider when accounting for uncertainty in the SI ## distribution. ## estimate the reproduction number (method "si_from_sample") MCMC_seed <- 1 overall_seed <- 2 SI.fit <- coarseDataTools::dic.fit.mcmc(dat = MockRotavirus$si_data, dist = "G", init.pars = init_mcmc_params(MockRotavirus$si_data, "G"), burnin = 1000, n.samples = 5000, seed = MCMC_seed) si_sample <- coarse2estim(SI.fit, thin = 10)$si_sample R_si_from_sample <- estimate_R(MockRotavirus$incidence, method = "si_from_sample", si_sample = si_sample, config = make_config(list(n2 = 50, seed = overall_seed))) plot(R_si_from_sample) ## check that R_si_from_sample is the same as R_si_from_data ## since they were generated using the same MCMC algorithm to generate the SI ## sample (either internally to EpiEstim or externally) all(R_si_from_sample$R$`Mean(R)` == R_si_from_data$R$`Mean(R)`) ## End(Not run)
Estimate instantaneous reproduction number from coarsely aggregated data
estimate_R_agg( incid, dt = 7L, dt_out = 7L, iter = 10L, tol = 1e-06, recon_opt = "naive", config = make_config(), method = c("non_parametric_si", "parametric_si"), grid = list(precision = 0.001, min = -1, max = 1), agg_dates = NULL, date_convention = NULL )estimate_R_agg( incid, dt = 7L, dt_out = 7L, iter = 10L, tol = 1e-06, recon_opt = "naive", config = make_config(), method = c("non_parametric_si", "parametric_si"), grid = list(precision = 0.001, min = -1, max = 1), agg_dates = NULL, date_convention = NULL )
incid |
Aggregated incidence data, supplied as one of the following
|
dt |
length of temporal aggregations of the incidence data. This should
be an integer or vector of integers. If a vector, this will be recycled. For
example, |
dt_out |
length of the sliding windows used for R estimates (numeric,
7 time units (typically days) by default). Only used if |
iter |
number of iterations of the EM algorithm (integer, 10 by default) |
tol |
tolerance used in the convergence check (numeric, 1e-6 by default) |
recon_opt |
one of "naive" or "match" (see details). |
config |
an object of class |
method |
one of "non_parametric_si" or "parametric_si" (see details). |
grid |
named list containing |
agg_dates |
dates corresponding to each aggregated incidence count,
supplied as a vector of class |
date_convention |
One of |
Estimation of the time-varying reproduction number from temporally-aggregated
incidence data. For full details about how Rt is estimated within this
bayesian framework, see details in estimate_R().
Here, an expectation maximisation (EM) algorithm is used to reconstruct daily
incidence from data that is provided on another timescale. In addition to the
usual parameters required in estimate_R(), estimate_R_agg()
requires that the user specify two additional parameters: dt and
dt_out. There are two other parameters that the user may modify, iter and
grid, however we recommend that the default values are used.
dt is the length of the temporal aggregations of the incidence data in
days. This can be a single integer if the aggregations do not change. E.g.
for weekly data, the user would supply dt = 7L. If the aggregation windows
vary in length, a vector of integers can be supplied. If the aggregations
have a repeating pattern, for instance, if incidence is always reported three
times per week on the same day of the week, you can supply a vector such as:
dt = c(2L,2L,3L). If the aggregations change over time, or do not have a
repeating pattern, the user can supply a full vector of aggregations matching
the length of the incidence data supplied.
dt_out is the length of the sliding window used to estimate Rt from the
reconstructed daily data. By default, Rt estimation uses weekly sliding time
windows, however, we recommend that the sliding window is at least equal to
the length of the longest aggregation window (dt) in the data.
recon_opt specifies how to handle the initial incidence data that cannot
be reconstructed by the EM algorithm (e.g. the incidence data for the aggregation
window that precedes the first aggregation window that R can be estimated for).
If "naive" is chosen, the naive disaggregation of the incidence data will be
kept. If "match" is chosen, the incidence in the preceding aggregation window
will be reconstructed by assuming that the growth rate matches that of the first
estimation window. This is "naive" by default.
iter is the number of iterations of the EM algorithm used to reconstruct
the daily incidence data. By default, iter = 10L, which has been demonstrated
to exceed the number of iterations necessary to reach convergence in simulation
studies and analysis of real-world data by the package authors (manuscript in
prep).
grid is a named list containing precision, min, and max which are
used to define a grid of growth rate parameters used inside the EM algorithm.
The grid is used to convert reproduction number estimates for each aggregation
of incidence data into growth rates, which are then used to reconstruct the
daily incidence data assuming exponential growth. The grid will auto-adjust
if it is not large enough, so we recommend using the default values.
There are three stages of the EM algorithm:
Initialisation. The EM algorithm is initialised with a naive disaggregation of the incidence data. For example, if there were 70 cases over the course of a week, this would be naively split into 10 cases per day.
Expectation. The reproduction number is estimated for each aggregation window, except for the first aggregation window (as there is no past incidence data). This means that the earliest the incidence reconstruction can start is at least the first day of the second aggregation window. Additionally, if the disaggregated incidence in subsequent aggregation windows is too low to estimate the reproduction number, this will mean that the reconstruction will not start until case numbers are sufficiently high.
Maximisation. The reproduction number estimates are then translated into growth rates for each aggregation window (Wallinga & Lipsitch, 2007) and used to reconstruct daily incidence data assuming exponential growth. The daily incidence is adjusted by a constant to ensure that if the daily incidence were to be re-aggregated, it would still sum to the original aggregated totals. The expectation and maximisation steps repeat iteratively until convergence.
The daily incidence that is reconstructed after the final iteration of the EM
algorithm is then used to estimate Rt using the same process as the original
estimate_R()' function, with sliding weekly time windows used as the default.
An object of class estimate_R(), with components:
R: a dataframe containing the times of start and end of each time window
considered; the posterior mean, std, and 0.025, 0.05, 0.25, 0.5, 0.75, 0.95,
0.975 quantiles of the reproduction number for each time window.
method: the method used to estimate R, one of "non_parametric_si",
"parametric_si", "uncertain_si", "si_from_data" or "si_from_sample"
si_distr: a vector or dataframe (depending on the method) containing
the discrete serial interval distribution(s) used for estimation
SI.Moments: a vector or dataframe (depending on the method)
containing the mean and std of the discrete serial interval distribution(s)
used for estimation
I: the time series of daily incidence reconstructed by the EM algorithm.
For the initial incidence that cannot be reconstructed (e.g. the first aggregation
window and aggregation windows where incidence is too low to estimate Rt - see
details) then the incidence returned will be the naive disaggregation of the
incidence data used to initialise the EM algorithm.
I_local: the time series of incidence of local cases (so that
I_local + I_imported = I)
I_imported: the time series of incidence of imported cases (so that
I_local + I_imported = I)
dates: a vector of dates corresponding to the incidence time series
Rebecca Nash and Anne Cori
Nash RK, Cori A, Nouvellet P. Estimating the epidemic reproduction number from temporally aggregated incidence data: a statistical modelling approach and software tool. medRxiv pre-print. (medRxiv pre-print)
Wallinga & Lipsitch. How generation intervals shape the relationship between growth rates and reproductive numbers (Proc Biol Sci 2007).
estimate_R() for details of the core function
## Example for constant aggregation windows e.g. weekly reporting # Load data on SARS in 2003 data("SARS2003") # this is daily data, but for this example we will aggregate it to weekly # counts using the `aggregate_inc()` function incid <- SARS2003$incidence dt <- 7L weekly_incid <- aggregate_inc(incid, dt) si_distr <- SARS2003$si_distr # estimate Rt using the default parameters (method "non_parametric_si") method <- "non_parametric_si" config <- make_config(list(si_distr = si_distr)) res_weekly <- estimate_R_agg(incid = weekly_incid, dt = 7L, # aggregation window of the data dt_out = 7L, # desired sliding window length iter = 10L, config = config, method = method, grid = list(precision = 0.001, min = -1, max = 1)) # Plot the result plot(res_weekly) ## Example using repeating vector of aggregation windows e.g. for consistent ## reporting 3 times a week # Using the SARS data again data("SARS2003") # For this example we will pretend data is being reported three times a week, # in 2-day, 2-day, 3-day aggregations incid <- SARS2003$incidence dt <- c(2L,2L,3L) agg_incid <- aggregate_inc(incid, dt) si_distr <- SARS2003$si_distr # estimate Rt using the default parameters (method "non_parametric_si") method <- "non_parametric_si" config <- make_config(list(si_distr = si_distr)) res_agg <- estimate_R_agg(incid = agg_incid, dt = c(2L,2L,3L), # aggregation windows of the data dt_out = 7L, # desired sliding window length iter = 10L, config = config, method = method, grid = list(precision = 0.001, min = -1, max = 1)) # Plot the result plot(res_agg) ## Example using full vector of aggregation windows dt <- rep(c(2L,2L,3L), length.out=length(agg_incid)) si_distr <- SARS2003$si_distr # estimate Rt using the default parameters (method "non_parametric_si") method <- "non_parametric_si" config <- make_config(list(si_distr = si_distr)) res_agg <- estimate_R_agg(incid = agg_incid, dt = dt, # aggregation windows of the data dt_out = 7L, # desired sliding window length iter = 10L, config = config, method = method, grid = list(precision = 0.001, min = -1, max = 1)) # Plot the result plot(res_agg)## Example for constant aggregation windows e.g. weekly reporting # Load data on SARS in 2003 data("SARS2003") # this is daily data, but for this example we will aggregate it to weekly # counts using the `aggregate_inc()` function incid <- SARS2003$incidence dt <- 7L weekly_incid <- aggregate_inc(incid, dt) si_distr <- SARS2003$si_distr # estimate Rt using the default parameters (method "non_parametric_si") method <- "non_parametric_si" config <- make_config(list(si_distr = si_distr)) res_weekly <- estimate_R_agg(incid = weekly_incid, dt = 7L, # aggregation window of the data dt_out = 7L, # desired sliding window length iter = 10L, config = config, method = method, grid = list(precision = 0.001, min = -1, max = 1)) # Plot the result plot(res_weekly) ## Example using repeating vector of aggregation windows e.g. for consistent ## reporting 3 times a week # Using the SARS data again data("SARS2003") # For this example we will pretend data is being reported three times a week, # in 2-day, 2-day, 3-day aggregations incid <- SARS2003$incidence dt <- c(2L,2L,3L) agg_incid <- aggregate_inc(incid, dt) si_distr <- SARS2003$si_distr # estimate Rt using the default parameters (method "non_parametric_si") method <- "non_parametric_si" config <- make_config(list(si_distr = si_distr)) res_agg <- estimate_R_agg(incid = agg_incid, dt = c(2L,2L,3L), # aggregation windows of the data dt_out = 7L, # desired sliding window length iter = 10L, config = config, method = method, grid = list(precision = 0.001, min = -1, max = 1)) # Plot the result plot(res_agg) ## Example using full vector of aggregation windows dt <- rep(c(2L,2L,3L), length.out=length(agg_incid)) si_distr <- SARS2003$si_distr # estimate Rt using the default parameters (method "non_parametric_si") method <- "non_parametric_si" config <- make_config(list(si_distr = si_distr)) res_agg <- estimate_R_agg(incid = agg_incid, dt = dt, # aggregation windows of the data dt_out = 7L, # desired sliding window length iter = 10L, config = config, method = method, grid = list(precision = 0.001, min = -1, max = 1)) # Plot the result plot(res_agg)
This wrapper has been created so that several estimate_R() objects can
be plotted at the same time.
estimate_R_plots(..., legend = FALSE)estimate_R_plots(..., legend = FALSE)
... |
Arguments of
|
legend |
A boolean (TRUE by default) governing the presence / absence of legends on the plots |
a plot (if what = "incid", "R", or "SI") or a
grid::grob() object (if what = "all").
Anne Cori, Zhian Kamvar
## load data on pandemic flu in a school in 2009 data("Flu2009") #### COMPARE THE INSTANTANEOUS AND CASE REPRODUCTION NUMBERS #### ## estimate the instantaneous reproduction number ## (method "non_parametric_si") R_instantaneous <- estimate_R(Flu2009$incidence, method = "non_parametric_si", config = list(t_start = seq(2, 26), t_end = seq(8, 32), si_distr = Flu2009$si_distr ) ) ## estimate the case reproduction number R_case <- wallinga_teunis(Flu2009$incidence, method = "non_parametric_si", config = list(t_start = seq(2, 26), t_end = seq(8, 32), si_distr = Flu2009$si_distr ) ) ## visualise R estimates on the same plot estimate_R_plots(list(R_instantaneous, R_case), what = "R", options_R = list(col = c("blue", "red")), legend = TRUE) #### COMPARE THE INSTANTANEOUS R ON SLIDING WEEKLY OR BIWEEKLY WINDOWS #### R_weekly <- estimate_R(Flu2009$incidence, method = "non_parametric_si", config = list(t_start = seq(9, 26), t_end = seq(15, 32), si_distr = Flu2009$si_distr ) ) R_biweekly <- estimate_R(Flu2009$incidence, method = "non_parametric_si", config = list(t_start = seq(2, 19), t_end = seq(15, 32), si_distr = Flu2009$si_distr ) ) ## visualise R estimates on the same plot estimate_R_plots(list(R_weekly, R_biweekly), what = "R", options_R = list(col = c("blue", "red")), legend = TRUE)## load data on pandemic flu in a school in 2009 data("Flu2009") #### COMPARE THE INSTANTANEOUS AND CASE REPRODUCTION NUMBERS #### ## estimate the instantaneous reproduction number ## (method "non_parametric_si") R_instantaneous <- estimate_R(Flu2009$incidence, method = "non_parametric_si", config = list(t_start = seq(2, 26), t_end = seq(8, 32), si_distr = Flu2009$si_distr ) ) ## estimate the case reproduction number R_case <- wallinga_teunis(Flu2009$incidence, method = "non_parametric_si", config = list(t_start = seq(2, 26), t_end = seq(8, 32), si_distr = Flu2009$si_distr ) ) ## visualise R estimates on the same plot estimate_R_plots(list(R_instantaneous, R_case), what = "R", options_R = list(col = c("blue", "red")), legend = TRUE) #### COMPARE THE INSTANTANEOUS R ON SLIDING WEEKLY OR BIWEEKLY WINDOWS #### R_weekly <- estimate_R(Flu2009$incidence, method = "non_parametric_si", config = list(t_start = seq(9, 26), t_end = seq(15, 32), si_distr = Flu2009$si_distr ) ) R_biweekly <- estimate_R(Flu2009$incidence, method = "non_parametric_si", config = list(t_start = seq(2, 19), t_end = seq(15, 32), si_distr = Flu2009$si_distr ) ) ## visualise R estimates on the same plot estimate_R_plots(list(R_weekly, R_biweekly), what = "R", options_R = list(col = c("blue", "red")), legend = TRUE)
Please only use for compatibility;
Prefer the new estimate_R() function instead.
EstimateR( I, T.Start, T.End, method = c("NonParametricSI", "ParametricSI", "UncertainSI"), n1 = NULL, n2 = NULL, Mean.SI = NULL, Std.SI = NULL, Std.Mean.SI = NULL, Min.Mean.SI = NULL, Max.Mean.SI = NULL, Std.Std.SI = NULL, Min.Std.SI = NULL, Max.Std.SI = NULL, SI.Distr = NULL, Mean.Prior = 5, Std.Prior = 5, CV.Posterior = 0.3, plot = FALSE, leg.pos = "topright" )EstimateR( I, T.Start, T.End, method = c("NonParametricSI", "ParametricSI", "UncertainSI"), n1 = NULL, n2 = NULL, Mean.SI = NULL, Std.SI = NULL, Std.Mean.SI = NULL, Min.Mean.SI = NULL, Max.Mean.SI = NULL, Std.Std.SI = NULL, Min.Std.SI = NULL, Max.Std.SI = NULL, SI.Distr = NULL, Mean.Prior = 5, Std.Prior = 5, CV.Posterior = 0.3, plot = FALSE, leg.pos = "topright" )
I |
see |
T.Start |
see |
T.End |
see |
method |
see |
n1 |
see |
n2 |
see |
Mean.SI |
see |
Std.SI |
see |
Std.Mean.SI |
see |
Min.Mean.SI |
see |
Max.Mean.SI |
see |
Std.Std.SI |
see |
Min.Std.SI |
see |
Max.Std.SI |
see |
SI.Distr |
see |
Mean.Prior |
see |
Std.Prior |
see |
CV.Posterior |
see |
plot |
Not used anymore, only there for compatibility |
leg.pos |
Not used anymore, only there for compatibility |
Get the first day of non-zero incidence across all variants and locations.
first_nonzero_incid(incid)first_nonzero_incid(incid)
incid |
a multidimensional array containing values of the incidence for each time step (1st dimension), location (2nd dimension) and pathogen/strain/variant (3rd dimension) |
For each variant, find the first day of non-zero incidence. The maximum of these is the smallest possible point at which estimation can begin.
integer
Sangeeta Bhatia
This data set gives:
the daily incidence of self-reported and laboratory-confirmed cases of influenza among children in a school in New York city during the 2009 H1N1 influenza pandemic (see source and references),
interval-censored serial interval data from the 2009 outbreak of H1N1 influenza in a New York city school (see references).
A list of two elements:
incidence: a dataframe with 14 lines containing dates in first column, and daily incidence in second column ,
si_data: a dataframe containing data on the generation time for 16
pairs of infector/infected individuals (see references and see argument
si_data of function estimate_R() for details on columns)
Lessler J. et al. (2009) Outbreak of 2009 pandemic influenza A (H1N1) at a New York City school. New Eng J Med 361: 2628-2636.
Lessler J. et al. (2009) Outbreak of 2009 pandemic influenza A (H1N1) at a New York City school. New Eng J Med 361: 2628-2636.
## Not run: ## Note the following examples use an MCMC routine ## to estimate the serial interval distribution from data, ## so they may take a few minutes to run ## load data on pandemic flu in a New York school in 2009 data("flu_2009_NYC_school") ## estimate the reproduction number (method "si_from_data") res <- estimate_R( flu_2009_NYC_school$incidence, method = "si_from_data", si_data = flu_2009_NYC_school$si_data, config = make_config(list( t_start = seq(2, 8), t_end = seq(8, 14), si_parametric_distr = "G", mcmc_control = make_mcmc_control(list(burnin = 1000, thin = 10, seed = 1)), n1 = 1000, n2 = 50 )) ) plot(res) ## the second plot produced shows, at each each day, ## the estimate of the reproduction number ## over the 7-day window finishing on that day. ## End(Not run)## Not run: ## Note the following examples use an MCMC routine ## to estimate the serial interval distribution from data, ## so they may take a few minutes to run ## load data on pandemic flu in a New York school in 2009 data("flu_2009_NYC_school") ## estimate the reproduction number (method "si_from_data") res <- estimate_R( flu_2009_NYC_school$incidence, method = "si_from_data", si_data = flu_2009_NYC_school$si_data, config = make_config(list( t_start = seq(2, 8), t_end = seq(8, 14), si_parametric_distr = "G", mcmc_control = make_mcmc_control(list(burnin = 1000, thin = 10, seed = 1)), n1 = 1000, n2 = 50 )) ) plot(res) ## the second plot produced shows, at each each day, ## the estimate of the reproduction number ## over the 7-day window finishing on that day. ## End(Not run)
This data set gives:
the daily incidence of onset of disease in Baltimore during the 1918 H1N1 influenza pandemic (see source and references),
the discrete daily distribution of the serial interval for influenza, assuming a shifted Gamma distribution with mean 2.6 days, standard deviation 1.5 days and shift 1 day (see references).
A list of two elements:
incidence: a vector containing 92 days of observation,
si_distr: a vector containing a set of 12 probabilities.
Frost W. and E. Sydenstricker (1919) Influenza in Maryland: preliminary statistics of certain localities. Public Health Rep.(34): 491-504.
Cauchemez S. et al. (2011) Role of social networks in shaping disease transmission during a community outbreak of 2009 H1N1 pandemic influenza. Proc Natl Acad Sci U S A 108(7), 2825-2830.
Ferguson N.M. et al. (2005) Strategies for containing an emerging influenza pandemic in Southeast Asia. Nature 437(7056), 209-214.
Fraser C. et al. (2011) Influenza Transmission in Households During the 1918 Pandemic. Am J Epidemiol 174(5): 505-514.
Frost W. and E. Sydenstricker (1919) Influenza in Maryland: preliminary statistics of certain localities. Public Health Rep.(34): 491-504.
Vynnycky E. et al. (2007) Estimates of the reproduction numbers of Spanish influenza using morbidity data. Int J Epidemiol 36(4): 881-889.
White L.F. and M. Pagano (2008) Transmissibility of the influenza virus in the 1918 pandemic. PLoS One 3(1): e1498.
## load data on pandemic flu in Baltimore in 1918 data("Flu1918") ## estimate the reproduction number (method "non_parametric_si") res <- estimate_R( Flu1918$incidence, method = "non_parametric_si", config = make_config(list(si_distr = Flu1918$si_distr)) ) plot(res) ## the second plot produced shows, at each each day, ## the estimate of the reproduction number ## over the 7-day window finishing on that day.## load data on pandemic flu in Baltimore in 1918 data("Flu1918") ## estimate the reproduction number (method "non_parametric_si") res <- estimate_R( Flu1918$incidence, method = "non_parametric_si", config = make_config(list(si_distr = Flu1918$si_distr)) ) plot(res) ## the second plot produced shows, at each each day, ## the estimate of the reproduction number ## over the 7-day window finishing on that day.
This data set gives:
the daily incidence of onset of acute respiratory illness (ARI, defined as at least two symptoms among fever, cough, sore throat, and runny nose) among children in a school in Pennsylvania during the 2009 H1N1 influenza pandemic (see source and references),
the discrete daily distribution of the serial interval for influenza, assuming a shifted Gamma distribution with mean 2.6 days, standard deviation 1.5 days and shift 1 day (see references).
interval-censored serial interval data from the 2009 outbreak of H1N1 influenza in San Antonio, Texas, USA (see references).
A list of three elements:
incidence: a dataframe with 32 lines containing dates in first column, and daily incidence in second column (Cauchemez et al., 2011),
si_distr: a vector containing a set of 12 probabilities (Ferguson et al, 2005),
si_data: a dataframe with 16 lines giving serial interval patient data collected in a household study in San Antonio, Texas throughout the 2009 H1N1 outbreak (Morgan et al., 2010).
Cauchemez S. et al. (2011) Role of social networks in shaping disease transmission during a community outbreak of 2009 H1N1 pandemic influenza. Proc Natl Acad Sci U S A 108(7), 2825-2830.
Morgan O.W. et al. (2010) Household transmission of pandemic (H1N1) 2009, San Antonio, Texas, USA, April-May 2009. Emerg Infect Dis 16: 631-637.
Cauchemez S. et al. (2011) Role of social networks in shaping disease transmission during a community outbreak of 2009 H1N1 pandemic influenza. Proc Natl Acad Sci U S A 108(7), 2825-2830.
Ferguson N.M. et al. (2005) Strategies for containing an emerging influenza pandemic in Southeast Asia. Nature 437(7056), 209-214.
## load data on pandemic flu in a school in 2009 data("Flu2009") ## estimate the reproduction number (method "non_parametric_si") res <- estimate_R(Flu2009$incidence, method="non_parametric_si", config = make_config(list(si_distr = Flu2009$si_distr))) plot(res) ## the second plot produced shows, at each each day, ## the estimate of the reproduction number ## over the 7-day window finishing on that day. ## Not run: ## Note the following examples use an MCMC routine ## to estimate the serial interval distribution from data, ## so they may take a few minutes to run ## estimate the reproduction number (method "si_from_data") res <- estimate_R( Flu2009$incidence, method = "si_from_data", si_data = Flu2009$si_data, config = make_config(list( mcmc_control = make_mcmc_control(list(burnin = 1000, thin = 10, seed = 1)), n1 = 1000, n2 = 50, si_parametric_distr = "G" )) ) plot(res) ## the second plot produced shows, at each each day, ## the estimate of the reproduction number ## over the 7-day window finishing on that day. ## End(Not run)## load data on pandemic flu in a school in 2009 data("Flu2009") ## estimate the reproduction number (method "non_parametric_si") res <- estimate_R(Flu2009$incidence, method="non_parametric_si", config = make_config(list(si_distr = Flu2009$si_distr))) plot(res) ## the second plot produced shows, at each each day, ## the estimate of the reproduction number ## over the 7-day window finishing on that day. ## Not run: ## Note the following examples use an MCMC routine ## to estimate the serial interval distribution from data, ## so they may take a few minutes to run ## estimate the reproduction number (method "si_from_data") res <- estimate_R( Flu2009$incidence, method = "si_from_data", si_data = Flu2009$si_data, config = make_config(list( mcmc_control = make_mcmc_control(list(burnin = 1000, thin = 10, seed = 1)), n1 = 1000, n2 = 50, si_parametric_distr = "G" )) ) plot(res) ## the second plot produced shows, at each each day, ## the estimate of the reproduction number ## over the 7-day window finishing on that day. ## End(Not run)
Precompute shape of posterior distribution for epsilon
get_shape_epsilon(incid, lambda, priors, t_min = 2L, t_max = nrow(incid))get_shape_epsilon(incid, lambda, priors, t_min = 2L, t_max = nrow(incid))
incid |
a multidimensional array containing values of the (local) incidence for each time step (1st dimension), location (2nd dimension) and pathogen/strain/variant (3rd dimension) |
lambda |
a multidimensional array containing values of the overall
infectivity for each time step (1st dimension), location (2nd dimension)
and pathogen/strain/variant (3rd dimension). The overall infectivity for
a given location and pathogen/strain/variant represents the sum of
the incidence for that location and that pathogen/strain/variant at all
previous time steps, weighted by the current infectivity of those
past incident cases. It can be calculated from the incidence |
priors |
a list of prior parameters (shape and scale of a gamma
distribution) for epsilon and R; can be obtained from the function
|
t_min |
an integer > 1 giving the minimum time step to consider in the estimation. Default value is 2 (as the estimation is conditional on observations at time step 1 and can therefore only start at time step 2). |
t_max |
an integer > |
a value or vector of values of the shape of the posterior distribution of epsilon for each of the non-reference variants
n_loc <- 4 # 4 locations n_v <- 3 # 3 strains T <- 100 # 100 time steps priors <- default_priors() # constant incidence 10 per day everywhere incid <- array(10, dim = c(T, n_loc, n_v)) incid <- process_I_multivariant(incid) # arbitrary serial interval, same for both variants w_v <- c(0, 0.2, 0.5, 0.3) si_distr <- cbind(w_v, w_v, w_v) lambda <- compute_lambda(incid, si_distr) get_shape_epsilon(incid$local, lambda, priors)n_loc <- 4 # 4 locations n_v <- 3 # 3 strains T <- 100 # 100 time steps priors <- default_priors() # constant incidence 10 per day everywhere incid <- array(10, dim = c(T, n_loc, n_v)) incid <- process_I_multivariant(incid) # arbitrary serial interval, same for both variants w_v <- c(0, 0.2, 0.5, 0.3) si_distr <- cbind(w_v, w_v, w_v) lambda <- compute_lambda(incid, si_distr) get_shape_epsilon(incid$local, lambda, priors)
Precompute shape of posterior distribution for R
get_shape_R_flat(incid, priors, t_min = 2L, t_max = nrow(incid))get_shape_R_flat(incid, priors, t_min = 2L, t_max = nrow(incid))
incid |
a multidimensional array containing values of the (local) incidence for each time step (1st dimension), location (2nd dimension) and pathogen/strain/variant (3rd dimension) |
priors |
a list of prior parameters (shape and scale of a gamma
distribution) for epsilon and R; can be obtained from the function
|
t_min |
an integer > 1 giving the minimum time step to consider in the estimation. Default value is 2 (as the estimation is conditional on observations at time step 1 and can therefore only start at time step 2). |
t_max |
an integer > |
a vector of the shape of the posterior distribution of R for
each time step t and each location l
(stored in element (l-1)*(t_max - t_min + 1) + t of the vector)
n_v <- 2 n_loc <- 3 # 3 locations T <- 100 # 100 time steps priors <- default_priors() # constant incidence 10 per day everywhere incid <- array(10, dim = c(T, n_loc, n_v)) get_shape_R_flat(incid, priors)n_v <- 2 n_loc <- 3 # 3 locations T <- 100 # 100 time steps priors <- default_priors() # constant incidence 10 per day everywhere incid <- array(10, dim = c(T, n_loc, n_v)) get_shape_R_flat(incid, priors)
Finds values of the serial interval distribution
parameters, used to initialise the MCMC estimation of the serial interval
distribution (e.g. when using option si_from_data in
estimate_R()). Initial values are computed based on the observed mean and
standard deviation of the sample from which the parameters are to be
estimated.
init_mcmc_params(si_data, dist)init_mcmc_params(si_data, dist)
si_data |
data on dates of symptoms of pairs of infector/infected individuals to be used to estimate the serial interval distribution. This should be a dataframe with 5 columns:
|
dist |
The parametric distribution used when estimating the serial
interval. Should be one of "gamma", "weibull", "lognormal", "gamma_offset_1",
"weibull_offset_1", or "lognormal_offset_1". Note the different naming
convention compared to |
A vector containing the initial values for the two parameters of the distribution of the serial interval. These are the shape and scale for all but the lognormal distribution, for which it is the meanlog and sdlog.
Anne Cori
## Not run: ## Note the following examples use an MCMC routine ## to estimate the serial interval distribution from data, ## so they may take a few minutes to run ## load data on rotavirus data("MockRotavirus") ## get clever initial values for shape and scale of a Gamma distribution ## fitted to the the data MockRotavirus$si_data clever_init_param <- init_mcmc_params(MockRotavirus$si_data, "gamma") ## estimate the serial interval from data using a clever starting point for ## the MCMC chain SI_fit_clever <- coarseDataTools::dic.fit.mcmc(dat = MockRotavirus$si_data, dist = "G", init.pars = clever_init_param, burnin = 1000, n.samples = 5000) ## estimate the serial interval from data using a random starting point for ## the MCMC chain SI_fit_naive <- coarseDataTools::dic.fit.mcmc(dat = MockRotavirus$si_data, dist = "G", burnin = 1000, n.samples = 5000) ## use check_cdt_samples_convergence to check convergence in both situations converg_diag_clever <- check_cdt_samples_convergence(SI_fit_clever@samples) converg_diag_naive <- check_cdt_samples_convergence(SI_fit_naive@samples) converg_diag_clever converg_diag_naive ## End(Not run)## Not run: ## Note the following examples use an MCMC routine ## to estimate the serial interval distribution from data, ## so they may take a few minutes to run ## load data on rotavirus data("MockRotavirus") ## get clever initial values for shape and scale of a Gamma distribution ## fitted to the the data MockRotavirus$si_data clever_init_param <- init_mcmc_params(MockRotavirus$si_data, "gamma") ## estimate the serial interval from data using a clever starting point for ## the MCMC chain SI_fit_clever <- coarseDataTools::dic.fit.mcmc(dat = MockRotavirus$si_data, dist = "G", init.pars = clever_init_param, burnin = 1000, n.samples = 5000) ## estimate the serial interval from data using a random starting point for ## the MCMC chain SI_fit_naive <- coarseDataTools::dic.fit.mcmc(dat = MockRotavirus$si_data, dist = "G", burnin = 1000, n.samples = 5000) ## use check_cdt_samples_convergence to check convergence in both situations converg_diag_clever <- check_cdt_samples_convergence(SI_fit_clever@samples) converg_diag_naive <- check_cdt_samples_convergence(SI_fit_naive@samples) converg_diag_clever converg_diag_naive ## End(Not run)
estimate_R()
Define parameters settings for estimate_R(). It takes a list of named items
as input, sets defaults where arguments are missing, and returns a list of
settings.
make_config(..., incid = NULL)make_config(..., incid = NULL)
... |
Acceptable arguments for
|
incid |
One of the following
Note that the cases from the first time step are always all assumed to be imported cases. |
Analytical estimates of the reproduction number for an epidemic over
predefined time windows can be obtained using function estimate_R(), for a
given discrete distribution of the serial interval. make_config() generates
configuration specifying the way the estimation will be performed.
The more incident cases observed over a time window, the smallest the
posterior coefficient of variation (CV, ratio of standard deviation over
mean) of the reproduction number. An aimed CV can be specified in the
argument cv_posterior (default is 0.3), and a warning will be produced if
the incidence within one of the time windows considered is too low to get
this CV.
The methods vary in the way the serial interval distribution is specified.
In short there are five methods to specify the serial interval distribution
(see below for details on each method). This is specified in the argument
method of the estimate_R() function. In the first two methods, a unique
serial interval distribution is considered, whereas in the last three, a
range of serial interval distributions are integrated over:
"non_parametric_si": the user specifies the discrete distribution of the serial interval
"parametric_si": the user specifies the mean and sd of the serial interval
"uncertain_si": the mean and sd of the serial interval are each drawn from truncated normal distributions, with parameters specified by the user
"si_from_data": the serial interval distribution is directly estimated, using MCMC, from interval censored exposure data, with data provided by the user together with a choice of parametric distribution for the serial interval
"si_from_sample": the user directly provides the sample of serial
interval distribution to use for estimation of R. This can be a useful
alternative to the previous method, where the MCMC estimation of the serial
interval distribution could be run once, and the same estimated SI
distribution then used in estimate_R() in different contexts, e.g. with
different time windows, hence avoiding having to rerun the MCMC every time
estimate_R() is called.
method = "non_parametric_si"The discrete distribution of the serial interval is directly specified in the
argument si_distr.
method = "parametric_si"The mean and standard deviation of the continuous distribution of the serial
interval are given in the arguments mean_si and std_si. The discrete
distribution of the serial interval is derived automatically using
discr_si().
method = "uncertain_si"Method "uncertain_si" allows accounting for uncertainty on the serial
interval distribution as described in Cori et al. AJE 2013. We allow the mean
and standard deviation of the serial interval to vary
according to truncated normal distributions. We sample n1 pairs of mean and
standard deviations,
, by
independently
sampling the mean from its truncated normal distribution
(with mean mean_si, standard deviation std_mean_si, minimum min_mean_si
and maximum max_mean_si), and the standard deviation
from its truncated normal distribution (with mean
std_si, standard deviation std_std_si, minimum min_std_si and maximum
max_std_si). Warnings are produced when the truncated
normal distributions are not symmetric around the mean. For each pair
, we then draw a sample of size n2 in the
posterior distribution of the reproduction number over each time window,
conditionally on this parametric serial interval distribution (using the
discr_si function to generate the corresponding serial interval probability
mass function).
After pooling across the n1 serial interval distributions, a sample
of size of the joint posterior distribution of the
reproduction number over each time window is obtained. The posterior mean,
standard deviation, and 0.025, 0.05, 0.25, 0.5, 0.75, 0.95, 0.975 quantiles
of the reproduction number for each time window are obtained from this
sample.
method = "si_from_data"Method "si_from_data" allows accounting for uncertainty on the serial
interval distribution. Unlike method "uncertain_si", where we arbitrarily
vary the mean and std of the SI in truncated normal distributions, here, the
scope of serial interval distributions considered is directly informed by
data on the (potentially censored) dates of symptoms of pairs of
infector/infected individuals. This data, specified in argument si_data,
should be a dataframe with 5 columns:
EL: the lower bound of the symptom onset date of the infector (given as
an integer)
ER: the upper bound of the symptom onset date of the infector (given as
an integer). Should be such that ER >= EL. If the dates are known exactly
use ER = EL
SL: the lower bound of the symptom onset date of the infected individual
(given as an integer)
SR: the upper bound of the symptom onset date of the infected individual
(given as an integer). Should be such that SR >= SL. If the dates are
known exactly use SR = SL
type (optional): can have entries 0, 1, or 2, corresponding to doubly
interval-censored, single interval-censored or exact observations,
respectively, see Reich et al. Statist. Med. 2009. If not specified, this
will be automatically computed from the dates
Assuming a given parametric distribution for the serial interval distribution
(specified in si_parametric_distr), the posterior distribution of the
serial interval is estimated directly from these data using MCMC methods
implemented in the package coarsedatatools. The argument mcmc_control is a
list of characteristics which control the MCMC. The MCMC is run for a total
number of iterations of mcmc_control$burnin + n1*mcmc_control$thin; but the
output is only recorded after the burnin, and only 1 in every
mcmc_control$thin iterations, so that the posterior sample size is n1.
For each element in the posterior sample of serial interval distribution, we
then draw a sample of size n2 in the posterior distribution of the
reproduction number over each time window, conditionally on this serial
interval distribution. After pooling, a sample of size
of the joint posterior distribution of the reproduction number over each time
window is obtained. The posterior mean, standard deviation, and 0.025, 0.05,
0.25, 0.5, 0.75, 0.95, 0.975 quantiles of the reproduction number for each
time window are obtained from this sample.
method = "si_from_sample"Method "si_from_sample" also allows accounting for uncertainty on the serial
interval distribution. Unlike methods "uncertain_si" and "si_from_data", the
user directly provides (in argument si_sample) a sample of serial interval
distribution to be explored.
An object of class estimate_R_config with components
t_start, t_end, n1, n2, mean_si, std_si,
std_mean_si, min_mean_si, max_mean_si, std_std_si, min_std_si, max_std_si,
si_distr, si_parametric_distr, mcmc_control, seed, mean_prior, std_prior,
cv_posterior, which can be used as an argument of function estimate_R().
## Not run: ## Note the following examples use an MCMC routine ## to estimate the serial interval distribution from data, ## so they may take a few minutes to run ## load data on rotavirus data("MockRotavirus") ## estimate the reproduction number (method "si_from_data") ## we are not specifying the time windows, so by defaults this will estimate ## R on sliding weekly windows incid <- MockRotavirus$incidence method <- "si_from_data" config <- make_config(incid = incid, list(si_parametric_distr = "G", mcmc_control = make_mcmc_control(burnin = 1000, thin = 10, seed = 1), n1 = 500, n2 = 50, seed = 2)) R_si_from_data <- estimate_R(incid, method = method, si_data = MockRotavirus$si_data, config = config) plot(R_si_from_data) ## you can also create the config straight within the estimate_R call, ## in that case incid and method are automatically used from the estimate_R ## arguments: R_si_from_data <- estimate_R(incid, method = method, si_data = MockRotavirus$si_data, config = make_config( list(si_parametric_distr = "G", mcmc_control = make_mcmc_control(burnin = 1000, thin = 10, seed = 1), n1 = 500, n2 = 50, seed = 2))) plot(R_si_from_data) ## End(Not run)## Not run: ## Note the following examples use an MCMC routine ## to estimate the serial interval distribution from data, ## so they may take a few minutes to run ## load data on rotavirus data("MockRotavirus") ## estimate the reproduction number (method "si_from_data") ## we are not specifying the time windows, so by defaults this will estimate ## R on sliding weekly windows incid <- MockRotavirus$incidence method <- "si_from_data" config <- make_config(incid = incid, list(si_parametric_distr = "G", mcmc_control = make_mcmc_control(burnin = 1000, thin = 10, seed = 1), n1 = 500, n2 = 50, seed = 2)) R_si_from_data <- estimate_R(incid, method = method, si_data = MockRotavirus$si_data, config = config) plot(R_si_from_data) ## you can also create the config straight within the estimate_R call, ## in that case incid and method are automatically used from the estimate_R ## arguments: R_si_from_data <- estimate_R(incid, method = method, si_data = MockRotavirus$si_data, config = make_config( list(si_parametric_distr = "G", mcmc_control = make_mcmc_control(burnin = 1000, thin = 10, seed = 1), n1 = 500, n2 = 50, seed = 2))) plot(R_si_from_data) ## End(Not run)
Creates a list of MCMC control parameters to be used in
config$mcmc_control, where config is an argument of the
estimate_R() function. This is used to configure the MCMC chain used to
estimate the serial interval within estimate_R() (with method
"si_from_data").
make_mcmc_control( burnin = 3000, thin = 10, seed = as.integer(Sys.time()), init_pars = NULL )make_mcmc_control( burnin = 3000, thin = 10, seed = as.integer(Sys.time()), init_pars = NULL )
burnin |
A positive integer giving the burnin used in the MCMC when estimating the serial interval distribution. |
thin |
A positive integer corresponding to thinning parameter; the MCMC
will be run for |
seed |
An integer used as the seed for the random number generator at the start of the MCMC estimation; useful to get reproducible results. |
init_pars |
vector of size 2 corresponding to the initial values of parameters to use for the SI distribution. This is the shape and scale for all but the lognormal distribution, for which it is the meanlog and sdlog. |
The argument si_data, should be a dataframe with 5 columns:
EL: the lower bound of the symptom onset date of the infector (given as
an integer)
ER: the upper bound of the symptom onset date of the infector (given as
an integer). Should be such that ER >= EL
SL: the lower bound of the symptom onset date of the infected
individual (given as an integer)
SR: the upper bound of the symptom onset date of the infected
individual (given as an integer). Should be such that SR >= SL
type (optional): can have entries 0, 1, or 2, corresponding to doubly
interval-censored, single interval-censored or exact observations,
respectively, see Reich et al. Statist. Med. 2009. If not specified, this
will be automatically computed from the dates
Assuming a given parametric distribution for the serial interval distribution
(specified in si_parametric_distr),
the posterior distribution of the serial interval is estimated directly from
these data using MCMC methods implemented in the package
An object of class estimate_R_mcmc_control with components
burnin, thin, seed, init_pars. This can be
used as an argument of function make_config().
## Not run: ## Note the following examples use an MCMC routine ## to estimate the serial interval distribution from data, ## so they may take a few minutes to run ## load data on rotavirus data("MockRotavirus") ## estimate the reproduction number (method "si_from_data") mcmc_seed <- 1 burnin <- 1000 thin <- 10 mcmc_control <- make_mcmc_control(burnin = burnin, thin = thin, seed = mcmc_seed) incid <- MockRotavirus$incidence method <- "si_from_data" overall_seed <- 2 config <- make_config(incid = incid, si_parametric_distr = "G", mcmc_control = mcmc_control, n1 = 500 n2 = 50, seed = overall_seed) R_si_from_data <- estimate_R(incid, method = method, si_data = MockRotavirus$si_data, config = config) ## End(Not run)## Not run: ## Note the following examples use an MCMC routine ## to estimate the serial interval distribution from data, ## so they may take a few minutes to run ## load data on rotavirus data("MockRotavirus") ## estimate the reproduction number (method "si_from_data") mcmc_seed <- 1 burnin <- 1000 thin <- 10 mcmc_control <- make_mcmc_control(burnin = burnin, thin = thin, seed = mcmc_seed) incid <- MockRotavirus$incidence method <- "si_from_data" overall_seed <- 2 config <- make_config(incid = incid, si_parametric_distr = "G", mcmc_control = mcmc_control, n1 = 500 n2 = 50, seed = overall_seed) R_si_from_data <- estimate_R(incid, method = method, si_data = MockRotavirus$si_data, config = config) ## End(Not run)
This data set gives:
the daily incidence of onset of symptoms in Hallegoch (Germany) during the 1861 measles epidemic (see source and references),
the discrete daily distribution of the serial interval for measles, assuming a shifted Gamma distribution with mean 14.9 days, standard deviation 3.9 days and shift 1 day (see references).
A list of two elements:
incidence: a vector containing 48 days of observation,
si_distr: a vector containing a set of 38 probabilities.
Groendyke C. et al. (2011) Bayesian Inference for Contact Networks Given Epidemic Data. Scandinavian Journal of Statistics 38(3): 600-616.
Groendyke C. et al. (2011) Bayesian Inference for Contact Networks Given Epidemic Data. Scandinavian Journal of Statistics 38(3): 600-616.
## load data on measles in Hallegoch in 1861 data("Measles1861") ## estimate the reproduction number (method "non_parametric_si") res <- estimate_R( Measles1861$incidence, method = "non_parametric_si", config = make_config(list( t_start = seq(17, 42), t_end = seq(23, 48), si_distr = Measles1861$si_distr )) ) plot(res) ## the second plot produced shows, at each each day, ## the estimate of the reproduction number ## over the 7-day window finishing on that day.## load data on measles in Hallegoch in 1861 data("Measles1861") ## estimate the reproduction number (method "non_parametric_si") res <- estimate_R( Measles1861$incidence, method = "non_parametric_si", config = make_config(list( t_start = seq(17, 42), t_end = seq(23, 48), si_distr = Measles1861$si_distr )) ) plot(res) ## the second plot produced shows, at each each day, ## the estimate of the reproduction number ## over the 7-day window finishing on that day.
This data set gives:
the daily incidence of onset of symptoms of laboratory confirmed human infections with MERS-CoV in Saudi Arabia between the beginning of July 2014 and the end of December 2015, and
estimates of the mean and standard deviation of the serial interval for MERS.
A list of two elements:
incidence: a dataframe containing 495 days of observations with dates in the first column, and number of local (2nd column) and imported (3rd column) cases of MERS,
si: a list of estimates of the mean (mean_si) and standard deviation (std_si) of the serial interval for MERS.
Incidence data was extracted from the EMPRES I system from FAO (Global Animal
Disease Information System - Food and Agriculture Organization of the United
Nations, 2017). Note incidence on the first day was originally made of one
local case and zero imported cases; this has been modified to zero local
cases and one imported case in the dataset shown here so the reproduction
number can be estimated from the start using the function estimate_R(). The
serial interval parameters were those estimated by Cauchemez et al. (2016).
Global Animal Disease Information System - Food and Agriculture Organization of the United Nations, 2017
Cauchemez S, Nouvellet P, Cori A, Jombart T, Garske T, Clapham H, Moore S, Linden Mills H, Salje H, Collins C, et al. 2016. Unraveling the drivers of MERS-CoV transmission. Proc Natl Acad Sci 113: 9081-9086.
## load data data("mers_2014_15") ## estimate the reproduction number (method "parametric_si") bimonthly_R <- estimate_R( mers_2014_15$incidence[, c("local", "imported")], method = "parametric_si", config = make_config( mean_si = mers_2014_15$si$mean_si, std_si = mers_2014_15$si$std_si, t_start = 2:(nrow(mers_2014_15$incidence) - 8 * 7), t_end = (2:(nrow(mers_2014_15$incidence) - 8 * 7)) + 8 * 7 ) ) plot( bimonthly_R, legend = FALSE, add_imported_cases = TRUE, options_I = list( col = c("local" = "black", "imported" = "red"), interval = 7, # show weekly incidence ylab = "Weekly incidence" ), options_R = list(ylab = "Bimonthly R") ) # The first plot shows the weekly incidence, # with imported cases shown in red and local cases in black## load data data("mers_2014_15") ## estimate the reproduction number (method "parametric_si") bimonthly_R <- estimate_R( mers_2014_15$incidence[, c("local", "imported")], method = "parametric_si", config = make_config( mean_si = mers_2014_15$si$mean_si, std_si = mers_2014_15$si$std_si, t_start = 2:(nrow(mers_2014_15$incidence) - 8 * 7), t_end = (2:(nrow(mers_2014_15$incidence) - 8 * 7)) + 8 * 7 ) ) plot( bimonthly_R, legend = FALSE, add_imported_cases = TRUE, options_I = list( col = c("local" = "black", "imported" = "red"), interval = 7, # show weekly incidence ylab = "Weekly incidence" ), options_R = list(ylab = "Bimonthly R") ) # The first plot shows the weekly incidence, # with imported cases shown in red and local cases in black
This data set gives:
the daily incidence of onset of symptoms in a mock outbreak of rotavirus,
mock observations of symptom onset dates for 19 pairs of infector/infected individuals.
A list of two elements:
incidence: a vector containing 53 days of observation,
si_distr: a dataframe containing a set of 19 observations; each observation corresponds to a pair of infector/infected individuals. EL and ER columns contain the lower an upper bounds of the dates of symptoms onset in the infectors. SL and SR columns contain the lower an upper bounds of the dates of symptoms onset in the infected individuals. The type column has entries 0, 1, or 2, corresponding to doubly interval-censored, single interval-censored or exact observations, respectively, see Reich et al. Statist. Med. 2009
## Not run: ## Note the following example uses an MCMC routine ## to estimate the serial interval distribution from data, ## so may take a few minutes to run ## load data data("MockRotavirus") ## estimate the reproduction number (method "si_from_data") res <- estimate_R( MockRotavirus$incidence, method = "si_from_data", si_data = MockRotavirus$si_data, config = make_config(list( si_parametric_distr = "G", mcmc_control = make_mcmc_control(list(burnin = 3000, thin = 10)), n1 = 500, n2 = 50 )) ) plot(res) ## the second plot produced shows, at each each day, ## the estimate of the reproduction number ## over the 7-day window finishing on that day. ## End(Not run)## Not run: ## Note the following example uses an MCMC routine ## to estimate the serial interval distribution from data, ## so may take a few minutes to run ## load data data("MockRotavirus") ## estimate the reproduction number (method "si_from_data") res <- estimate_R( MockRotavirus$incidence, method = "si_from_data", si_data = MockRotavirus$si_data, config = make_config(list( si_parametric_distr = "G", mcmc_control = make_mcmc_control(list(burnin = 3000, thin = 10)), n1 = 500, n2 = 50 )) ) plot(res) ## the second plot produced shows, at each each day, ## the estimate of the reproduction number ## over the 7-day window finishing on that day. ## End(Not run)
Computes the overall infectivity due to previously infected individuals.
overall_infectivity(incid, si_distr)overall_infectivity(incid, si_distr)
incid |
One of the following:
Note that the cases from the first time step are always all assumed to be imported cases. |
si_distr |
Vector of probabilities giving the discrete distribution of the serial interval. |
The overall infectivity at time step is
equal to the sum of the previously infected individuals (given by the
incidence vector , with I = incid$local + incid$imported if
is a matrix), weighted by their infectivity at time (given by
the discrete serial interval distribution ).
In mathematical terms:
A vector which contains the overall infectivity at
each time step
Anne Cori
Cori, A. et al. A new framework and software to estimate time-varying reproduction numbers during epidemics (AJE 2013).
## load data on pandemic flu in a school in 2009 data("Flu2009") ## compute overall infectivity lambda <- overall_infectivity(Flu2009$incidence, Flu2009$si_distr) par(mfrow=c(2,1)) plot(Flu2009$incidence, type = "s", xlab = "time (days)", ylab = "incidence") title(main = "Epidemic curve") plot(lambda, type = "s", xlab = "time (days)", ylab = "Infectivity") title(main = "Overall infectivity")## load data on pandemic flu in a school in 2009 data("Flu2009") ## compute overall infectivity lambda <- overall_infectivity(Flu2009$incidence, Flu2009$si_distr) par(mfrow=c(2,1)) plot(Flu2009$incidence, type = "s", xlab = "time (days)", ylab = "incidence") title(main = "Epidemic curve") plot(lambda, type = "s", xlab = "time (days)", ylab = "Infectivity") title(main = "Overall infectivity")
Please only use for compatibility;
Prefer the new overall_infectivity() function instead
OverallInfectivity(I, SI.Distr)OverallInfectivity(I, SI.Distr)
I |
see |
SI.Distr |
see |
estimate_R()
The plot method of estimate_R() objects can be used to visualise three
types of information. The first one shows the epidemic curve. The second one
shows the posterior mean and 95% credible interval of the reproduction
number. The estimate for a time window is plotted at the end of the time
window. The third plot shows the discrete distribution(s) of the serial
interval.
## S3 method for class 'estimate_R' plot( x, what = c("all", "incid", "R", "SI"), plot_theme = "v2", add_imported_cases = FALSE, options_I = list(col = grDevices::palette(), transp = 0.7, xlim = NULL, ylim = NULL, interval = 1L, xlab = "Time", ylab = "Incidence"), options_R = list(col = grDevices::palette(), transp = 0.2, xlim = NULL, ylim = NULL, xlab = "Time", ylab = "R"), options_SI = list(prob_min = 0.001, col = "black", transp = 0.25, xlim = NULL, ylim = NULL, xlab = "Time", ylab = "Frequency"), legend = TRUE, ... ) ## S3 method for class 'wallinga_teunis' plot( x, what = c("all", "incid", "R", "SI"), plot_theme = "v2", add_imported_cases = FALSE, options_I = list(col = grDevices::palette(), transp = 0.7, xlim = NULL, ylim = NULL, interval = 1L, xlab = "Time", ylab = "Incidence"), options_R = list(col = grDevices::palette(), transp = 0.2, xlim = NULL, ylim = NULL, xlab = "Time", ylab = "R"), options_SI = list(prob_min = 0.001, col = "black", transp = 0.25, xlim = NULL, ylim = NULL, xlab = "Time", ylab = "Frequency"), legend = TRUE, ... )## S3 method for class 'estimate_R' plot( x, what = c("all", "incid", "R", "SI"), plot_theme = "v2", add_imported_cases = FALSE, options_I = list(col = grDevices::palette(), transp = 0.7, xlim = NULL, ylim = NULL, interval = 1L, xlab = "Time", ylab = "Incidence"), options_R = list(col = grDevices::palette(), transp = 0.2, xlim = NULL, ylim = NULL, xlab = "Time", ylab = "R"), options_SI = list(prob_min = 0.001, col = "black", transp = 0.25, xlim = NULL, ylim = NULL, xlab = "Time", ylab = "Frequency"), legend = TRUE, ... ) ## S3 method for class 'wallinga_teunis' plot( x, what = c("all", "incid", "R", "SI"), plot_theme = "v2", add_imported_cases = FALSE, options_I = list(col = grDevices::palette(), transp = 0.7, xlim = NULL, ylim = NULL, interval = 1L, xlab = "Time", ylab = "Incidence"), options_R = list(col = grDevices::palette(), transp = 0.2, xlim = NULL, ylim = NULL, xlab = "Time", ylab = "R"), options_SI = list(prob_min = 0.001, col = "black", transp = 0.25, xlim = NULL, ylim = NULL, xlab = "Time", ylab = "Frequency"), legend = TRUE, ... )
x |
The output of function |
what |
A string specifying what to plot, namely the incidence time
series ( |
plot_theme |
A string specifying whether to use the original plot theme
( |
add_imported_cases |
A boolean to specify whether, on the incidence time series plot, to add the incidence of imported cases. |
options_I |
For
|
options_R |
For
|
options_SI |
For
|
legend |
A boolean ( |
... |
further arguments passed to other methods (currently unused). |
A plot (if what = "incid", "R", or "SI") or a
grid::grob() object (if what = "all").
Rolina van Gaalen and Anne Cori; S3 method by Thibaut Jombart; v2 theme by Rebecca Nash
estimate_R(), wallinga_teunis() and estimate_R_plots().
## load data on pandemic flu in a school in 2009 data("Flu2009") ## estimate the instantaneous reproduction number ## (method "non_parametric_si") R_i <- estimate_R(Flu2009$incidence, method = "non_parametric_si", config = list( t_start = seq(2, 26), t_end = seq(8, 32), si_distr = Flu2009$si_distr ) ) ## visualise results plot(R_i, legend = FALSE) ## estimate the instantaneous reproduction number ## (method "non_parametric_si") R_c <- wallinga_teunis( Flu2009$incidence, method = "non_parametric_si", config = list( t_start = seq(2, 26), t_end = seq(8, 32), si_distr = Flu2009$si_distr, n_sim = 10 ) ) ## produce plot of the incidence ## (with, on top of total incidence, the incidence of imported cases), ## estimated instantaneous and case reproduction numbers ## and serial interval distribution used p_I <- plot(R_i, "incid", add_imported_cases=TRUE) # plots the incidence p_SI <- plot(R_i, "SI") # plots the serial interval distribution p_Ri <- plot(R_i, "R", options_R = list(ylim = c(0, 4))) # plots the estimated instantaneous reproduction number p_Rc <- plot(R_c, "R", options_R = list(ylim = c(0, 4))) # plots the estimated case reproduction number library(patchwork) # For figure placement (p_I + p_SI) / (p_Ri + p_Rc)## load data on pandemic flu in a school in 2009 data("Flu2009") ## estimate the instantaneous reproduction number ## (method "non_parametric_si") R_i <- estimate_R(Flu2009$incidence, method = "non_parametric_si", config = list( t_start = seq(2, 26), t_end = seq(8, 32), si_distr = Flu2009$si_distr ) ) ## visualise results plot(R_i, legend = FALSE) ## estimate the instantaneous reproduction number ## (method "non_parametric_si") R_c <- wallinga_teunis( Flu2009$incidence, method = "non_parametric_si", config = list( t_start = seq(2, 26), t_end = seq(8, 32), si_distr = Flu2009$si_distr, n_sim = 10 ) ) ## produce plot of the incidence ## (with, on top of total incidence, the incidence of imported cases), ## estimated instantaneous and case reproduction numbers ## and serial interval distribution used p_I <- plot(R_i, "incid", add_imported_cases=TRUE) # plots the incidence p_SI <- plot(R_i, "SI") # plots the serial interval distribution p_Ri <- plot(R_i, "R", options_R = list(ylim = c(0, 4))) # plots the estimated instantaneous reproduction number p_Rc <- plot(R_c, "R", options_R = list(ylim = c(0, 4))) # plots the estimated case reproduction number library(patchwork) # For figure placement (p_I + p_SI) / (p_Ri + p_Rc)
Process incidence input for multivariant analyses with estimate_advantage()
process_I_multivariant(incid, incid_imported = NULL)process_I_multivariant(incid, incid_imported = NULL)
incid |
a multidimensional array containing values of the incidence for each time step (1st dimension), location (2nd dimension) and pathogen/strain/variant (3rd dimension) |
incid_imported |
an optional multidimensional array containing values
of the incidence of imported cases
for each time step (1st dimension), location (2nd dimension) and
pathogen/strain/variant (3rd dimension). |
a list with two multidimensional elements each with three dimensions: timestep, location and pathogen/strain/variant:
local: an array of the incidence of locally infected cases
imported: an array of the incidence of imported cases
n_v <- 3 # 3 variants n_loc <- 1 # 1 location T <- 100 # 100 time steps # constant incidence 10 per day everywhere incid <- array(10, dim = c(T, n_loc, n_v)) process_I_multivariant(incid)n_v <- 3 # 3 variants n_loc <- 1 # 1 location T <- 100 # 100 time steps # constant incidence 10 per day everywhere incid <- array(10, dim = c(T, n_loc, n_v)) process_I_multivariant(incid)
This function derives a sample of R values from the estimations of
instantaneous or case reproduction numbers derived using estimate_R() or
wallinga_teunis().
sample_posterior_R(R, n = 1000, window = 1L)sample_posterior_R(R, n = 1000, window = 1L)
R |
an object generated by |
n |
an integer specifying the number of samples to be taken from the gamma distribution; defaults to 1000. |
window |
an integer (or sequence of integers) specifying the window(s) from which to estimate R. Defaults to the first window. If multiple windows are specified, the resulting samples will be drawn from several distributions; defaults to 1L. |
n values of R from the posterior R distribution
Anne Cori
## load data on pandemic flu in a school in 2009 data("Flu2009") ## estimate the reproduction number (method "non_parametric_si") ## when not specifying t_start and t_end in config, they are set to estimate ## the reproduction number on sliding weekly windows res <- estimate_R(incid = Flu2009$incidence, method = "non_parametric_si", config = make_config(list(si_distr = Flu2009$si_distr))) ## Sample R from the first weekly window win <- 1L R_median <- res$R$`Median(R)`[win] R_CrI <- c(res$R$`Quantile.0.025(R)`[win], res$R$`Quantile.0.975(R)`[win]) set.seed(2019-06-06) # fixing the random seed for reproducibility R_sample <- sample_posterior_R(res, n = 1000, window = win) hist(R_sample, col = "grey", main = "R sampled from the first weekly window") abline(v = R_median, col = "red") # show the median estimated R abline(v = R_CrI, col = "red", lty = 2) # show the 95%CrI of R## load data on pandemic flu in a school in 2009 data("Flu2009") ## estimate the reproduction number (method "non_parametric_si") ## when not specifying t_start and t_end in config, they are set to estimate ## the reproduction number on sliding weekly windows res <- estimate_R(incid = Flu2009$incidence, method = "non_parametric_si", config = make_config(list(si_distr = Flu2009$si_distr))) ## Sample R from the first weekly window win <- 1L R_median <- res$R$`Median(R)`[win] R_CrI <- c(res$R$`Quantile.0.025(R)`[win], res$R$`Quantile.0.975(R)`[win]) set.seed(2019-06-06) # fixing the random seed for reproducibility R_sample <- sample_posterior_R(res, n = 1000, window = win) hist(R_sample, col = "grey", main = "R sampled from the first weekly window") abline(v = R_median, col = "red") # show the median estimated R abline(v = R_CrI, col = "red", lty = 2) # show the 95%CrI of R
This data set gives:
the daily incidence of onset of symptoms in Hong Kong during the 2003 severe acute respiratory syndrome (SARS) epidemic (see source and references),
the discrete daily distribution of the serial interval for SARS, assuming a shifted Gamma distribution with mean 8.4 days, standard deviation 3.8 days and shift 1 day (see references).
A list of two elements:
incidence: a vector containing 107 days of observation,
si_distr: a vector containing a set of 25 probabilities.
Cori A. et al. (2009) Temporal variability and social heterogeneity in disease transmission: the case of SARS in Hong Kong. PLoS Comput Biol 5(8): e1000471.
Cori A. et al. (2009) Temporal variability and social heterogeneity in disease transmission: the case of SARS in Hong Kong. PLoS Comput Biol 5(8): e1000471.
Lipsitch M. et al. (2003) Transmission dynamics and control of severe acute respiratory syndrome. Science 300(5627): 1966-1970.
## load data on SARS in Hong Kong in 2003 data("SARS2003") ## estimate the reproduction number (method "non_parametric_si") res <- estimate_R(SARS2003$incidence, method="non_parametric_si", config = make_config(list( t_start = seq(14, 101), t_end = seq(20, 107), si_distr = SARS2003$si_distr))) plot(res) ## the second plot produced shows, at each each day, ## the estimate of the reproduction number ## over the 7-day window finishing on that day.## load data on SARS in Hong Kong in 2003 data("SARS2003") ## estimate the reproduction number (method "non_parametric_si") res <- estimate_R(SARS2003$incidence, method="non_parametric_si", config = make_config(list( t_start = seq(14, 101), t_end = seq(20, 107), si_distr = SARS2003$si_distr))) plot(res) ## the second plot produced shows, at each each day, ## the estimate of the reproduction number ## over the 7-day window finishing on that day.
When using si_from_data method, the package will use
dic.fit.mcmc to fit the serial interval
distribution. This method supports only a limited set of distributions.
This function returns the valid distribution names as used by EpiEstim. The
names are internally converted to those used by coarsedatatools by
convert_distr_name_for_mcmc function.
si_from_data_valid_distrs(dist)si_from_data_valid_distrs(dist)
dist |
The parametric distribution used when estimating the serial
interval. Should be one of "gamma", "weibull", "lognormal", "gamma_offset_1",
"weibull_offset_1", or "lognormal_offset_1". Note the different naming
convention compared to |
A two element list - the first element is a flag is_dist_valid
indicating whether the passed distribution is valid. Te second element
all_valid_distrs is a character vector with the valid distribution names.
Sangeeta Bhatia
This data set gives:
the daily incidence of onset of symptoms in Kosovo during the 1972 smallpox epidemic (see source and references),
the discrete daily distribution of the serial interval for smallpox, assuming a shifted Gamma distribution with mean 22.4 days, standard deviation 6.1 days and shift 1 day (see references).
A list of two elements:
incidence: a vector containing 57 days of observation,
si_distr: a vector containing a set of 46 probabilities.
Fenner F. et al. (1988) Smallpox and its Eradication. Geneva, World Health Organization.
Fenner F. et al. (1988) Smallpox and its Eradication. Geneva, World Health Organization.
Gani R. and S. Leach (2001) Transmission potential of smallpox in contemporary populations. Nature 414(6865): 748-751.
Riley S. and N. M. Ferguson (2006) Smallpox transmission and control: spatial dynamics in Great Britain. Proc Natl Acad Sci U S A 103(33): 12637-12642.
## load data on smallpox in Kosovo in 1972 data("Smallpox1972") ## estimate the reproduction number (method "non_parametric_si") res <- estimate_R(Smallpox1972$incidence, method="non_parametric_si", config = make_config(list( t_start = seq(27, 51), t_end = seq(33, 57), si_distr = Smallpox1972$si_distr))) plot(res) ## the second plot produced shows, at each each day, ## the estimate of the reproduction number ## over the 7-day window finishing on that day.## load data on smallpox in Kosovo in 1972 data("Smallpox1972") ## estimate the reproduction number (method "non_parametric_si") res <- estimate_R(Smallpox1972$incidence, method="non_parametric_si", config = make_config(list( t_start = seq(27, 51), t_end = seq(33, 57), si_distr = Smallpox1972$si_distr))) plot(res) ## the second plot produced shows, at each each day, ## the estimate of the reproduction number ## over the 7-day window finishing on that day.
Estimate the case reproduction number of an epidemic, given the incidence
time series and the serial interval distribution. wallinga_teunis() is a
generic function with S3 methods for classes: integer, numeric,
data.frame, incidence, incidence2.
wallinga_teunis(incid, ...) ## Default S3 method: wallinga_teunis(incid, ...) ## S3 method for class 'numeric' wallinga_teunis( incid, method = c("non_parametric_si", "parametric_si"), config, ... ) ## S3 method for class 'integer' wallinga_teunis( incid, method = c("non_parametric_si", "parametric_si"), config, ... ) ## S3 method for class 'data.frame' wallinga_teunis( incid, method = c("non_parametric_si", "parametric_si"), config, count = "I", ... ) ## S3 method for class 'incidence' wallinga_teunis( incid, method = c("non_parametric_si", "parametric_si"), config, quiet = FALSE, ... ) ## S3 method for class 'incidence2' wallinga_teunis( incid, method = c("non_parametric_si", "parametric_si"), config, quiet = FALSE, ... )wallinga_teunis(incid, ...) ## Default S3 method: wallinga_teunis(incid, ...) ## S3 method for class 'numeric' wallinga_teunis( incid, method = c("non_parametric_si", "parametric_si"), config, ... ) ## S3 method for class 'integer' wallinga_teunis( incid, method = c("non_parametric_si", "parametric_si"), config, ... ) ## S3 method for class 'data.frame' wallinga_teunis( incid, method = c("non_parametric_si", "parametric_si"), config, count = "I", ... ) ## S3 method for class 'incidence' wallinga_teunis( incid, method = c("non_parametric_si", "parametric_si"), config, quiet = FALSE, ... ) ## S3 method for class 'incidence2' wallinga_teunis( incid, method = c("non_parametric_si", "parametric_si"), config, quiet = FALSE, ... )
incid |
An incidence time series provided as:
|
... |
further arguments to be passed to S3 methods. |
method |
the method used to estimate R, one of "non_parametric_si" or "parametric_si" |
config |
a list with the following elements:
|
count |
An |
quiet |
A |
Estimates of the case reproduction number for an epidemic over
predefined time windows can be obtained, for a given discrete distribution of
the serial interval, as described by Wallinga and Teunis (AJE, 2004).
Confidence intervals are obtained by simulating a number (config$n_sim) of
possible transmission trees (only done if config$n_sim > 0).
Note the method implemented here is as described in Wallinga and Teunis
(AJE, 2004), and in particular does not implement additional features such as
correcting for right censoring as proposed by Cauchemez et al. (AJE 2006).
The methods vary in the way the serial interval distribution is specified.
method = "non_parametric_si"The discrete distribution of the serial interval is directly specified in the
argument config$si_distr.
method = "parametric_si"The mean and standard deviation of the continuous distribution of the serial
interval are given in the arguments config$mean_si and
config$std_si. The discrete distribution of the serial interval is
derived automatically using discr_si().
a list with components:
R: a dataframe containing: the times of start and end of each time window
considered; the estimated mean, std, and 0.025 and 0.975 quantiles of the
reproduction number for each time window.
si_distr: a vector containing the discrete serial interval distribution
used for estimation
SI.Moments: a vector containing the mean and std of the discrete serial
interval distribution(s) used for estimation
I: the time series of total incidence
I_local: the time series of incidence of local cases (so that I_local + I_imported = I)
I_imported: the time series of incidence of imported cases (so that
I_local + I_imported = I)
dates: a vector of dates corresponding to the incidence time series
Anne Cori; generic and S3 methods by Thibaut Jombart
Cori, A. et al. A new framework and software to estimate time-varying reproduction numbers during epidemics (AJE 2013). Wallinga, J. and P. Teunis. Different epidemic curves for severe acute respiratory syndrome reveal similar impacts of control measures (AJE 2004).
## load data on pandemic flu in a school in 2009 data("Flu2009") ## estimate the case reproduction number (method "non_parametric_si") res <- wallinga_teunis(Flu2009$incidence, method = "non_parametric_si", config = list(t_start = seq(2, 26), t_end = seq(8, 32), si_distr = Flu2009$si_distr, n_sim = 100, seed = 1) ) plot(res) ## the second plot produced shows, at each each day, ## the estimate of the case reproduction number over the 7-day window ## finishing on that day. ## estimate the case reproduction number (method "parametric_si") res <- wallinga_teunis(Flu2009$incidence, method = "parametric_si", config = list(t_start = seq(2, 26), t_end = seq(8, 32), mean_si = 2.6, std_si = 1.5, n_sim = 100, seed = 1) ) plot(res) ## the second plot produced shows, at each each day, ## the estimate of the case reproduction number over the 7-day window ## finishing on that day.## load data on pandemic flu in a school in 2009 data("Flu2009") ## estimate the case reproduction number (method "non_parametric_si") res <- wallinga_teunis(Flu2009$incidence, method = "non_parametric_si", config = list(t_start = seq(2, 26), t_end = seq(8, 32), si_distr = Flu2009$si_distr, n_sim = 100, seed = 1) ) plot(res) ## the second plot produced shows, at each each day, ## the estimate of the case reproduction number over the 7-day window ## finishing on that day. ## estimate the case reproduction number (method "parametric_si") res <- wallinga_teunis(Flu2009$incidence, method = "parametric_si", config = list(t_start = seq(2, 26), t_end = seq(8, 32), mean_si = 2.6, std_si = 1.5, n_sim = 100, seed = 1) ) plot(res) ## the second plot produced shows, at each each day, ## the estimate of the case reproduction number over the 7-day window ## finishing on that day.
Please only use for compatibility;
Prefer the new wallinga_teunis() function instead
WT( I, T.Start, T.End, method = c("NonParametricSI", "ParametricSI"), Mean.SI = NULL, Std.SI = NULL, SI.Distr = NULL, nSim = 10, plot = FALSE, leg.pos = "topright" )WT( I, T.Start, T.End, method = c("NonParametricSI", "ParametricSI"), Mean.SI = NULL, Std.SI = NULL, SI.Distr = NULL, nSim = 10, plot = FALSE, leg.pos = "topright" )
I |
see |
T.Start |
see |
T.End |
see |
method |
see |
Mean.SI |
see |
Std.SI |
see |
SI.Distr |
see |
nSim |
see |
plot |
Not used anymore, only there for compatibility |
leg.pos |
Not used anymore, only there for compatibility |