--- title: "Running in Parallel" author: "Bob Verity and Pete Winskill" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Running in Parallel} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ```{r, echo = FALSE} # set random seed set.seed(1) # load the drjacoby package library(drjacoby) ``` Running multiple chains is a good way of checking that our MCMC is working well. Each chain is completely independent of all others, and so this qualifies as an [embarrassingly parallel](https://en.wikipedia.org/wiki/Embarrassingly_parallel) problem. This vignette will demonstrate how to run *drjacoby* with multiple chains, first in serial and then in parallel over multiple cores. ## Setup As always, we require some data, some parameters, and some functions to work with (see [earlier examples](https://mrc-ide.github.io/drjacoby/articles/example.html)). The underlying model is not our focus here, so we will use a very basic setup ```{r} # define data data_list <- list(x = rnorm(10)) # define parameters dataframe df_params <- data.frame(name = "mu", min = -10, max = 10, init = 0) # define loglike function r_loglike <- function(params, data, misc) { sum(dnorm(data$x, mean = params["mu"], sd = 1.0, log = TRUE)) } # define logprior function r_logprior <- function(params, misc) { dunif(params["mu"], min = -10, max = 10, log = TRUE) } ``` ## Running multiple chains Whenever the input argument `cluster` is `NULL`, chains will run in serial. This is true by default, so running multiple chains in serial is simply a case of specifying the `chains` argument: ```{r} # run MCMC in serial mcmc <- run_mcmc(data = data_list, df_params = df_params, loglike = r_loglike, logprior = r_logprior, burnin = 1e3, samples = 1e3, chains = 2, pb_markdown = TRUE) ``` When we look at our MCMC output (using the `plot_trace()` function) we can see that there are 2 chains, each of which contains a series of draws from the posterior. If we used multiple [temperature rungs](https://mrc-ide.github.io/drjacoby/articles/metropolis_coupling.html) then these would also be duplicated over chains. ```{r, fig.width=10, fig.height=4} # summarise output mcmc # compare mu over both chains plot_trace(mcmc, "mu", phase = "both") ``` Running in parallel is only slightly more complex. Before running anything we need to know how many cores our machine has. You may know this number already, but if you don't then the `parallel` package has a handy function for detecting the number of cores for you: ```{r, eval = FALSE} cores <- parallel::detectCores() ``` Next we make a cluster object, which creates multiple copies of R running in parallel over different cores. Here we are using all available cores, but if you want to hold some back for other intensive tasks then simply use a smaller number of cores when specifying this cluster. ```{r, eval = FALSE} cl <- parallel::makeCluster(cores) ``` We then run the usual `run_mcmc()` function, this time passing in the cluster object as an argument. This causes *drjacoby* to use a `clusterApplyLB()` call rather than an ordinary `lapply()` call over different chains. Each chain is added to a queue over the specified number of cores - when the first job completes, the next job is placed on the node that has become available and this continues until all jobs are complete. Note that output is supressed when running in parallel to avoid sending print commands to multiple cores, so you will not see the usual progress bars. ```{r, eval = FALSE} # run MCMC in parallel mcmc <- run_mcmc(data = data_list, df_params = df_params, loglike = r_loglike, logprior = r_logprior, burnin = 1e3, samples = 1e3, chains = 2, cluster = cl, pb_markdown = TRUE) ``` Finally, it is good practice to shut down the workers once we are finished: ```{r, eval = FALSE} parallel::stopCluster(cl) ``` ## Running multiple chains using C++ log likelihood or log prior functions To run the MCMC in parallel with C++ log-likelihood or log-prior functions there is on additional step to take when setting up the cluster. Each node must be able to access a version of the compiled C++ functions, so after initialising the cluster with: ```{r, eval = FALSE} cl <- parallel::makeCluster(cores) ``` we must also make the C++ functions accessible, by running the Rcpp::sourceCPP command for our C++ file for each node: ```{r, eval = FALSE} cl_cpp <- parallel::clusterEvalQ(cl, Rcpp::sourceCpp("my_cpp.cpp")) ``` After which we can run the mcmc in the same way: ```{r, eval = FALSE} # run MCMC in parallel mcmc <- run_mcmc(data = data_list, df_params = df_params, loglike = "loglike", logprior = "logprior", burnin = 1e3, samples = 1e3, chains = 2, cluster = cl, pb_markdown = TRUE) parallel::stopCluster(cl) ```