Fundamentally, using a computer to create a realisation from a stochastic Monte Carlo model is extremely simple. Consider a random walk in one dimension: we might write that in base R by creating a function that takes a current state (state) and a list of parameters; one minimal version might look like this:
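```r
## A minimal sketch; the parameter name `sd` is an assumption, chosen
## to match the odin2 example at the end of this document:
update <- function(state, pars) {
  rnorm(1, mean = state, sd = pars$sd)
}
```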
and then iterating it for 20 time steps with:
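```r
## Iterate the update function for 20 steps from a starting point of 0
## (again a sketch, matching the update function above):
y <- 0
for (i in seq_len(20)) {
  y <- update(y, list(sd = 1))
}
```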
At the end of this process, the variable y
contains a
new value, corresponding to 20 time steps with our stochastic update
function.
So why does dust
apparently require thousands of lines
of code to do this?
It’s very rare that one might want to run a single stochastic simulation; normally we want to run a group together. There are several ways that we might want to do that: many realisations of a single parameter set, one realisation each for many different parameter sets, or several realisations of each of several parameter sets.
The book-keeping for this can get tedious and error-prone if done by hand. In dust, we try to restrict concern about this to a few points, and for the simulation itself (the part of the computation that we expect to take the longest in any interesting model) we just run a big loop over time and all particles, no matter which of the structures above they represent. See vignette("multi") for details of the different ways that you might want to structure your simulations.
Once we’re running multiple simulations at once, even a simple simulation might start taking a long time, and because the realisations are independent we can look to parallelism to try and speed up the simulations.
However, one cannot just draw multiple numbers from a single random number generator at once. That is, given a generator like those built into R, there is no parallel equivalent of a call like runif(10) that would draw the 10 numbers in parallel rather than in series. Drawing a random number has a “side effect” of updating the random number state; that is because the random number stream is also a Markov chain!
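You can see this side effect directly in base R by watching the global .Random.seed change as a number is drawn:

```r
## Each draw advances R's global RNG state, so draws are inherently
## sequential:
set.seed(1)
before <- .Random.seed
runif(1)
identical(before, .Random.seed)  # FALSE: the state has been updated
```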
As such it makes sense (to us at least) to store the state of each stream’s random number generator separately: if we have n particles within a dust object we have n separate streams, and we might think of each particle’s full state as the model state declared by the user (a vector of floating point numbers) alongside its random number state. During each model step, the model state is updated and so is the random number state.
This might seem wasteful, and if we used the popular Mersenne Twister it would be, to some degree, as each particle would require 2560 bytes of additional state. In contrast, the newer xoshiro generators that we use require only 32 or 16 bytes of state; the same as 4 double- or single-precision floating point numbers, respectively. So for any non-trivial simulation it’s not a very large overhead.
Setting the seed for these runs is not trivial, particularly as the number of simultaneous particles increases. If you’ve used random numbers with the future package you may have seen it raise a warning if you do not configure it to use the “L’Ecuyer-CMRG” generator, which adapts R’s native random number seeds to be safe in parallel. The reason for this is that if different streams start from seeds set via poor heuristics (e.g., system time and thread id) they might be exactly the same. If they were set randomly, then they might collide (see John Cook’s description of the birthday paradox), and if they are picked sequentially there is no guarantee that the streams will not be correlated.
Ideally we want a similar set of properties to R’s set.seed function; the user provides an arbitrary integer and we seed all the random number streams from this in a way that is reproducible and also statistically robust. We also want the streams to be reproducible even when the number of particles changes, for particle indices that are shared. The random number generators we use (the xoshiro family, a.k.a. Blackman-Vigna generators) support these properties; they are more fully described in monty, where they are implemented.
To initialise our system with a potentially very large number of particles we take two steps:

1. First, we seed the initial stream using the splitmix64 RNG, following the xoshiro docs. This expands a single 64-bit integer into the 256 bits of RNG state, while ensuring that the resulting full random number seed does not contain all zeros.
2. Then, for each subsequent stream, we take a “jump” in the random number space, so that each particle’s stream starts a long way from the previous one (see vignette("rng") for details).

With this setup we are free to parallelise the system, as each realisation is completely independent of all others; the problem has become “embarrassingly parallel”. In practice we do this using OpenMP where available, as this is well supported from R and gracefully falls back on serial operation where it is not.
As the number of threads changes, the results will not change; the same calculations will be carried out and the same random numbers drawn.
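As a sketch of what this looks like from R (gen and pars are a hypothetical system generator and its parameters, and dust_system_create() and dust_system_set_state_initial() are assumed here rather than introduced in this document; the other functions are described below):

```r
## Two systems with the same seed but different thread counts should
## produce identical realisations:
sys1 <- dust_system_create(gen, pars, n_particles = 32, seed = 42,
                           n_threads = 1)
sys2 <- dust_system_create(gen, pars, n_particles = 32, seed = 42,
                           n_threads = 4)
dust_system_set_state_initial(sys1)
dust_system_set_state_initial(sys2)
dust_system_run_to_time(sys1, 10)
dust_system_run_to_time(sys2, 10)
identical(dust_system_state(sys1), dust_system_state(sys2))  # TRUE
```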
Sometimes we might parallelise beyond one computer (e.g., when using a cluster), in which case we cannot use OpenMP. We call this case “distributed parallelism” and cope with it by having each process take a “long jump” (an even larger jump in the random number space), then within each process proceed as above. This is the approach taken in our monty package for organising MCMC chains run in parallel, each of which works with a dust model.
A general rule of thumb is to avoid unneeded memory allocations in tight loops; with this sort of stochastic iteration, everything is a tight loop! However, we’ve reduced the problem scope to just providing an update method, and as long as that does not allocate memory then the whole thing runs in fixed space without us having to worry.
For non-trivial systems, we often want to record a subset of states, potentially a very small fraction of the total states computed. For example, in our sircovid model we track several thousand states (representing populations in various stages of disease transmission, in different age groups, with different vaccination statuses, etc.), but most of the time we only need to report on a few tens of these in order to fit to data or to examine key outputs.
Reducing the number of state variables returned at different points in the process has several advantages: less data needs to be copied from C++ back into R, the returned objects are smaller, and downstream processing is faster.
To enable this, you can restrict the state returned by most methods; some do this by default and others when you call them:

- dust_system_run_to_time() returns no state; it simply advances the system forward in time
- dust_system_simulate() moves the system forwards in time and returns the state at a series of points. It accepts an index_state argument to limit which states are returned
- dust_system_state() returns the model state and accepts an index_state argument selecting the state to return

We use the idea of “packers”, from monty, to help work with the state.
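For example, a sketch of restricting the returned state (sys as created above; the times argument name and the state indices here are assumptions, but index_state is as described above):

```r
## Keep only the first and third state variables at each requested time:
times <- seq(0, 100, by = 10)
res <- dust_system_simulate(sys, times, index_state = c(1, 3))
dim(res)  # state x particle x time; see the ordering described below
```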
The ordering of the state is important; we always have dimensions that will contain:

- state, within a particle
- particle, within a parameter set
- parameter set, where more than one is used (see vignette("multi"))
- time, if using simulate or a particle filter

This is to minimise repeatedly moving data around during writing, and to help with concatenation. Data for multiple particles are stored consecutively and read and written in order, each time step is written at once, and you can append states from different times easily. The base-R aperm() function will be useful for reshaping this output to a different dimension order if you require one, but it can be very slow.
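For example, with res from the sketch above (a state × particle × time array):

```r
## Reorder a state x particle x time array into time x particle x state:
res2 <- aperm(res, c(3, 2, 1))
```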
In order to pull all of this off, we allocate all our memory up front, in C++, and pass back to R a “pointer” to this memory, which will live for as long as your model object. This means that even if your model requires gigabytes of memory to run, it is never copied back and forth into R (where it would be subject to R’s copy-on-write semantics), but is instead accessed only when needed and written to in place following C++ reference semantics.
We try to provide verbs that are useful, given that the model presents a largely opaque pointer to its state.
Normally a system will contain several things:

- the random number state for each particle: a vector of integers (uint32_t or uint64_t)
- the model state for each particle: a vector of floating point numbers (double)

Given this, the sorts of verbs that we need include:
- Run the system up to a point in time (dust_system_run_to_time()): runs the system’s update method as many times as required to reach the new time point (the state can then be fetched with dust_system_state()). This is useful where you might want to change the system at this time point, then continue. For continuous time models we use an ODE solver, using the system’s deriv method to compute rates.
- Simulate over a series of times (dust_system_simulate()): as for dust_system_run_to_time(), but also collects partial state at a number of times along the way. The result always has one more dimension than dust_system_state() (being time).
- Set model state (dust_system_set_state()): leaves RNG state and parameters untouched but replaces model state for all particles. This is useful for model initialisation and for performing arbitrary changes to the model state.
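For our one-dimensional random walk, initialising every particle might look like this (a sketch; whether a scalar is recycled across particles this way is an assumption):

```r
## Reset every particle's single state variable to zero, leaving RNG
## state and parameters untouched:
dust_system_set_state(sys, 0)
```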
In addition, we have more specific methods oriented towards particle filtering:

- Reorder particles (dust_system_reorder()): shuffles particle state among particles within a parameter set. This is useful for implementing resampling algorithms, and updates only the state (as for dust_system_set_state(), leaving RNG state and internal state untouched)
- Create a particle filter (dust_filter_create()): implemented using the above methods, in the case where the model provides a compare function. This provides an interface with monty when used with dust_likelihood_monty()
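For instance, a resampling step in a bootstrap particle filter might use dust_system_reorder() like this (weights is a hypothetical vector of particle weights, and the shape of the index argument is an assumption):

```r
## Resample particle indices in proportion to their weights, then
## shuffle the particle states to match:
n <- length(weights)
index <- sample.int(n, n, replace = TRUE, prob = weights)
dust_system_reorder(sys, index)
```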
The most esoteric part of dust’s design is making it convenient to use as a target for other programs. We use the package primarily as a target for models written in odin2. This allows the user to write models at a very high level, describing the updates between steps. The random walk example at the beginning of this document might be implemented as:
```r
sd <- parameter()          # user-provided standard deviation
initial(y) <- 0            # starting point of the simulation
update(y) <- Normal(y, sd) # take a random step each time step
```
We have designed these two systems to play well together, so the user can write models at a very high level and generate code that works well within this framework and runs efficiently in parallel. In sircovid this is used in a model with hundreds of logical compartments, each of which may be structured, but the interface at the R level remains the same as for the toy models used in the documentation here.