context

The aim is to describe how to build a “context” and then evaluate one or more expressions in it. This is related to approaches such as Docker and packrat in that contexts should be isolated from one another, but it differs in that portability matters more than isolation.

Imagine that you have an analysis to run on another computer with:

  • packages to install from CRAN or one of several other R package repositories (e.g., a drat repository or Bioconductor)
  • packages to install from GitHub
  • packages to install from local sources (e.g., private GitHub repos, unreleased code)
  • a number of source files to read in
  • a local environment to recreate (e.g., if calling a function from another function)

The other computer may already have some packages installed, so you don’t want to waste time and bandwidth re-installing them. As a result, scripts end up littered with constructs like:

if (!require("mypkg")) {
  install.packages("mypkg")
  library(mypkg)
}

If these packages come from GitHub (or, worse, also have dependencies on GitHub), the bootstrap code gets out of hand very quickly and tends to be non-portable.
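
The “install only what is missing” check in the snippet above can be separated from the installation step. The following is a minimal base R sketch of that check; `missing_packages` is a hypothetical helper (not part of context), and it only covers packages that a repository can supply, not GitHub or local sources.

```r
# Hypothetical helper (not part of the context package): return the
# subset of `required` that is not found in the libraries at `lib`.
missing_packages <- function(required, lib = .libPaths()) {
  installed <- rownames(utils::installed.packages(lib.loc = lib))
  setdiff(required, installed)
}

# "stats" ships with R, so only the made-up name is reported as missing.
todo <- missing_packages(c("stats", "surely.not.a.real.package"))
print(todo)

# Installing would then be a single call (not run here):
# if (length(todo) > 0) install.packages(todo)
```

Keeping the check as a pure function makes it easy to inspect what would be installed before anything is downloaded.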

Creating separate libraries (rather than sharing one from your personal computer) will be important if the architecture differs (e.g., you run Windows but you want to run code on a Linux cluster).
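
One way to keep libraries apart by architecture is to key the library directory on the platform and R version. The sketch below uses only base R; the directory layout and the `platform_library` helper are assumptions made for illustration and are not necessarily what context does internally.

```r
# Hypothetical layout (not necessarily context's own): one library per
# platform and R minor version, <root>/lib/<platform>/<major>.<minor>
platform_library <- function(root) {
  minor <- strsplit(R.version$minor, ".", fixed = TRUE)[[1]][1]
  r_version <- paste(R.version$major, minor, sep = ".")
  file.path(root, "lib", R.version$platform, r_version)
}

root <- tempfile()
lib <- platform_library(root)
dir.create(lib, recursive = TRUE)

# Put the new library first on the search path for this session only.
old <- .libPaths()
.libPaths(c(lib, old))
print(.libPaths()[1])
.libPaths(old)  # restore the previous library paths
```

Packages installed while the new path is first would land in the platform-specific directory, so a Windows machine and a Linux cluster sharing the same root would not overwrite each other's binaries.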

The context package helps describe a context made from the above ingredients and then attempts to recreate it on a different computer (or in a different directory on your computer).

Contexts

A minimal context looks like this:

path <- tempfile()
ctx <- context::context_save(path = path)
#> [ init:id   ]  d811cb115c5280e92c67af5308c8ca12
#> [ init:db   ]  rds
#> [ init:path ]  /tmp/RtmpLKe9lK/filebbe658f325f
#> [ save:id   ]  7435c5da4eeffc4a0e6fcc790d654e48
#> [ save:name ]  dishonest_meadowlark
ctx
#> <context>
#>  - packages: list(attached = character(0), loaded = character(0))
#>  - root_id: d811cb115c5280e92c67af5308c8ca12
#>  - id: 7435c5da4eeffc4a0e6fcc790d654e48
#>  - name: dishonest_meadowlark
#>  - root: list(id = "d811cb115c5280e92c67af5308c8ca12", path = "/tmp/RtmpLKe9lK/filebbe658f325f", db = <environment>)
#>  - db: <environment>

Typically you would use the packages and sources arguments to describe the requirements of any tasks that you’ll be running.

Tasks

Once a context is defined, tasks can be defined in the context. These are simply R expressions associated with the identifier of a context.

t <- context::task_save(quote(sin(1)), context = ctx)
t
#> [1] "bfcf6bc9261caf6dbe56059f4e7a674d"

The task t above is just a key that can be used to retrieve information about the task later.

context::task_expr(t, ctx)
#> sin(1)
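
In base R terms, the expression passed to task_save() is an unevaluated call: quote() captures it as data, and evaluation can happen later in a chosen environment. The sketch below illustrates that idea only. It does not use context, and the fresh environment merely stands in for a loaded context.

```r
# quote() captures the call without evaluating it.
expr <- quote(sin(1))
class(expr)    # "call"
deparse(expr)  # "sin(1)"

# Evaluation happens later, in whatever environment is supplied.
# Here that is a fresh environment standing in for a loaded context.
env <- new.env(parent = globalenv())
value <- eval(expr, envir = env)
print(value)
```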

Several such tasks may exist, though here there is only one:

context::task_list(ctx)
#> [1] "bfcf6bc9261caf6dbe56059f4e7a674d"

To run a task we first need to “load” the context (this will actually load any required packages and source any scripts) and then pass the result through to task_run:

res <- context::task_run(t, context::context_load(ctx))
#> [ context   ]  7435c5da4eeffc4a0e6fcc790d654e48
#> [ library   ]
#> [ namespace ]
#> [ source    ]
#> [ root      ]  /tmp/RtmpLKe9lK/filebbe658f325f
#> [ context   ]  7435c5da4eeffc4a0e6fcc790d654e48
#> [ task      ]  bfcf6bc9261caf6dbe56059f4e7a674d
#> [ expr      ]  sin(1)
#> [ start     ]  2024-10-17 03:44:18.038484
#> [ ok        ]
#> [ end       ]  2024-10-17 03:44:18.041891

This prints the result of restoring the context and running the task:

  • context: the context id
  • library: calls to library() to load packages and attach namespaces
  • namespace: calls to loadNamespace(); these packages were present but not attached in the context.
  • source: There was nothing to source() here so this is blank, otherwise it would be a list of filenames.
  • root: the directory within which all our context/task files will be located
  • context: this is repeated here because we’ve finished the load part of the above statement
  • task: the task id
  • expr: the expression to evaluate
  • start: start time
  • ok: indication of success
  • end: end time
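
The library, namespace and source lines correspond to three base R operations. The following rough sketch runs that sequence by hand. It is not the package's implementation, and `load_sketch` and its arguments are hypothetical names chosen for illustration.

```r
# Hypothetical stand-in for the "load" step; not context's own code.
load_sketch <- function(attached, loaded, sources,
                        envir = new.env(parent = globalenv())) {
  for (p in attached) {
    library(p, character.only = TRUE)  # attach: the "library" line
  }
  for (p in loaded) {
    loadNamespace(p)                   # load only: the "namespace" line
  }
  for (f in sources) {
    source(f, local = envir)           # run a script: the "source" line
  }
  invisible(envir)
}

# A throwaway script that defines one function.
script <- tempfile(fileext = ".R")
writeLines("double_it <- function(x) 2 * x", script)

env <- load_sketch(attached = "stats", loaded = "tools", sources = script)
print(env$double_it(21))
```

Sourcing into a dedicated environment rather than the global one keeps definitions from one context out of another, which is the isolation property described at the start of this document.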

After all that, here is the result:

res
#> [1] 0.841471

The result can also be retrieved using task_result():

context::task_result(t, ctx)
#> [1] 0.841471

This is not immensely useful as it is; it’s just evaluation with more steps. Typically we’d do this in another process. You can do this with callr here:

res <- callr::rscript(file.path(path, "bin", "task_run"), c(path, t),
                      echo = TRUE, show = TRUE)
#> Running /usr/lib/R/bin/Rscript /tmp/RtmpLKe9lK/filebbe658f325f/bin/task_run \
#>   /tmp/RtmpLKe9lK/filebbe658f325f bfcf6bc9261caf6dbe56059f4e7a674d
#> [ hello     ]  2024-10-17 03:44:18.268431
#> [ wd        ]  /tmp/RtmpKfQIAm/Rbuildb08729783a/context/vignettes
#> [ init      ]  2024-10-17 03:44:18.272296
#> [ hostname  ]  fcddbfc6481b
#> [ process   ]  3100
#> [ version   ]  0.5.0
#> [ open:db   ]  rds
#> [ context   ]  7435c5da4eeffc4a0e6fcc790d654e48
#> [ library   ]
#> [ namespace ]
#> [ source    ]
#> [ parallel  ]  running as single core job
#> [ root      ]  /tmp/RtmpLKe9lK/filebbe658f325f
#> [ context   ]  7435c5da4eeffc4a0e6fcc790d654e48
#> [ task      ]  bfcf6bc9261caf6dbe56059f4e7a674d
#> [ expr      ]  sin(1)
#> [ start     ]  2024-10-17 03:44:18.292833
#> [ ok        ]
#> [ end       ]  2024-10-17 03:44:18.295556
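
For comparison, the bare idea of evaluating an expression in a separate R process can be sketched with base R alone. This is not how bin/task_run works: the call above hands over only the root path and the task id, whereas this sketch passes the expression itself as text and skips the context entirely.

```r
# Locate the Rscript binary that belongs to the running R installation.
rscript <- file.path(R.home("bin"), "Rscript")

# Deparse the expression so it can be passed as text to the child process.
expr <- quote(sin(1))
code <- sprintf("cat(format(%s, digits = 15))", deparse(expr))

# Run it in a fresh process and capture what it prints to stdout.
out <- system2(rscript, c("--vanilla", "-e", shQuote(code)), stdout = TRUE)
value <- as.numeric(out)
print(value)
```

Passing results back as printed text only suits simple values. Storing the task and its result under a shared root, as shown above, avoids that limitation.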