You may not need to read
this: the intended readers are authors of orderly2
plugins,
not users of such plugins.
In order to make orderly2
more extensible without
bloating the core, we have designed a simple plugin interface. Our first
use case for this is shifting all of orderly1
’s database
functionality out of the main package, but other uses are possible!
This vignette is intended to primarily serve as a design document, and will be of interest to the small number of people who might want to write a new plugin, or to edit an existing one.
A plugin is provided by a package, possibly it will be the only thing
that a package provides. The plugin name must (currently) be the same as
the package name. The only functions that the package needs to call are
orderly2::orderly_plugin
and
orderly2::orderly_plugin_register
which create and register
the plugin, respectively.
To make a plugin available for an orderly project, two new bits of
configuration may be present in orderly_config.yml
- one
declares the plugin will be used, the other configures the plugin.
To use a plugin for an individual report, functions from the plugin should be used, which configure and use the plugin.
Finally, we can save information back into the final
orderly2
metadata about what the plugin did.
With the yaml-less design of orderly2
(see
vignette("migrating")
if you are familiar with
orderly1
), the line between a plugin and just package code
is fairly blurred, but reasons for writing a plugin are typically that
you want to make something easier in reports, and you want that action
reflected in the orderly metadata.
As an example, we’ll implement a stripped down version of the database plugin that inspired this work (see `orderly.db for a fuller implementation). To make this work we need functions:
orderly_config.yml
that describe where to find the databaseWe’ll start with the report side of things, describing what we want to happen, then work on the implementation.
Here is the directory structure of our minimal project
## .
## ├── orderly_config.yml
## └── src
## └── example
## └── example.R
The orderly_config.yml
file contains the information
shared by all possible uses of the plugin - in the case the connection
information for the database:
Our plugin is called example.db
and is listed within the
plugins
section, along with its configuration; in this case
indicating the path where the SQLite file can be loaded from.
The example.R
file contains information about use of the
database for this specific report; in this case, making the results of
the query SELECT * from mtcars WHERE cyl == 4
against the
database available as some R object dat
dat <- example.db::query("SELECT * FROM mtcars WHERE cyl == 4")
orderly2::orderly_artefact("Summary of data", "data.rds")
saveRDS(summary(dat), "data.rds")
Normally, we imagine some calculation here but this is kept minimal for the purpose of demonstration.
To implement this we need to:
orderly_config.yml
query()
used in example.R
to do the query itselfThere are lots of package skeleton tools out there, and if you do not
have a favourite, usethis::create_package()
will probably
do a reasonable job. The only thing your package needs to do is to
contain Imports: orderly2
in its DESCRIPTION
field.
A simple package may have a structure like
## .
## ├── DESCRIPTION
## ├── NAMESPACE
## └── R
## └── plugin.R
Here, our DESCRIPTION
file contains:
Package: example.db
Version: 0.0.1
License: CC0
Title: Orderly Database Example Plugin
Description: Simple example of an orderly plugin.
Authors@R: person('Orderly Authors', role = c('aut', 'cre'),
email = '[email protected]')
Imports: orderly2
and the NAMESPACE
and R/plugin.R
files are
shown below.
The only required function that a plugin needs to provide is one to
process the data from orderly_config.yml
. This is probably
primarily concerned with validation so can be fairly simple at first,
later we’ll expand this to report errors nicely:
The arguments here are
data
: the deserialised section of the
orderly_config.yml
specific to this pluginfilename
: the full path to
orderly_config.yml
The return value here should be the data
argument with
any auxiliary data added after validation.
Finally, for our minimal example, we need the function that actually
does the query; in our example above this is
example.db::query
:
query <- function(sql) {
ctx <- orderly2::orderly_plugin_context("example.db")
dbname <- ctx$config$path
con <- DBI::dbConnect(RSQLite::SQLite(), dbname)
on.exit(DBI::dbDisconnect(con))
DBI::dbGetQuery(con, sql)
}
The arguments here are whatever you want the user to provide –
nothing here is special to orderly2
. The important function
here to call is orderly2::orderly_plugin_context
which
returns information that you can use to make the plugin work. This is
explained in ?orderly2::orderly_plugin_context
, but in this
example we use just one element, config
, the configuration
for this plugin (i.e., the return value from our function
db_config
); see
orderly2::orderly_plugin_context
for other context that can
be accessed here.
The last bit of package code is to register the plugin, we do this by
calling orderly2::orderly_plugin_register
within
.onLoad()
which is a special R function called when a
package is loaded. This means that whenever your packages is loaded
(regardless of whether it is attached) it will register the plugin.
.onLoad <- function(...) {
orderly2::orderly_plugin_register(
name = "example.db",
config = db_config)
}
(It is important that the name
argument here matches
your package name, as orderly2 will trigger loading the package based on
this name in the configuration; we may support multiple plugins within
one package later.)
Note that our query
function here does not appear within
this registration, just the function to read and process the
configuration.
Our final (minimal) package code is:
db_config <- function(data, filename) {
data
}
query <- function(sql) {
ctx <- orderly2::orderly_plugin_context("example.db")
dbname <- ctx$config$path
con <- DBI::dbConnect(RSQLite::SQLite(), dbname)
on.exit(DBI::dbDisconnect(con))
DBI::dbGetQuery(con, sql)
}
.onLoad <- function(...) {
orderly2::orderly_plugin_register(
name = "example.db",
config = db_config)
}
and the NAMESPACE
file contains
export(query)
In order to test your package, it needs to be loaded. You can do this
by either installing the package or by using
pkgload::load_all()
(you may find doing so with
pkgload::load_all(export_all = FALSE)
gives the most
reliable experience.
## ℹ Loading example.db
Now, we can run the report:
orderly2::orderly_run("example", root = path_root)
## ℹ Starting packet 'example' `20241119-152430-af48de61` at 2024-11-19 15:24:30.689888
## > dat <- example.db::query("SELECT * FROM mtcars WHERE cyl == 4")
## > orderly2::orderly_artefact("Summary of data", "data.rds")
## Warning: Please use a named argument for the description in 'orderly_artefact()'
## In future versions of orderly, we will change the order of the arguments to
## 'orderly_artefact()' so that 'files' comes first. If you name your calls to
## 'description' then you will be compatible when we make this change.
## > saveRDS(summary(dat), "data.rds")
## ✔ Finished running 'example.R'
## ! 1 warning found:
## • Please use a named argument for the description in 'orderly_artefact()' In
## future versions of orderly, we will change the order of the arguments to
## 'orderly_artefact()' so that 'files' comes first. If you name your calls to
## 'description' then you will be compatible when we make this change.
## ℹ Finished 20241119-152430-af48de61 at 2024-11-19 15:24:30.747267 (0.05737805 secs)
## [1] "20241119-152430-af48de61"
The plugin above is fairly fragile because it does not do any
validation on the input data from orderly_config.yml
or
orderly.yml
. This is fairly annoying to do as yaml is
incredibly flexible and reporting back information to the user about
what might have gone wrong is hard.
In our case, we expect a single key-value pair in
orderly_config.yml
with the key being path
and
the value being the path to a SQLite database. We can easily expand our
configuration function to report better back to the user when they
misconfigure the plugin:
db_config <- function(data, filename) {
if (!is.list(data) || is.null(names(data)) || length(data) == 0) {
stop("Expected a named list for orderly_config.yml:example.db")
}
if (length(data$path) != 1 || !is.character(data$path)) {
stop("Expected a string for orderly_config.yml:example.db:path")
}
if (!file.exists(data$path)) {
stop(sprintf(
"The database '%s' does not exist (orderly_config:example.db:path)",
data$path))
}
data
}
This should do an acceptable job of preventing poor input while suggesting to the user where they might look within the configuration to fix it. Note that we return the configuration data here, and you can augment (or otherwise change) this data as you need.
Nothing about what the plugin does is saved into the report metadata unless you save it. Partly this is because the orderly.yml, which is saved into the final directory, serves as some sort of record. However, you probably want to know something about the data that you returned here. For example we might want to save
orderly.yml
fileTo save metadata, use the function
orderly2::orderly_plugin_add_metadata
; this takes as
arguments your plugin name, any string you like to structure the saved
metadata (here we’ll use query
) and whatever data you want
to save:
query <- function(sql) {
ctx <- orderly2::orderly_plugin_context("example.db")
dbname <- ctx$config$path
con <- DBI::dbConnect(RSQLite::SQLite(), dbname)
on.exit(DBI::dbDisconnect(con))
d <- DBI::dbGetQuery(con, sql)
info <- list(sql = sql, rows = nrow(d), cols = names(d))
orderly2::orderly_plugin_add_metadata("example.db", "query", info)
d
}
This function is otherwise the same as the minimal version above.
We also need to provide a serialisation function to ensure that the
metadata is saved as expected. Because we saved our metadata under the
key query
, we will get a list back with an element
query
and then an unnamed list with as many elements as
there were query
calls in a given report.
db_serialise <- function(data) {
for (i in seq_along(data$query)) {
# Always save cols as a vector, even if length 1:
data$query[[i]]$cols <- I(data$query[[i]]$cols)
}
jsonlite::toJSON(data$query, auto_unbox = TRUE)
}
Here, we ensure that everything except cols
that is
length 1 (which will be everything) gets turned into a scalar (so
1
not [1]
) and then serialise with
jsonlite::toJSON
with auto_unbox
as
TRUE
.
Taking this a step further, we can also specify a schema that this metadata will conform to
{
"$schema": "http://json-schema.org/draft-07/schema#",
"type": "array",
"items": {
"type": "object",
"properties": {
"sql": {
"type": "string"
},
"rows": {
"type": "number"
},
"cols": {
"type": "array",
"items": {
"type": "string"
}
}
},
"required": ["sql", "rows", "cols"],
"additionalProperties": false
}
}
We save this file as inst/schema.json
within the package
(any path within inst
is fine).
Finally, we can also add a deserialiation hook to convert the loaded
metadata into a nice data.frame
:
Now, when we register the plugin, we provide the path to this schema, along with the serialisation and deserialisation functions:
.onLoad <- function(...) {
orderly2::orderly_plugin_register(
name = "example.db",
config = db_config,
serialise = db_serialise,
deserialise = db_deserialise,
schema = "schema.json")
}
Now, when the orderly metadata is saved (just before running the
script part of a report) we will validate output that was passed into
orderly2::orderly_plugin_add_metadata
against the schema,
if jsonvalidate
is installed (currently this requires our
development version) and if the R option
outpack.schema_validate
is set to TRUE
(e.g.,
by running options(outpack.schema_validate = TRUE)
).
Our final package has structure:
## .
## ├── archive
## │ └── example
## │ └── 20241119-152430-af48de61
## │ ├── data.rds
## │ └── example.R
## ├── draft
## │ └── example
## ├── orderly_config.yml
## └── src
## └── example
## └── example.R
The DESCRIPTION
file and NAMESPACE
are
unchanged from above, and the schema is shown just above.
The plugin.R
file contains the code collected from
above:
db_config <- function(data, filename) {
if (!is.list(data) || is.null(names(data)) || length(data) == 0) {
stop("Expected a named list for orderly_config.yml:example.db")
}
if (length(data$path) != 1 || !is.character(data$path)) {
stop("Expected a string for orderly_config.yml:example.db:path")
}
if (!file.exists(data$path)) {
stop(sprintf(
"The database '%s' does not exist (orderly_config:example.db:path)",
data$path))
}
data
}
query <- function(sql) {
ctx <- orderly2::orderly_plugin_context("example.db")
dbname <- ctx$config$path
con <- DBI::dbConnect(RSQLite::SQLite(), dbname)
on.exit(DBI::dbDisconnect(con))
d <- DBI::dbGetQuery(con, sql)
info <- list(sql = sql, rows = nrow(d), cols = names(d))
orderly2::orderly_plugin_add_metadata("example.db", "query", info)
d
}
.onLoad <- function(...) {
orderly2::orderly_plugin_register(
name = "example.db",
config = db_config,
serialise = db_serialise,
deserialise = db_deserialise,
schema = "schema.json")
}
(this code could be in any .R file in the package, or across several).
id <- orderly2::orderly_run("example", root = path_root)
## ℹ Starting packet 'example' `20241119-152431-1b623ee1` at 2024-11-19 15:24:31.112046
## > dat <- example.db::query("SELECT * FROM mtcars WHERE cyl == 4")
## > orderly2::orderly_artefact("Summary of data", "data.rds")
## Warning: Please use a named argument for the description in 'orderly_artefact()'
## In future versions of orderly, we will change the order of the arguments to
## 'orderly_artefact()' so that 'files' comes first. If you name your calls to
## 'description' then you will be compatible when we make this change.
## > saveRDS(summary(dat), "data.rds")
## ✔ Finished running 'example.R'
## ! 1 warning found:
## • Please use a named argument for the description in 'orderly_artefact()' In
## future versions of orderly, we will change the order of the arguments to
## 'orderly_artefact()' so that 'files' comes first. If you name your calls to
## 'description' then you will be compatible when we make this change.
## ℹ Finished 20241119-152431-1b623ee1 at 2024-11-19 15:24:31.157354 (0.04530787 secs)
meta <- orderly2::orderly_metadata(id, root = path_root)
meta$custom$example.db
## sql rows cols
## 1 SELECT * FROM mtcars WHERE cyl == 4 11 mpg, cyl....
Our need for this functionality are similar to this example - pulling out the database functionality from the original version of orderly into something that is more independent, as it turns out to be useful only in a fraction of orderly use-cases. We can imagine other potential uses though, such as:
orderly_config.yml
would
contain account connection details and orderly.yml
would
contain mapping between the remote data/files and local files. Rather
than writing to the environment as we do above, use the
path
argument to copy files into the correct place.These all follow the same basic pattern of requiring some configuration in order to be able to connect to the resource service, some specification of what resources are to be fetched, and some action to actually fetch the resource and put it into place.