Title: | Orderly Next Generation |
---|---|
Description: | Distributed reproducible computing framework, adopting ideas from git, docker and other software. By defining a lightweight interface around the inputs and outputs of an analysis, a lot of the repetitive work for reproducible research can be automated. We define a simple format for organising and describing work that facilitates collaborative reproducible research and acknowledges that all analyses are run multiple times over their lifespans. |
Authors: | Rich FitzJohn [aut, cre], Robert Ashton [aut], Martin Eden [aut], Alex Hill [aut], Wes Hinsley [aut], Mantra Kusumgar [aut], Paul Liétar [aut], James Thompson [aut], Imperial College of Science, Technology and Medicine [cph] |
Maintainer: | Rich FitzJohn <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.99.59 |
Built: | 2024-12-13 11:20:40 UTC |
Source: | https://github.com/mrc-ide/orderly2 |
Declare an artefact. By doing this you turn on a number of orderly features; see Details below. You can have multiple calls to this function within your orderly script.
orderly_artefact(description = NULL, files)
description |
The name of the artefact |
files |
The files within this artefact |
(1) files matching this will not be copied over from the src directory to the draft directory unless they are also listed as a resource with orderly_resource(). This feature is only enabled if you call this function from the top level of the orderly script and if it contains only string literals (no variables).
(2) if your script fails to produce these files, then orderly_run() will fail, guaranteeing that your task really does produce the things you need it to.
(3) within the final metadata, your artefacts will have additional metadata; the description that you provide and a grouping
Undefined
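As a rough illustration of points (2) and (3), the sketch below writes a tiny report that declares one artefact and then runs it. The report name "demo", the file name and the description text are invented for this example; it assumes the "default" example root and the src/<name>/<name>.R entrypoint layout, and that artefacts appear under custom$orderly in the packet metadata.

```r
# Start from the packaged example root (created in a temporary directory)
path <- orderly2::orderly_example("default")

# Hypothetical report "demo": declare the artefact at the top level using
# string literals only, then create the file that was declared
dir.create(file.path(path, "src", "demo"))
writeLines(
  c('orderly2::orderly_artefact(description = "Ten random numbers", files = "numbers.rds")',
    'saveRDS(runif(10), "numbers.rds")'),
  file.path(path, "src", "demo", "demo.R"))

# orderly_run() would fail here if "numbers.rds" had not been produced
id <- orderly2::orderly_run("demo", root = path)

# Inspect what was recorded about the packet and its artefacts
meta <- orderly2::orderly_metadata(id, root = path)
print(meta$files)
print(meta$custom$orderly$artefacts)

unlink(path, recursive = TRUE)
```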
Find, and delete, files that were generated by running a report. Until you're comfortable with what this will do, you are strongly recommended to run orderly_cleanup_status first to see what will be deleted.
orderly_cleanup(name = NULL, dry_run = FALSE, root = NULL)
orderly_cleanup_status(name = NULL, root = NULL)
name |
Name of the report directory to clean (i.e., we look
at |
dry_run |
Logical, indicating if we should not delete anything, but instead just print information about what we would do |
root |
The path to the root directory, or |
After file deletion, we look through and remove all empty directories; orderly2 has similar semantics here to git where directories are never directly tracked.
For recent gert we will ask git if files are ignored; if ignored then they are good candidates for deletion! We encourage you to keep a per-report .gitignore that lists files that will copy into the source directory, and then we can use that same information to clean up these files after generation. Importantly, if a file matches an ignore rule but has been committed to your repository, it will no longer match the ignore rule.
A (currently unstable) object of class orderly_cleanup_status within which the element delete indicates files that would be deleted (for orderly_cleanup_status) or that were deleted (for orderly_cleanup).
In orderly1 this function has quite different semantics, because the full set of possible files is always knowable from the yaml file. So there, we start from the point of view of the list of files then compare that with the directory.
# Create a simple example:
path <- orderly2::orderly_example("default")

# We simulate running a packet interactively by using 'source';
# you might have run this line-by-line, or with the "Source"
# button in Rstudio.
source(file.path(path, "src/data/data.R"), chdir = TRUE)

# Having run this, the output of the report is present in the
# source directory:
fs::dir_tree(path)

# We can detect what might want cleaning up by running
# "orderly_cleanup_status":
orderly2::orderly_cleanup_status("data", root = path)

# Soon this will print more nicely to the screen, but for now you
# can see that the status of "data.rds" is "derived", which means
# that orderly knows that it is subject to being cleaned up; the
# "delete" element shows what will be deleted.

# Do the actual deletion:
orderly2::orderly_cleanup("data", root = path)

fs::dir_delete(path)
Insignificant differences in the metadata (eg. different dates and packet IDs) are excluded from the comparison.
orderly_compare_packets(target, current, location = NULL, allow_remote = NULL, fetch_metadata = FALSE, root = NULL)
target |
The id of the packet to use in the comparison. |
current |
The id of the other packet against which to compare. |
location |
Optional vector of locations to pull from. We might in future expand this to allow wildcards or exceptions. |
allow_remote |
Logical, indicating if we should allow packets
to be found that are not currently unpacked (i.e., are known
only to a location that we have metadata from). If this is
|
fetch_metadata |
Logical, indicating if we should pull
metadata immediately before the search. If |
root |
The path to the root directory, or |
An object of class orderly_comparison. The object can be printed to get a summary description of the differences, or passed to orderly_comparison_explain to display more details.
This function allows you to select which parts of the packet to compare, and in how much detail.
orderly_comparison_explain(cmp, attributes = NULL, verbose = FALSE)
cmp |
An orderly_comparison object, as returned by orderly_compare_packets. |
attributes |
A character vector of attributes to include in the
comparison. The values are keys of the packets' metadata, such as
|
verbose |
Control over how much information is printed. It can either
be a logical, or a character scalar |
Invisibly, a logical indicating whether the packets are equivalent, up to the given attributes.
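A minimal sketch of the two comparison functions used together, assuming the "default" example root and its "data" report. Running the same report twice is just a convenient way of getting two packets to compare; no particular result is implied.

```r
path <- orderly2::orderly_example("default")

# Run the same report twice so that there are two packets to compare
id1 <- orderly2::orderly_run("data", root = path)
id2 <- orderly2::orderly_run("data", root = path)

# Build the comparison object; printing gives a summary of the differences
cmp <- orderly2::orderly_compare_packets(id1, id2, root = path)
print(cmp)

# Display more detail; the (invisible) return value is a logical
same <- orderly2::orderly_comparison_explain(cmp)

unlink(path, recursive = TRUE)
```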
Read the current orderly configuration, stored within the outpack root, along with any orderly-specific extensions.
orderly_config(root = NULL)
root |
The path to the root directory, or |
A list of configuration options:
core: The most important options about the outpack store, containing:
  path_archive: The path to the human-readable packet archive, or NULL if disabled (set in orderly_config_set as core.path_archive)
  use_file_store: Indicates if a content-addressable file store is enabled (core.use_file_store)
  require_complete_tree: Indicates if this outpack store requires all dependencies to be fully available (core.require_complete_tree)
  hash_algorithm: The hash algorithm used (currently not modifiable)
location: Information about locations; see orderly_location_add, orderly_location_rename and orderly_location_remove to interact with this configuration, or orderly_location_list to more simply list available locations. Returns as a data.frame with columns name, id, priority, type and args, with args being a list column.
orderly: A list of orderly-specific configuration; this is just the minimum required version (as minimum_orderly_version).
# A default configuration in a new temporary directory
path <- withr::local_tempdir()
orderly2::orderly_init(path)
orderly2::orderly_config(path)
Set configuration options. Not all can currently be set; this will be expanded over time. See Details.
orderly_config_set(..., options = list(...), root = NULL)
... |
Named options to set (e.g., pass the argument
|
options |
As an alternative to |
root |
The path to the root directory, or |
Options are set in the order that they are provided. Currently, if setting one option fails, no further options will be processed but previous ones will be (do not rely on this behaviour, it may change).
Currently you can set:
core.require_complete_tree
See orderly_init for description of these options.
Nothing
orderly_config
# The default configuration does not include a file store, and
# saves output within the "archive" directory:
path <- withr::local_tempdir()
orderly2::orderly_init(path)
fs::dir_tree(path, all = TRUE)

# Change this after the fact:
orderly2::orderly_config_set(core.use_file_store = TRUE,
                             core.path_archive = NULL,
                             root = path)
fs::dir_tree(path, all = TRUE)
Copy files from a packet to anywhere. Similar to orderly_dependency except that this is not used in an active packet context. You can use this function to pull files from an outpack root to a directory outside of the control of outpack, for example. Note that all arguments must be provided by name, not position, with the exception of the id or query.
orderly_copy_files(expr, files, dest, overwrite = TRUE, name = NULL, location = NULL, allow_remote = NULL, fetch_metadata = FALSE, parameters = NULL, options = NULL, envir = parent.frame(), root = NULL)
expr |
The query expression. A |
files |
Files to copy from the other packet. This can be (1)
a character vector, in which case files are copied over without
changing their names, (2) a named character vector, in which
case the name will be used as the destination name, or (3) a
data.frame (including In all cases, if you want to import a directory of files from a
packet, you must refer to the source with a trailing slash
(e.g., You can use a limited form of string interpolation in the names of
this argument; using Note that there is an unfortunate, but (to us) avoidable
inconsistency here; interpolation of values from your
environment in the query is done by using |
dest |
The directory to copy into |
overwrite |
Overwrite files at the destination; this is
typically what you want, but set to |
name |
Optionally, the name of the packet to scope the query on. This
will be intersected with |
location |
Optional vector of locations to pull from. We might in future expand this to allow wildcards or exceptions. |
allow_remote |
Logical, indicating if we should allow packets
to be found that are not currently unpacked (i.e., are known
only to a location that we have metadata from). If this is
|
fetch_metadata |
Logical, indicating if we should pull
metadata immediately before the search. If |
parameters |
Optionally, a named list of parameters to substitute
into the query (using the |
options |
DEPRECATED. Please don't use this any more, and
instead use the arguments |
envir |
Optionally, an environment to substitute into the
query (using the |
root |
The path to the root directory, or |
You can call this function with an id as a string, in which case we do not search for the packet and proceed regardless of whether or not this id is present. If called with any other arguments (e.g., a string that does not match the id format, or a named argument name, subquery or parameters) then we interpret the arguments as a query and use orderly_search to find the id. It is an error if this query does not return exactly one packet id, so you probably want to use latest().
There are different ways that this might fail (or recover from failure):
if id is not known in the metadata store (not known because it's not unpacked but also not known to be present in some other remote) then this will fail because it's impossible to resolve the files. Consider refreshing the metadata with orderly_location_fetch_metadata.
if the id is not unpacked and no local copy of the files referred to can be found, we error by default (but see the next option). However, sometimes the file you refer to might also be present because you have downloaded a packet that depended on it, or because the content of the file is unchanged from some other packet version you have locally.
if the id is not unpacked, there is no local copy of the file and if allow_remote is TRUE we will try and request the file from whatever remote would be selected by orderly_location_pull for this packet.
Note that empty directories might be created on failure.
Nothing, invisibly. Primarily called for its side effect of copying files from a packet into the directory dest.
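A short sketch of copying a file out of the most recent "data" packet into a directory outside the orderly root. It assumes the "default" example root, whose "data" report produces data.rds; the destination name copy.rds is arbitrary and only there to show the named-vector form of files.

```r
path <- orderly2::orderly_example("default")
orderly2::orderly_run("data", root = path)

# A directory outside the control of orderly to copy into
dest <- tempfile()
dir.create(dest)

# Everything after the query is passed by name; the named vector maps
# the destination name ("copy.rds") to the source file ("data.rds")
orderly2::orderly_copy_files("latest()",
                             name = "data",
                             files = c("copy.rds" = "data.rds"),
                             dest = dest,
                             root = path)
print(list.files(dest))
```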
Declare a dependency on another packet
orderly_dependency(name, query, files)
name |
The name of the packet to depend on |
query |
The query to search for; often this will simply be
the string |
files |
Files to copy from the other packet. This can be (1)
a character vector, in which case files are copied over without
changing their names, (2) a named character vector, in which
case the name will be used as the destination name, or (3) a
data.frame (including In all cases, if you want to import a directory of files from a
packet, you must refer to the source with a trailing slash
(e.g., You can use a limited form of string interpolation in the names of
this argument; using Note that there is an unfortunate, but (to us) avoidable
inconsistency here; interpolation of values from your
environment in the query is done by using |
See orderly_run for some details about how search options are used to select which locations packets are found from, and if any data is fetched over the network. If you are running interactively, this will obviously not work, so you should use orderly_interactive_set_search_options() to set the options that this function will respond to.
Undefined
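The sketch below shows one way a downstream report might declare a dependency, assuming the "default" example root and its "data" report. The report name "filesize" and the file names are invented for illustration, and the query "latest()" is used to pick the most recent "data" packet.

```r
path <- orderly2::orderly_example("default")
orderly2::orderly_run("data", root = path)

# Hypothetical downstream report "filesize": copy data.rds from the most
# recent "data" packet under the new name input.rds, then use it
dir.create(file.path(path, "src", "filesize"))
writeLines(
  c('orderly2::orderly_dependency("data", "latest()", files = c("input.rds" = "data.rds"))',
    'orderly2::orderly_artefact(description = "Size of the input", files = "size.txt")',
    'writeLines(as.character(file.size("input.rds")), "size.txt")'),
  file.path(path, "src", "filesize", "filesize.R"))

id <- orderly2::orderly_run("filesize", root = path)

# The dependency that was resolved is recorded in the packet metadata
meta <- orderly2::orderly_metadata(id, root = path)
print(meta$depends)

unlink(path, recursive = TRUE)
```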
Describe the current packet
orderly_description(display = NULL, long = NULL, custom = NULL)
display |
A friendly name for the report; this will be displayed in some locations of the web interface, packit. If given, it must be a scalar character. |
long |
A longer description of the report. If given, it must be a scalar character. |
custom |
Any additional metadata. If given, it must be a named list, with all elements being scalar atomics (character, number, logical). |
Undefined
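As a small sketch of how the three arguments fit together, the report below sets a display name, a longer description and some custom fields. The report name "described" and all of the text are invented; it assumes the "default" example root, and that the description ends up under custom$orderly in the packet metadata.

```r
path <- orderly2::orderly_example("default")

# Hypothetical report "described" that only sets a description and
# writes one file
dir.create(file.path(path, "src", "described"))
writeLines(
  c('orderly2::orderly_description(',
    '  display = "A described report",',
    '  long = "Writes a single text file, to show where descriptions end up.",',
    '  custom = list(owner = "analysis-team", priority = 1))',
    'writeLines("hello", "hello.txt")'),
  file.path(path, "src", "described", "described.R"))

id <- orderly2::orderly_run("described", root = path)

# Look at the description stored alongside the packet
meta <- orderly2::orderly_metadata(id, root = path)
print(meta$custom$orderly$description)

unlink(path, recursive = TRUE)
```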
Copy a simple orderly example for use in the docs. This function should not form part of your workflow!
orderly_example(name, ..., dest = NULL)
name |
The name of the example to copy. Currently only "default" is supported. |
... |
Arguments passed through to |
dest |
The destination. By default we use
|
Invisibly, the path to the example.
path <- orderly2::orderly_example("default")
orderly2::orderly_list_src(root = path)

fs::dir_delete(path)
Update a gitignore, which is useful to prevent accidentally committing files to source control that are generated. This includes artefacts, shared resources and dependencies (within a report directory) or at the global level all the contents of the .outpack directory, the draft folder and the archive directory.
orderly_gitignore_update(name, root = NULL)
name |
The name of the gitignore file to update, or the string "(root)" |
root |
The path to the root directory, or |
If this function fails with a message Can't edit '.gitignore', markers are corrupted, then look for the special markers within the .gitignore file. It should look like
# ---VVV--- added by orderly ---VVV----------------
# Don't manually edit content between these markers
... patterns
# ---^^^--- added by orderly ---^^^----------------
We can't edit the file if:
any of these lines appears more than once in the file
there is anything between the first two lines
they are not in this order
If you get the error message, search and remove these lines and rerun.
Nothing, called for its side effects
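A minimal sketch of updating the root-level gitignore, assuming the "default" example root. The special name "(root)" is the one described in the arguments above; the patterns that get written are whatever orderly2 chooses, so none are assumed here.

```r
path <- orderly2::orderly_example("default")

# Write (or refresh) the orderly-managed block in the root .gitignore
orderly2::orderly_gitignore_update("(root)", root = path)

# Show the result, including the marker lines that delimit the block
ignore <- readLines(file.path(path, ".gitignore"))
writeLines(ignore)

unlink(path, recursive = TRUE)
```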
Use orderly2's hashing functions. This is intended for advanced users, in particular those who want to create hashes that are consistent with orderly2 from within plugins. The default behaviour is to use the same algorithm as used in the orderly root (via the root argument, and the usual root location approach). However, if a string is provided for algorithm you can use an alternative algorithm.
orderly_hash_file(path, algorithm = NULL, root = NULL)
orderly_hash_data(data, algorithm = NULL, root = NULL)
path |
The name of the file to hash |
algorithm |
The name of the algorithm to use, overriding that in the orderly root. |
root |
The path to the root directory, or |
data |
A string to hash |
A string in the format <algorithm>:<digest>
orderly2::orderly_hash_data("hello", "md5")
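The file-based variant can be sketched in the same way. This assumes that, as with orderly_hash_data above, passing algorithm explicitly means no orderly root is needed; the temporary file is only there to have something to hash.

```r
# Create a small file to hash
tmp <- tempfile()
writeLines("hello", tmp)

# Hash it with an explicitly chosen algorithm
h <- orderly2::orderly_hash_file(tmp, algorithm = "sha256")
print(h)

# The result has the form <algorithm>:<digest>, so it can be split apart
parts <- strsplit(h, ":", fixed = TRUE)[[1]]
print(parts[[1]])
```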
Initialise an empty orderly repository, or initialise a source copy of an orderly repository (see Details). An orderly repository is defined by the presence of a file orderly_config.yml at its root, along with a directory .outpack/ at the same level.
orderly_init(root = ".", path_archive = "archive", use_file_store = FALSE, require_complete_tree = FALSE, force = FALSE)
root |
The path to initialise the repository root at. If the repository is already initialised, this operation checks that the options passed in are the same as those set in the repository (erroring if not), but otherwise does nothing. The default path is the current working directory. |
path_archive |
Path to the archive directory, used to store
human-readable copies of packets. If |
use_file_store |
Logical, indicating if we should use a
content-addressable file-store as the source of truth for
packets. If |
require_complete_tree |
Logical, indicating if we require a
complete tree of packets. This currently affects
orderly_location_pull, by requiring that it
always operates in recursive mode. This is |
force |
Logical, indicating if we should initialise orderly even if the directory is not empty. |
It is expected that orderly_config.yml will be saved in version control, but that .outpack will be excluded from version control; this means that for every clone of your project you will need to call orderly2::orderly_init() to initialise the .outpack directory. If you forget to do this, an error will be thrown reminding you of what you need to do.
You can safely call orderly2::orderly_init() on an already-initialised directory, however, any arguments passed through must exactly match the configuration of the current root, otherwise an error will be thrown. Please use orderly_config_set to change the configuration, as this ensures that the change in configuration is possible. If configuration options are given but match those that the directory already uses, then nothing happens.
If the repository that you call orderly2::orderly_init() on is already initialised with an .outpack directory but not an orderly_config.yml file, then we will write that file too.
The full, normalised, path to the root, invisibly. Typically this is called only for its side effect.
# We'll use an automatically cleaned-up directory for the root:
path <- withr::local_tempdir()

# Initialise a new repository, setting an option:
orderly2::orderly_init(path, use_file_store = TRUE)

# fs::dir_tree(path, all = TRUE)
Set search options for interactive use of orderly; see orderly_dependency and orderly_run for details. This may be either an orderly_search_options object, or a list that will be coerced into one at the point of use (or NULL). This applies only for the current session, but applies to all interactive uses of orderly functions that might have received a copy of search_options via orderly_run.
orderly_interactive_set_search_options(location = NULL, allow_remote = NULL, fetch_metadata = FALSE)
location |
Optional vector of locations to pull from. We might in future expand this to allow wildcards or exceptions. |
allow_remote |
Logical, indicating if we should allow packets
to be found that are not currently unpacked (i.e., are known
only to a location that we have metadata from). If this is
|
fetch_metadata |
Logical, indicating if we should pull
metadata immediately before the search. If |
Nothing, called for its side effects
List source reports - that is, directories within src/ that look suitable for running with orderly; these will be directories that contain an entrypoint file - a .R file with the same name as the directory (e.g., src/data/data.R corresponds to data).
orderly_list_src(root = NULL)
root |
The path to the root directory, or |
A character vector of names of source reports, suitable for passing to orderly_run
orderly_metadata_extract for listing packets that have completed
path <- orderly2::orderly_example("default")
orderly2::orderly_list_src(root = path)

fs::dir_delete(path)
Add a new location - a place where other packets might be found and pulled into your local archive. Currently only file and http based locations are supported, with limited support for custom locations. Note that adding a location does not pull metadata from it; you need to call orderly_location_fetch_metadata first. The function orderly_location_add can add any sort of location, but the other functions documented here (orderly_location_add_path, etc) will typically be much easier to use in practice.
orderly_location_add(name, type, args, verify = TRUE, root = NULL)
orderly_location_add_path(name, path, verify = TRUE, root = NULL)
orderly_location_add_http(name, url, verify = TRUE, root = NULL)
orderly_location_add_packit(name, url, token = NULL, save_token = NULL, verify = TRUE, root = NULL)
name |
The short name of the location to use. Cannot be in
use, and cannot be one of |
type |
The type of location to add. Currently supported
values are |
args |
Arguments to the location driver. The arguments here will vary depending on the type used, see Details. |
verify |
Logical, indicating if we should verify that the location can be used before adding. |
root |
The path to the root directory, or |
path |
The path to the other archive root. This can be a
relative or absolute path, with different tradeoffs. If you use
an absolute path, then this location will typically work well on
this machine, but it may behave poorly when the location is
found on a shared drive and when you use your orderly root
from more than one system. This setup is common when using an
HPC system. If you use a relative path, then we will interpret
it relative to your orderly root and not the directory that
you evaluate this command from. Typically your path should
include leading dots (e.g. |
url |
The location of the server, including protocol, for
example |
token |
The value for your login token (currently this
is a GitHub token with |
save_token |
If no token is provided and interactive
authentication is used, this controls whether the GitHub token
should be saved to disk. Defaults to |
We currently support three types of locations - path, which points to an outpack archive accessible by path (e.g., on the same computer or on a mounted network share), http, which requires that an outpack server is running at some url and uses an HTTP API to communicate, and packit, which uses Packit as a web server. More types may be added later, and more configuration options to these location types will definitely be needed in future.
Configuration options for different location types are described in the arguments to their higher-level functions.
Path locations:
Use orderly_location_add_path, which accepts a path argument.
HTTP locations:
Accessing outpack over HTTP requires that an outpack server is running. The interface here is expected to change as we expand the API, but also as we move to support things like TLS and authentication.
Use orderly_location_add_http, which accepts a url argument.
Packit locations:
Packit locations work over HTTPS, and include everything in an outpack location but also provide authentication; we expect them to gain more capabilities later.
Use orderly_location_add_packit, which accepts url, token and save_token arguments.
Custom locations:
All outpack implementations are expected to support path and http locations, with the standard arguments above. But we expect that some implementations will support custom locations, and that the argument lists for these may vary between implementations. To allow this, you can pass a location of type "custom" with a list of arguments. We expect an argument 'driver' to be present among this list. For an example of this in action, see the outpack.sharepoint package.
Be warned that we may change this interface in future, in which case you may need to update your configuration.
Nothing
The API here may change as we move to support different types of locations.
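A minimal sketch of registering a path location, assuming two roots on the same machine: the "default" example acting as the upstream, and a fresh root created with orderly_init. The location name "upstream" and the variable names are arbitrary.

```r
# The root we want to pull from, and a fresh root of our own
upstream <- orderly2::orderly_example("default")
downstream <- tempfile()
dir.create(downstream)
orderly2::orderly_init(downstream)

# Register the upstream root as a path location (an absolute path here,
# since both roots live in temporary directories on this machine)
orderly2::orderly_location_add_path("upstream", path = upstream,
                                    root = downstream)

# List what is now configured, with details
locations <- orderly2::orderly_location_list(verbose = TRUE, root = downstream)
print(locations)

unlink(c(upstream, downstream), recursive = TRUE)
```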
Fetch metadata from a location, updating the index. This should always be relatively quick as it updates only small files that contain information about what can be found in remote packets.
orderly_location_fetch_metadata(location = NULL, root = NULL)
location |
The name of a location to pull from (see orderly_location_list for possible values). If not given, pulls from all locations. The "local" and "orphan" locations are always up to date and pulling metadata from them does nothing. |
root |
The path to the root directory, or |
Nothing
List known locations. The special name local will always be present within the output from this function (this is packets known at the current root), though you will typically be interested in other locations.
orderly_location_list(verbose = FALSE, root = NULL)
verbose |
Logical, indicating if we should return a data.frame that includes more information about the location. |
root |
The path to the root directory, or |
Depending on the value of verbose:
verbose = FALSE: A character vector of location names. This is the default behaviour.
verbose = TRUE: A data.frame with columns name, type and args. The args column is a list column, with each element being the key-value pair arguments to the location.
orderly_location_fetch_metadata, which can update your outpack index with metadata from any of the locations listed here.
Pull one or more packets (including all their files) into this archive from one or more of your locations. This will make files available for use as dependencies (e.g., with orderly_dependency).
orderly_location_pull(expr, name = NULL, location = NULL, fetch_metadata = FALSE, recursive = NULL, options = NULL, root = NULL)
expr |
The query expression. A |
name |
Optionally, the name of the packet to scope the query on. This
will be intersected with |
location |
Optional vector of locations to pull from. We might in future expand this to allow wildcards or exceptions. |
fetch_metadata |
Logical, indicating if we should pull
metadata immediately before the search. If |
recursive |
If non-NULL, a logical, indicating if we should
recursively pull all packets that are referenced by the packets
specified in |
options |
DEPRECATED. Please don't use this any more, and
instead use the arguments |
root |
The path to the root directory, or |
It is possible that it will take a long time to pull packets, if you are moving a lot of data or if you are operating over a slow connection. Cancelling and resuming a pull should be fairly efficient, as we keep track of files that are copied over even in the case of an interrupted pull.
Invisibly, the ids of packets that were pulled
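The usual sequence of adding a location, fetching its metadata and then pulling a packet might look like the sketch below. It assumes two local roots connected by a path location; the name "upstream" and the variable names are arbitrary.

```r
# Create a packet in an upstream root
upstream <- orderly2::orderly_example("default")
id <- orderly2::orderly_run("data", root = upstream)

# A separate, empty root that will pull from it
downstream <- tempfile()
dir.create(downstream)
orderly2::orderly_init(downstream)
orderly2::orderly_location_add_path("upstream", path = upstream,
                                    root = downstream)

# Metadata has to be fetched before packets from the location are known
orderly2::orderly_location_fetch_metadata("upstream", root = downstream)

# Pull the packet (and its files) by id
pulled <- orderly2::orderly_location_pull(id, root = downstream)
print(pulled)

unlink(c(upstream, downstream), recursive = TRUE)
```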
Push tree to location. This function works out what packets are not known at the location and then what files are required to create them. It then pushes all the files required to build all packets and then pushes the missing metadata to the server. If the process is interrupted it is safe to resume and will only transfer files and packets that were missed on a previous call.
orderly_location_push(expr, location, name = NULL, dry_run = FALSE, root = NULL)
expr |
An expression to search for. Often this will be a vector of ids, but you can use a query here. |
location |
The name of a location to push to (see orderly_location_list for possible values). |
name |
Optionally, the name of the packet to scope the query on. This
will be intersected with |
dry_run |
Logical, indicating if we should print a summary but not make any changes. |
root |
The path to the root directory, or |
Invisibly, details on the information that was actually moved (which might be more or less than what was requested, depending on the dependencies of packets and what was already known on the other location).
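Pushing in the other direction can be sketched as follows. This assumes a path location whose root was initialised with use_file_store = TRUE; our understanding is that a destination needs a file store to accept pushed files, but treat that as an assumption. The names "upstream" and "working" are arbitrary.

```r
# Create a packet in our working root
working <- orderly2::orderly_example("default")
id <- orderly2::orderly_run("data", root = working)

# The destination root, with a file store enabled (assumed requirement)
upstream <- tempfile()
dir.create(upstream)
orderly2::orderly_init(upstream, use_file_store = TRUE)
orderly2::orderly_location_add_path("upstream", path = upstream,
                                    root = working)

# Push the packet; files are sent first, then the metadata
orderly2::orderly_location_push(id, "upstream", root = working)

# The packet metadata can now be read from the destination root
meta <- orderly2::orderly_metadata(id, root = upstream)
print(meta$name)

unlink(c(working, upstream), recursive = TRUE)
```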
Remove an existing location. Any packets from this location and not known elsewhere will now be associated with the 'orphan' location instead.
orderly_location_remove(name, root = NULL)
name |
The short name of the location.
Cannot remove |
root |
The path to the root directory, or |
Nothing
Rename an existing location
orderly_location_rename(old, new, root = NULL)
old |
The current short name of the location.
Cannot rename |
new |
The desired short name of the location.
Cannot be one of |
root |
The path to the root directory, or |
Nothing
Read metadata for a particular id. You may want to use orderly_search to find an id corresponding to a particular query.
orderly_metadata(id, root = NULL)
id |
The id to fetch metadata for. An error will be thrown if this id is not known |
root |
The path to the root directory, or |
A list of metadata. See the outpack schema for details (https://github.com/mrc-ide/outpack)
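As a small illustration, the sketch below runs the `data` report from the bundled example and reads its metadata back. It only uses calls that appear elsewhere in this manual; the element names inspected are the top-level keys described for `orderly_metadata_extract`.

```r
# Create a temporary example root and run the 'data' report
path <- orderly2::orderly_example("default")
id <- orderly2::orderly_run("data", root = path, echo = FALSE)

# Read the full tree-structured metadata for that packet
meta <- orderly2::orderly_metadata(id, root = path)
meta$name    # the packet name
names(meta)  # the top-level keys that are present
meta$files   # the files recorded for the packet

# Cleanup
unlink(path, recursive = TRUE)
```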
Extract metadata from a group of packets. This is an experimental high-level function for interacting with the metadata in a way that we hope will be useful. We'll expand this a bit as time goes on, based on feedback we get so let us know what you think. See Details for how to use this.
orderly_metadata_extract( expr = NULL, name = NULL, location = NULL, allow_remote = NULL, fetch_metadata = FALSE, extract = NULL, options = NULL, root = NULL )
expr |
The query expression. A |
name |
Optionally, the name of the packet to scope the query on. This
will be intersected with |
location |
Optional vector of locations to pull from. We might in future expand this to allow wildcards or exceptions. |
allow_remote |
Logical, indicating if we should allow packets
to be found that are not currently unpacked (i.e., are known
only to a location that we have metadata from). If this is
|
fetch_metadata |
Logical, indicating if we should pull
metadata immediately before the search. If |
extract |
A character vector of columns to extract, possibly named. See Details for the format. |
options |
DEPRECATED. Please don't use this any more, and
instead use the arguments |
root |
The path to the root directory, or |
Extracting data from outpack metadata is challenging to do in a way that works in data structures familiar to R users, because it is naturally tree structured, and because not all metadata may be present in all packets (e.g., a packet that does not depend on another will not have a dependency section, and one that was run in a context without git will not have git metadata). If you just want the raw tree-structured data, you can always use orderly_metadata to load the full metadata for any packet (even one that is not currently available on your computer, but is merely known about) and the structure of the data will remain fairly constant across orderly2 versions.
However, sometimes we want to extract data in order to ask specific questions like:
what parameter combinations are available across a range of packets?
when were a particular set of packets used?
what files did these packets produce?
Later we'd like to ask even more complex questions like:
at what version did the file `graph.png` change?
what inputs changed between these versions?
...but being able to answer these questions requires a similar approach to interrogating metadata across a range of packets.
The `orderly_metadata_extract` function aims to simplify the process of pulling out bits of metadata and arranging it into a `data.frame` (of sorts) for you. It has a little mini-language in the `extract` argument for doing some simple rewriting of results, but you can always do this yourself.
In order to use this function you need to know what metadata are available; we will expand the vignette with more worked examples here to make this easier to understand. The function works on top-level keys, of which there are:
id: the packet id (this is always returned)
name: the packet name
parameters: a key-value pair of values, with string keys and atomic values. There is no guarantee about presence of keys between packets, or their types.
time: a key-value pair of times, with string keys and time values (see DateTimeClasses; these are stored as seconds since 1970 in the actual metadata). At present `start` and `end` are always present.
files: files present in each packet. This is a `data.frame` (per packet), each with columns `path` (relative), `size` (in bytes) and `hash`.
depends: dependencies used by each packet. This is a `data.frame` (per packet), each with columns `packet` (id), `query` (string, used to find `packet`) and `files` (another `data.frame` with columns `there` and `here` corresponding to filenames upstream and in this packet, respectively)
git: either metadata about the state of git or `null`. If given then `sha` and `branch` are strings, while `url` is an array of strings/character vector (can have zero, one or more elements).
session: some information about the session that the packet was run in (this is unstandardised, and even the orderly version may change)
custom: additional metadata added by its respective engine. For packets run by `orderly2`, there will be an `orderly` field here, which is itself a list:
  artefacts: A data.frame with artefact information, containing columns `description` (a string) and `paths` (a list column of paths).
  shared: A data.frame of the copied shared resources with their original name (`there`) and name as copied into the packet (`here`).
  role: A data.frame of identified roles of files, with columns `path` and `role`.
  description: A list of information from orderly_description with human-readable descriptions and tags.
  session: A list of information about the session as run, with a list `platform` containing information about the platform (R version as `version`, operating system as `os` and system name as `system`) and `packages` containing columns `package`, `version` and `attached`.
The nesting here makes providing a universally useful data format difficult; if considering files we have a `data.frame` with a `files` column, which is a list of `data.frame`s; similar nestedness applies to `depends` and the orderly custom data. However, you should be able to fairly easily process the data into the format you need it in.
The simplest extraction uses names of top-level keys:
extract = c("name", "parameters", "files")
This creates a data.frame with columns corresponding to these keys, one row per packet. Because `name` is always a string, it will be a character vector, but because `parameters` and `files` are more complex, these will be list columns.
You must not provide `id`; it is always returned and always first as a character vector column. If your extraction could possibly return data from locations (i.e., you have `allow_remote = TRUE` or have given a value for `location`) then we add a logical column `local` which indicates if the packet is local to your archive, meaning that you have all the files from it locally.
You can rename the columns by providing a name to entries within `extract`, for example:
extract = c("name", pars = "parameters", "files")
is the same as above, except that the `parameters` column has been renamed `pars`.
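A minimal sketch of this renaming, using the bundled example and the `data` report that appears in the examples for `orderly_run`:

```r
# Create a temporary example root and run the 'data' report once
path <- orderly2::orderly_example("default")
orderly2::orderly_run("data", root = path, echo = FALSE)

# One row per packet; 'parameters' is returned under the name 'pars'
d <- orderly2::orderly_metadata_extract(
  name = "data",
  extract = c("name", pars = "parameters", "files"),
  root = path)
names(d)
d$name

# Cleanup
unlink(path, recursive = TRUE)
```

Here `pars` and `files` are list columns, so `d$files[[1]]` gives the per-packet `data.frame` of files.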
More interestingly, we can index into a structure like `parameters`; suppose we want the value of the parameter `x`, we could write:
extract = c(x = "parameters.x")
which is allowed because for each packet the `parameters` element is a list.
However, we do not know what type `x` is (and it might vary between packets). We can add that information ourselves though and write:
extract = c(x = "parameters.x is number")
to create a numeric column. If any packet has a value of `x` that is non-numeric, your call to `orderly_metadata_extract` will fail with an error, and if a packet lacks a value of `x`, a missing value of the appropriate type will be added.
Note that this does not do any coercion to number, it will error if a non-NULL non-numeric value is found. Valid types for use with `is <type>` are `boolean`, `number` and `string` (note that these differ slightly from R's names because we want to emphasise that these are scalar quantities; also note that there is no `integer` here as this may produce unexpected errors with integer-like numeric values). You can also use `list` but this is the default. Things in the schema that are known to be scalar atomics (such as `name`) will be automatically simplified.
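The typed form can be tried out with a small throwaway report. In the sketch below the report name `demo`, its script and the parameter `x` are all hypothetical and written just for this illustration; it also assumes that the entrypoint script is named after the report directory (`src/demo/demo.R`).

```r
path <- orderly2::orderly_example("default")

# Hypothetical report 'demo' declaring a single numeric parameter 'x'
dir.create(file.path(path, "src", "demo"))
writeLines(
  c("orderly2::orderly_parameters(x = 1)",
    "writeLines(\"done\", \"out.txt\")"),
  file.path(path, "src", "demo", "demo.R"))

# Run it twice with different values of 'x'
orderly2::orderly_run("demo", parameters = list(x = 1), root = path, echo = FALSE)
orderly2::orderly_run("demo", parameters = list(x = 2), root = path, echo = FALSE)

# Ask for 'x' as a simplified numeric column rather than a list column
d <- orderly2::orderly_metadata_extract(
  name = "demo",
  extract = c(x = "parameters.x is number"),
  root = path)
d$x

# Cleanup
unlink(path, recursive = TRUE)
```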
You can index into the array-valued elements (`files` and `depends`) in the same way as for the object-valued elements:
extract = c(file_path = "files.path", file_hash = "files.hash")
would get you a list column of file names per packet and another of hashes, but this is probably less useful than the `data.frame` you'd get from extracting just `files` because you no longer have the hash information aligned.
You can index fairly deeply; it should be possible to get the orderly "display name" with:
extract = c(display = "custom.orderly.description.display is string")
If the path you need to extract has a dot in it (most likely a package name for a plugin, such as `custom.orderly.db`) you need to escape the dot with a backslash (so, `custom.orderly\.db`). In R source you will probably need to write two backslashes, or use a raw string (in recent versions of R).
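The two ways of writing the escaped path are equivalent, which can be checked in plain R. This sketch only builds the string; the column name `db` is arbitrary, and raw strings require R 4.0.0 or later.

```r
# A doubled backslash in an ordinary string literal...
escaped <- "custom.orderly\\.db"
# ...is the same string as a single backslash in a raw string
raw <- r"(custom.orderly\.db)"
identical(escaped, raw)

# Either form can then be used as an entry in 'extract'
extract <- c(db = escaped)
```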
A `data.frame`, the columns of which vary based on the names of `extract`; see Details for more information.
Within `custom.orderly`, additional fields can be extracted. The format of this is subject to change, both in the stored metadata and schema (in the short term) and in the way we deserialise it. It is probably best not to rely on this right now, and we will expand this section when you can.
Low-level function for reading metadata and deserialising it. This function can be used to directly read a metadata json file without reference to a root which contains it. It may be useful in the context of reading a metadata file written out as part of a failed run.
orderly_metadata_read(path, plugins = TRUE)
path |
Path to the json file |
plugins |
Try and deserialise data from all loaded plugins (see Details). |
Custom metadata saved by plugins may not be deserialised as expected when called with this function, as it is designed to operate separately from a valid orderly root (i.e., it will load data from any file regardless of where it came from). If `plugins` is `TRUE` (the default) then we will deserialise all data that matches any loaded plugin. This means that the behaviour of this function depends on if you have loaded the plugin packages. You can force this by running `orderly2::orderly_config()` within any orderly directory, which will load any declared plugins.
A list of outpack metadata; see the schema for details. In contrast to reading the json file directly with `jsonlite::fromJSON`, this function will take care to convert scalar and length-one vectors into the expected types.
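The sketch below reads a metadata file straight from disk. It assumes that the metadata for a packet is stored at `.outpack/metadata/<id>` beneath the root; that location is an assumption about the on-disk layout rather than something stated in this manual, so check it against your own root before relying on it.

```r
path <- orderly2::orderly_example("default")
id <- orderly2::orderly_run("data", root = path, echo = FALSE)

# Assumed on-disk location of the metadata json for this packet
json_path <- file.path(path, ".outpack", "metadata", id)

# Read it without reference to the root that contains it
meta <- orderly2::orderly_metadata_read(json_path)
meta$id

# Cleanup
unlink(path, recursive = TRUE)
```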
Create a new empty report.
orderly_new(name, template = NULL, force = FALSE, root = NULL)
name |
The name of the report |
template |
The template to use. The only acceptable values
for now are |
force |
Create an orderly file - |
root |
The path to the root directory, or |
Nothing, called for its side effects only
Declare orderly parameters. You should only have one call to this within your file, though this is not enforced! Typically you'd put it very close to the top, though the order does not really matter. Parameters are scalar atomic values (e.g. a string, number or boolean) and defaults must be present literally (i.e., they may not come from a variable itself). Provide `NULL` if you do not have a default, in which case this parameter will be required.
orderly_parameters(...)
... |
Any number of parameters |
Undefined
When running interactively (i.e., via `source()` or running an orderly file session by copy/paste or in RStudio), the `orderly_parameters()` function has different behaviour.
First, we look in the current environment (most likely the global environment) for values of your parameters - that is, variables bound to the names of your parameters. For any parameters that are not found we will look at the default values and use these if possible, but if not possible then we will either error or prompt based on the global option `orderly_interactive_parameters_missing_error`. If this is `TRUE`, then we will ask you to enter a value for the parameters (strings will need to be entered with quotes).
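To illustrate the non-interactive case, the sketch below writes a throwaway report that declares one parameter with a default and one without, then runs it through `orderly_run()`. The report name `demo`, its script and the parameter names are hypothetical, and the sketch assumes the entrypoint script is named after the report directory (`src/demo/demo.R`).

```r
path <- orderly2::orderly_example("default")

# Hypothetical report: 'n_samples' has a default, 'region' is required
dir.create(file.path(path, "src", "demo"))
writeLines(
  c("orderly2::orderly_parameters(n_samples = 10, region = NULL)",
    "writeLines(\"done\", \"out.txt\")"),
  file.path(path, "src", "demo", "demo.R"))

# Supplying only the required parameter; the default is used for the other
id <- orderly2::orderly_run("demo", parameters = list(region = "north"),
                            root = path, echo = FALSE)
pars <- orderly2::orderly_metadata(id, root = path)$parameters
pars

# Omitting the required parameter is expected to fail
res <- tryCatch(
  orderly2::orderly_run("demo", root = path, echo = FALSE),
  error = function(e) e)
inherits(res, "error")

# Cleanup
unlink(path, recursive = TRUE)
```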
For expert use only.
orderly_parse_file(path) orderly_parse_expr(exprs, filename)
path |
Path to |
exprs |
Parsed AST from |
filename |
Name of |
Parses details of any calls to the `orderly_*` in-script functions into intermediate representation for downstream use. Also validates that any calls to `orderly_*` in-script functions are well-formed.
Parsed orderly entrypoint script
Add plugin-specific metadata to a running packet. This will take some describing. You accumulate any number of bits of metadata into arbitrary fields, and then later on serialise these to json.
orderly_plugin_add_metadata(name, field, data)
name |
The name of the plugin; must be the same as used in orderly_plugin_register and orderly_plugin_context |
field |
The name of a field to add the data to. This is required even if your plugin only produces one sort of data, in which case you can remove it later on within your serialisation function. |
data |
Arbitrary data to be added to the currently running packet |
Nothing, called only for its side effects
Fetch the running context, for use within a plugin. The intention here is that within free functions that your plugin makes available, you will call this function to get information about the state of a packet. You will then typically call `orderly_plugin_add_metadata()` afterwards.
orderly_plugin_context(name, envir)
name |
Name of the plugin |
envir |
The environment of the calling function. You can
typically pass |
When a plugin function is called, orderly2 will be running in one of two modes; (1) from within `orderly_run()`, in which case we're part way through creating a packet in a brand new directory, and possibly using a special environment for evaluation, or (2) interactively, with a user developing their report. The plugin needs to be able to support both modes, and this function will return information about the state to help you cope with either case.
A list with elements:
`is_active`: a logical, indicating if we're running under `orderly_run()`; you may need to change behaviour depending on this value.
`path`: the path of the running packet. This is almost always the working directory, unless the packet contains calls to `setwd()` or similar. You may create files here.
`config`: the configuration for this plugin, after processing with the plugin's `read` function (see `orderly_plugin_register`)
`envir`: the environment that the packet is running in. Often this will be the global environment, but do not assume this! You may read and write from this environment.
`src`: the path to the packet source directory. This is different to the current directory when the packet is running, but the same when the user is interactively working with a report. You may read from this directory but must not write to it.
`parameters`: the parameters as passed through to run the report.
orderly_plugin_register, orderly_plugin_add_metadata
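A plugin function built on these pieces might look like the sketch below. The plugin name `example.db`, the function `record_query` and the field name `query` are all hypothetical; the plugin would also need to be registered under the same name with `orderly_plugin_register` before this could be called from a report.

```r
# Hypothetical function exported by a plugin package called 'example.db'
record_query <- function(sql) {
  # Look up the state of the packet (or interactive session) that called us
  ctx <- orderly2::orderly_plugin_context("example.db", parent.frame())

  # ctx$path is where the packet is running; files may be created here
  writeLines(sql, file.path(ctx$path, "query.sql"))

  # Accumulate data for the plugin's serialise function to write out later
  orderly2::orderly_plugin_add_metadata("example.db", "query", list(sql = sql))

  invisible(ctx$is_active)
}
```

Passing `parent.frame()` as `envir` gives the environment of whatever called `record_query`, which matches the description of the `envir` argument.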
Create an orderly plugin. A plugin is typically defined by a package and is used to extend orderly by enabling new functionality, declared in `orderly_config.yml` and your orderly file, and affecting the running of reports primarily by creating new objects in the report environment. This system is discussed in more detail in `vignette("plugins")`, but will be expanded (likely in breaking ways) soon.
orderly_plugin_register( name, config, serialise = NULL, deserialise = NULL, cleanup = NULL, schema = NULL )
name |
The name of the plugin, typically the package name |
config |
A function to read, check and process the
configuration section in |
serialise |
A function to serialise any metadata added by the
plugin's functions to the outpack metadata. It will be passed a
list of all entries pushed in via
|
deserialise |
A function to deserialise any metadata
serialised by the |
cleanup |
Optionally, a function to clean up any state that
your plugin uses. You can call |
schema |
Optionally a path, within the package, to a schema
for the metadata created by this plugin; you should omit the
|
Nothing, this function is called for its side effect of registering a plugin.
Prune orphan packets from your metadata store. This function can be used to remove references to packets that are no longer reachable; this could have happened because you deleted a packet manually from the archive and ran orderly_validate_archive or because you removed a location.
orderly_prune_orphans(root = NULL)
root |
The path to the root directory, or |
If an orphan packet is not used anywhere, then we can easily drop it - it's as if it never existed. If it is referenced by metadata that you know about from elsewhere but not locally, then that is a problem for the upstream location (and one that should not happen). If you have referenced it in a packet that you have run locally, then the metadata is not deleted.
We expose this function mostly for users who want to permanently expunge any reference to previously run packets. We hope that there should never really be a reason to run it.
Invisibly, a character vector of orphaned packet ids
Construct an outpack query, typically then passed through to orderly_search
orderly_query(expr, name = NULL, scope = NULL, subquery = NULL)
expr |
The query expression. A |
name |
Optionally, the name of the packet to scope the query on. This
will be intersected with |
scope |
Optionally, a scope query to limit the packets
searched by |
subquery |
Optionally, named list of subqueries which can be
referenced by name from the |
An `orderly_query` object, which should not be modified, but which can be passed to `orderly_search()`
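As a short sketch, a query can be built once and then handed to `orderly_search()`. The example uses the bundled `data` report and the `latest()` query function mentioned in the return value of `orderly_search`.

```r
path <- orderly2::orderly_example("default")
id <- orderly2::orderly_run("data", root = path, echo = FALSE)

# Build the query: the most recent packet, scoped to the name 'data'
query <- orderly2::orderly_query(quote(latest()), name = "data")
inherits(query, "orderly_query")

# Evaluate it against the root
found <- orderly2::orderly_search(query, root = path)
found

# Cleanup
unlink(path, recursive = TRUE)
```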
Explain how a query has or has not matched. This is experimental and the output will change. At the moment, it can tell you why a query matches, or if it fails to match based on one of a number of `&&`-ed together clauses.
orderly_query_explain( expr, name = NULL, scope = NULL, subquery = NULL, parameters = NULL, envir = parent.frame(), location = NULL, allow_remote = NULL, root = NULL )
expr |
The query expression. A |
name |
Optionally, the name of the packet to scope the query on. This
will be intersected with |
scope |
Optionally, a scope query to limit the packets
searched by |
subquery |
Optionally, named list of subqueries which can be
referenced by name from the |
parameters |
Optionally, a named list of parameters to substitute
into the query (using the |
envir |
Optionally, an environment to substitute into the
query (using the |
location |
Optional vector of locations to pull from. We might in future expand this to allow wildcards or exceptions. |
allow_remote |
Logical, indicating if we should allow packets
to be found that are not currently unpacked (i.e., are known
only to a location that we have metadata from). If this is
|
root |
The path to the root directory, or |
An object of class `orderly_query_explain`, which can be inspected (contents subject to change) and which has a print method which will show a user-friendly summary of the query result.
Declare that a file, or group of files, are an orderly resource. By explicitly declaring files as resources orderly will mark the files as immutable inputs and validate that your analysis does not modify them when run with `orderly_run()`
orderly_resource(files)
files |
Any number of names of files |
Invisibly, a character vector of resources included by the call. Don't rely on the order of these files if they are expanded from directories, as this is likely platform dependent.
Run a report. This will create a new directory in `drafts/<reportname>`, copy your declared resources there, run your script and check that all expected artefacts were created.
orderly_run( name, parameters = NULL, envir = NULL, echo = TRUE, location = NULL, allow_remote = NULL, fetch_metadata = FALSE, search_options = NULL, root = NULL )
name |
Name of the report to run. Any leading |
parameters |
Parameters passed to the report. A named list of
parameters declared in the |
envir |
The environment that will be used to evaluate the report script; by default we use the global environment, which may not always be what is wanted. |
echo |
Optional logical to control printing output from
|
location |
Optional vector of locations to pull from. We might in future expand this to allow wildcards or exceptions. |
allow_remote |
Logical, indicating if we should allow packets
to be found that are not currently unpacked (i.e., are known
only to a location that we have metadata from). If this is
|
fetch_metadata |
Logical, indicating if we should pull
metadata immediately before the search. If |
search_options |
DEPRECATED. Please don't use this any
more, and instead use the arguments |
root |
The path to the root directory, or |
The id of the created report (a string)
If your packet depends on other packets, you will want to control the locations that are used to find appropriate packets. The control for this is passed through this function and not as an argument to orderly_dependency because this is a property of the way that a packet is created and not of a packet itself; importantly different users may have different names for their locations so it makes little sense to encode the location name into the source code. Alternatively, you may want to use different locations in different contexts (for example, initial development where you want to include local copies of packets as possible dependencies vs resolving dependencies only as they would be resolved on one of your locations).
Similarly, you might want to include packets that are known by other locations but are not currently downloaded onto this machine - pulling these packets in could take anything from seconds to hours depending on their size and the speed of your network connection (but not pulling in the packets could mean that your packet fails to run).
To allow for control over this you can pass in arguments to control the names of the locations to use, whether metadata should be refreshed before we pull anything and if packets that are not currently downloaded should be considered candidates.
This has no effect when running interactively, in which case you can specify the search options (root specific) with orderly_interactive_set_search_options
The arguments `location`, `allow_remote` and `fetch_metadata` control where outpack searches for packets with the given query and if anything might be moved over the network (or from one outpack archive to another). By default everything is resolved locally only; that is we can only depend on packets that are unpacked within our current archive. If you pass `allow_remote = TRUE`, then packets that are known anywhere are candidates for using as dependencies and if needed we will pull the resolved files from a remote location. Note that even if the packet is not locally present this might not be needed - if you have the same content anywhere else in an unpacked packet we will reuse the same content without re-fetching.
If `fetch_metadata = TRUE`, then we will refresh location metadata before pulling, and the `location` argument controls which locations are pulled from.
`use_draft` option
The above location handling generalises orderly (v1)'s old `use_draft` option, in terms of the new `location` argument:
`use_draft = TRUE` is `location = "local"`
`use_draft = FALSE` is `location = c(...)` where you should provide all locations except `local` (`setdiff(orderly2::orderly_location_list(), "local")`)
`use_draft = "newer"` is `location = NULL`
(this last option was the one most people preferred so is the new default behaviour). In addition, you could resolve dependencies as they currently exist on production right now with the options:
location = "production", fetch_metadata = TRUE
which updates your current metadata from production, then runs queries against only packets known on that remote, then depends on them even if you don't (yet) have them locally. This functionality was never available in orderly version 1, though we had intended to support it.
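The `use_draft = FALSE` equivalent can be computed rather than typed out. The sketch below does this for a fresh example root, where no other locations have been configured yet; the report name in the final comment is a placeholder.

```r
path <- orderly2::orderly_example("default")

# All configured locations except the special 'local' one
remote <- setdiff(orderly2::orderly_location_list(root = path), "local")
remote

# With at least one such location configured, dependencies could be
# resolved only from those locations ('name' is a placeholder here):
# orderly2::orderly_run(name, location = remote, root = path)

# Cleanup
unlink(path, recursive = TRUE)
```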
Sometimes it is useful to run things from a different place on disk to your outpack root. We know of two cases where this has come up:
when running reports within a runner on a server, we make a clean clone of the source tree at a particular git reference into a new temporary directory and then run the report there, but have it insert into an orderly repo at a fixed and non-temporary location.
we have a user for whom it is more convenient to run their report on a hard drive but store the archive and metadata on a (larger) shared drive.
In the first instance, we have a source path at `<src>` which contains the file `orderly_config.yml` and the directory `src/` with our source reports, and a separate path `<root>` which contains the directory `.outpack/` with all the metadata - it may also have an unpacked archive, and a `.git/` directory depending on the configuration. (Later this will make more sense once we support a "bare" outpack layout.)
To manually set the report source directory, you will need to set the path of the directory as the `ORDERLY_REPORT_SRC` environment variable.
# Create a simple example:
path <- orderly2::orderly_example("default")

# Run the 'data' task:
orderly2::orderly_run("data", root = path)

# After running, a finished packet appears in the archive:
fs::dir_tree(path)

# and we can query the metadata:
orderly2::orderly_metadata_extract(name = "data", root = path)

# Cleanup
fs::dir_delete(path)
Fetch information about the actively running report. This allows you to reflect information about your report back as part of the report, for example embedding the current report id, or information about computed dependencies. This information is in a slightly different format to orderly version 1.x and does not (currently) include information about dependencies when run outside of orderly_run, but this was never reliable previously.
orderly_run_info()
A list with elements
`name`: The name of the current report
`id`: The id of the current report, `NA` if running interactively
`root`: The orderly root path
`depends`: A data frame with information about the dependencies (not available interactively)
  `index`: an integer sequence along calls to `orderly_dependency`
  `name`: the name of the dependency
  `query`: the query used to find the dependency
  `id`: the computed id of the included packet
  `filename`: the file used from the packet
  `as`: the filename used locally
Evaluate a query against the outpack database, returning a vector of matching packet ids. Note that by default this only searches through packets that are unpacked and available for direct use on this computer; to search within packets known to other locations (and that we might know about via their metadata) you will need to use the `allow_remote` argument (the `options` argument is deprecated).
orderly_search( expr, name = NULL, scope = NULL, subquery = NULL, parameters = NULL, envir = parent.frame(), location = NULL, allow_remote = NULL, fetch_metadata = FALSE, options = NULL, root = NULL )
expr |
The query expression. A |
name |
Optionally, the name of the packet to scope the query on. This
will be intersected with |
scope |
Optionally, a scope query to limit the packets
searched by |
subquery |
Optionally, named list of subqueries which can be
referenced by name from the |
parameters |
Optionally, a named list of parameters to substitute
into the query (using the |
envir |
Optionally, an environment to substitute into the
query (using the |
location |
Optional vector of locations to pull from. We might in future expand this to allow wildcards or exceptions. |
allow_remote |
Logical, indicating if we should allow packets
to be found that are not currently unpacked (i.e., are known
only to a location that we have metadata from). If this is
|
fetch_metadata |
Logical, indicating if we should pull
metadata immediately before the search. If |
options |
DEPRECATED. Please don't use this any more, and
instead use the arguments |
root |
The path to the root directory, or |
A character vector of matching ids. In the case of no match from a query returning a single value (e.g., `latest(...)` or `single(...)`) this will be a character missing value (`NA_character_`)
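The sketch below shows both kinds of result on the bundled example: a vector of ids from a plain query, a single id from `latest()`, and the missing value returned when `latest()` matches nothing. The report name `no-such-report` is deliberately made up.

```r
path <- orderly2::orderly_example("default")
id <- orderly2::orderly_run("data", root = path, echo = FALSE)

# All unpacked packets called 'data'
all_data <- orderly2::orderly_search(quote(name == "data"), root = path)

# The most recent packet, with the query scoped by name
newest <- orderly2::orderly_search(quote(latest()), name = "data", root = path)

# A single-valued query with no match gives a character missing value
none <- orderly2::orderly_search(quote(latest(name == "no-such-report")),
                                 root = path)
is.na(none)

# Cleanup
unlink(path, recursive = TRUE)
```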
Options for controlling how packet searches are carried out, for example via orderly_search and orderly_run. The details here are never included in the metadata alongside the query (that is, they're not part of the query even though they affect it).
orderly_search_options( location = NULL, allow_remote = NULL, pull_metadata = FALSE )
location |
Optional vector of locations to pull from. We might in future expand this to allow wildcards or exceptions. |
allow_remote |
Logical, indicating if we should allow packets
to be found that are not currently unpacked (i.e., are known
only to a location that we have metadata from). If this is
|
pull_metadata |
Logical, indicating if we should pull
metadata immediately before the search. If |
An object of class `orderly_search_options` which should not be modified after creation (but see note about `fetch_metadata`)
Put orderly2 into "strict mode", which is closer to the defaults in orderly 1.0.0; in this mode only explicitly included files (via orderly_resource and orderly_shared_resource) are copied when running a packet, and we warn about any unexpected files at the end of the run. Using strict mode allows orderly2 to be more aggressive in how it deletes files within the source directory, more accurate in what it reports to you, and faster to start packets after developing them interactively.
orderly_strict_mode()
In future, we may extend strict mode to allow requiring that no computation occurs within orderly functions (i.e., that the requirements to run a packet are fully known before actually running it). Most likely this will not be the default behaviour and `orderly_strict_mode` will gain an argument.
We will allow server processes to either override this value (enabling it even when it is not explicitly given) and/or require it.
Undefined
Validate unpacked packets. Over time, expect this function to become more fully featured, validating more.
orderly_validate_archive( expr = NULL, name = NULL, action = "inform", root = NULL )
expr |
The query expression. A |
name |
Optionally, the name of the packet to scope the query on. This
will be intersected with |
action |
The action to take on finding an invalid packet. See Details. |
root |
The path to the root directory, or |
The actions that we can take on finding an invalid packet are:
`inform` (the default): just print information about the problem
`orphan`: mark the packet as orphaned within the metadata, but do not touch the files in your archive (by default the directory `archive/`) - this is a safe option and will leave you in a consistent state without deleting anything.
`delete`: in addition to marking the packet as an orphan, also delete the files from your archive.
Later, we will add a "repair" option to try and fix broken packets.
The validation interacts with the option `core.require_complete_tree`; if this option is `TRUE`, then a packet is only valid if all its (recursive) dependencies are also valid, so the action will apply to packets that have also had their upstream dependencies invalidated. This validation will happen even if the query implied by `...` does not include these packets if a complete tree is required.
The validation will also interact with `core.use_file_store` once repair is supported, as this becomes trivial.
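A minimal sketch of the safest action, run against a freshly created packet in the bundled example; with `action = "inform"` nothing in the archive or metadata is changed.

```r
path <- orderly2::orderly_example("default")
orderly2::orderly_run("data", root = path, echo = FALSE)

# Check every unpacked packet, only reporting any problems found
invalid <- orderly2::orderly_validate_archive(action = "inform", root = path)
invalid

# Cleanup
unlink(path, recursive = TRUE)
```

The same call with `action = "orphan"` would additionally mark any invalid packets as orphaned in the metadata, leaving the files untouched.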
Invisibly, a character vector of repaired (or invalid) packets.