If you have an orderly report that takes a very long time, or needs to run in parallel, you might need to send it to run on another computer. There are a number of ways of achieving this; the simplest might be to clone the source tree to another computer, run the reports there and use one of a number of possible approaches to sync the outputs between computers. However, there will be cases where that is not ideal, and you want to move much less data around.
This vignette describes a way of parcelling together all dependencies of an orderly report into a zip file (a “bundle”) that can be distributed to another machine (e.g., via scp, rsync or a shared file system), run there, and returned. It does not provide a transparent approach to using high-performance computing with orderly as we feel that the specific circumstances are too varied to support this directly.
In order to use orderly bundles we make some assumptions and conventions.
First, we assume that you will be running your exported report on another machine (otherwise you would have access to the orderly tree) and that your report takes really quite a long time.
Second, we assume you have your own way of getting the bundled reports to your other machine and the completed bundles back again — we expect that the details here will be specific to your needs and situation and that the overhead of doing this will be trivial compared with the cost of running the report.
Third, we assume that you will deal with all issues around queuing, locking and fault tolerance. From orderly’s point of view work will be exported from orderly and at this point you’re in control - we expect it to come back computed at some point, though we do not enforce that.
Fourth, that the machine running the report can be trusted to actually run the report - if you have set up an orderly server that is safe from interactively run reports, don’t allow importing from anyone’s laptops if you want to preserve this.
Fifth, that you trust your remote machine with your data, and that the remote machine trusts your orderly archive enough to run arbitrary code on it.
We will use the orderly demo example, and pack up the use_dependency report.
The use_dependency report has a dependency, which we run:
## [ name ] other
## [ id ] 20241217-042658-e3aeb76c
## [ sources ] functions.R
## [ parameter ] nmin: 0
## [ start ] 2024-12-17 04:26:58.91156
## [ data ] source => extract: 20 x 2
## [ parameter ] nmin: 0
## [ end ] 2024-12-17 04:26:58.986846
## [ elapsed ] Ran report in 0.07528615 secs
## [ artefact ] summary.csv: 3fac8347e152c84c96e6676413c718b7
## [ ... ] graph.png: 67b3e662440b3978ee78b9d4cc159884
## [ commit ] other/20241217-042658-e3aeb76c
## [ copy ]
## [ import ] other:20241217-042658-e3aeb76c
## [ success ] :)
## [1] "/tmp/RtmpcKvB0Z/filef5b4541ca6e/archive/other/20241217-042658-e3aeb76c"
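The log above is what running and committing the other report prints. A minimal sketch of calls that would produce a run like this, assuming the source tree was created with orderly::orderly_example("demo") and that nmin = 0 is passed as the parameter (as the log shows). The helper name run_dependency is hypothetical:

```r
# Hypothetical helper: run a report and commit it to the archive.
# `root` is the path to an orderly source tree.
run_dependency <- function(root, name = "other",
                           parameters = list(nmin = 0)) {
  # orderly_run() returns the id of the newly created draft report
  id <- orderly::orderly_run(name, parameters = parameters, root = root)
  # orderly_commit() moves the draft into the archive
  orderly::orderly_commit(id, root = root)
  id
}

# Usage, assuming the orderly demo example is available:
# path <- orderly::orderly_example("demo")
# id <- run_dependency(path)
```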
We need a place that we’ll put the bundles:
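Any writable directory will do. A minimal sketch, assuming a temporary directory is good enough for a demonstration (the variable name path_bundles is illustrative; for real work a persistent location would be more useful):

```r
# A fresh, empty directory to hold outgoing bundles
path_bundles <- tempfile("bundles-")
dir.create(path_bundles)
```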
Now, we can pack up use_dependency to run:
## [ name ] use_dependency
## [ id ] 20241217-042659-4caf6ae0
## [ depends ] other@20241217-042658-e3aeb76c:summary.csv -> incoming.csv
## [ start ] 2024-12-17 04:26:59.302556
## [ bundle pack ] 20241217-042659-4caf6ae0
## $id
## [1] "20241217-042659-4caf6ae0"
##
## $path
## [1] "/tmp/RtmpcKvB0Z/filef5b455d0496/20241217-042659-4caf6ae0.zip"
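The output above is the kind of result orderly::orderly_bundle_pack returns: a list with the bundle id and the path to the zip file. A sketch of the call, wrapped in a hypothetical helper named pack_report so that the inputs are explicit:

```r
# Hypothetical helper: pack one report into a bundle zip under `path_bundles`.
# `root` is the path to the orderly source tree.
pack_report <- function(name, path_bundles, root) {
  bundle <- orderly::orderly_bundle_pack(path_bundles, name, root = root)
  # The result is a list; the output above shows its `id` and `path` elements
  stopifnot(file.exists(bundle$path))
  bundle
}

# Usage, with `path` the orderly root and `path_bundles` a directory:
# bundle <- pack_report("use_dependency", path_bundles, root = path)
# bundle$path
```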
orderly_bundle_pack has created a zip file. The format of this file is internal to orderly (it will likely change and will at some point become resistant to tampering), but contains:
## filename compressed_size uncompressed_size
## 1 20241217-042659-4caf6ae0/ 0 0
## 2 20241217-042659-4caf6ae0/meta/ 0 0
## 3 20241217-042659-4caf6ae0/meta/config.rds 1460 1455
## 4 20241217-042659-4caf6ae0/meta/info.rds 3421 3416
## 5 20241217-042659-4caf6ae0/meta/manifest.rds 285 280
## 6 20241217-042659-4caf6ae0/meta/session.rds 24819 24814
## 7 20241217-042659-4caf6ae0/pack/ 0 0
## 8 20241217-042659-4caf6ae0/pack/incoming.csv 542 888
## 9 20241217-042659-4caf6ae0/pack/orderly.yml 214 358
## 10 20241217-042659-4caf6ae0/pack/script.R 175 220
## timestamp permissions crc32 offset
## 1 2024-12-17 04:26:58 755 00000000 0
## 2 2024-12-17 04:26:58 755 00000000 55
## 3 2024-12-17 04:26:58 644 81854df6 115
## 4 2024-12-17 04:26:58 644 235a7493 1661
## 5 2024-12-17 04:26:58 644 072960fa 5166
## 6 2024-12-17 04:26:58 644 2b5dfb70 5539
## 7 2024-12-17 04:26:58 755 00000000 30445
## 8 2024-12-17 04:26:58 644 e3c70810 30505
## 9 2024-12-17 04:26:58 644 93fda3f3 31135
## 10 2024-12-17 04:26:58 644 58329b6b 31436
The subdirectory pack contains the report working directory, all code and dependencies, etc, while meta contains additional information required to run the report. In particular the incoming.csv file in the pack directory contains the dependency imported from other.
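The listing above has the columns produced by zip::zip_list. As a rough illustration of the meta and pack split, the sketch below uses base R's utils::unzip to group the files in a bundle by their second path component. The helper name bundle_contents is hypothetical, and because the bundle format is internal to orderly this is only suitable for inspection, not something to build on:

```r
# Hypothetical helper: list the files in a bundle, grouped by the
# directory directly below the top-level "<id>/" directory.
bundle_contents <- function(zip_path) {
  stopifnot(file.exists(zip_path))
  files <- utils::unzip(zip_path, list = TRUE)$Name
  # Drop directory entries, which end in "/"
  files <- files[!grepl("/$", files)]
  # Paths look like "<id>/meta/config.rds"; take the second component
  parts <- strsplit(files, "/", fixed = TRUE)
  area <- vapply(parts, function(p) {
    if (length(p) >= 3) p[[2]] else NA_character_
  }, character(1))
  split(files, area)
}

# Usage, with `bundle` as returned by orderly::orderly_bundle_pack():
# str(bundle_contents(bundle$path))
```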
Then copy this zip file somewhere else to run it (details vary based on your system, and moving the file is not necessary to run it, though it will be the most likely situation).
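How the file is moved is up to you. As one illustration, the sketch below assumes the other machine can see a shared directory and copies the zip file there with base R. The helper name send_bundle and the shared-directory setup are assumptions; scp or rsync would do the same job from a shell.

```r
# Hypothetical helper: copy a bundle zip into a directory visible to the
# machine that will run it (for example a shared file system mount).
send_bundle <- function(zip_path, dest_dir) {
  stopifnot(file.exists(zip_path))
  dir.create(dest_dir, showWarnings = FALSE, recursive = TRUE)
  dest <- file.path(dest_dir, basename(zip_path))
  # Refuse to overwrite, so an already queued bundle is never clobbered
  ok <- file.copy(zip_path, dest, overwrite = FALSE)
  if (!ok) {
    stop(sprintf("Failed to copy '%s' to '%s'", zip_path, dest_dir))
  }
  dest
}
```

Keeping the original file name means the bundle id remains visible in the destination directory.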
Once the files have been moved we can run it with:
## [ start ] 2024-12-17 04:26:59.401892
##
## > d <- read.csv("incoming.csv", stringsAsFactors = FALSE)
##
## > png("graph.png")
##
## > par(mar = c(15, 4, 0.5, 0.5))
##
## > barplot(setNames(d$number, d$name), las = 2)
##
## > dev.off()
## png
## 2
##
## > info <- orderly::orderly_run_info()
##
## > saveRDS(info, "info.rds")
## [ end ] 2024-12-17 04:26:59.418096
## [ elapsed ] Ran report in 0.0162046 secs
## [ artefact ] graph.png: 67b3e662440b3978ee78b9d4cc159884
## [ ... ] info.rds: 6cd4a4f8030a4336bf52ca4178af42da
Here, workdir is the directory that you want the report to be run in. This can be the directory that holds the incoming zip file, if you want, but that will make it harder to know what has already been run.
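A sketch of the run step, assuming orderly::orderly_bundle_run is called with the path to the incoming zip file and a separate working directory, and that its return value has a path element pointing at the completed zip file. The helper name run_bundle is hypothetical:

```r
# Hypothetical helper: run a bundle in its own working directory.
run_bundle <- function(zip_path, workdir = tempfile("workdir-")) {
  stopifnot(file.exists(zip_path))
  # A working directory separate from the incoming zip files keeps
  # "waiting" and "finished" bundles apart
  res <- orderly::orderly_bundle_run(zip_path, workdir)
  # Assumed: `res$path` is the zip file holding the completed report
  res
}

# Usage, with `bundle` as returned by orderly::orderly_bundle_pack():
# res <- run_bundle(bundle$path)
```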
This creates another zip file, but this time it contains the results of running the report.
The result can be imported into orderly by using orderly::orderly_bundle_import with the path to the zip file:
## [ import ] use_dependency:20241217-042659-4caf6ae0
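A sketch of the import step that would print a line like the one above. The helper name import_bundle is hypothetical; it only adds a file check around orderly::orderly_bundle_import:

```r
# Hypothetical helper: import a completed bundle into the archive at `root`.
import_bundle <- function(zip_path, root) {
  if (!file.exists(zip_path)) {
    stop(sprintf("Completed bundle '%s' not found", zip_path))
  }
  orderly::orderly_bundle_import(zip_path, root = root)
}

# Usage, with `res` the result of orderly::orderly_bundle_run() and
# `path` the orderly root:
# import_bundle(res$path, root = path)
```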
The copy of use_dependency is now in the archive and can be used like any other orderly report:
## name id
## 1 other 20241217-042658-e3aeb76c
## 2 use_dependency 20241217-042659-4caf6ae0
## other [20241217-042658-e3aeb76c]
## └──use_dependency [20241217-042659-4caf6ae0]
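The two listings above look like the output of orderly::orderly_list_archive (a data frame of names and ids) and orderly::orderly_graph (the dependency tree). A sketch of how they could be produced, using a hypothetical helper named inspect_archive:

```r
# Hypothetical helper: show what is in the archive and how reports depend
# on `name`. `root` is the path to the orderly source tree.
inspect_archive <- function(root, name = "other") {
  # Data frame of archived reports, one row per report version
  archive <- orderly::orderly_list_archive(root)
  # Dependency tree starting from `name`
  print(orderly::orderly_graph(name, root = root))
  archive
}

# Usage, with `path` the orderly root:
# inspect_archive(path)
```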
We do not yet verify that the incoming bundle comes from a trusted source, nor that the completed bundle was run on a trusted system, nor that the bundle pack was not modified en route. We will add a signing and verifying step in future that will address these issues.
We do not support encryption of the bundle, but may do so in a future version.
Secrets will be copied in clear text if included.
We do not check that the package versions on the remote machine are suitable, though metadata is included to support this in future.
We do not support reading from databases using the connection: field while the report is running (reading data before is fine). In future we may relax this so that the remote report interacts with the database. Practically this means that bundled reports cannot use the connection: field, and all data will be packaged into the bundle, and so may be large.
As discussed above in the overview section, orderly does not deal with the movement of bundles between machines, nor queuing of these bundles.
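Since queuing is left to you, here is one very small sketch of what a worker on the remote machine might look like, assuming bundles arrive as zip files in an incoming directory and results are collected in a done directory. The function name process_incoming and both directory names are hypothetical. The runner is passed in as an argument so the loop can be tried without orderly; by default it calls orderly::orderly_bundle_run and assumes the result has a path element. There is no locking, so this is only safe with a single worker.

```r
# Hypothetical worker: run every bundle in `incoming` that has no result
# yet in `done`. `run` takes (zip, workdir) and returns the path to the
# completed zip file.
process_incoming <- function(incoming, done,
                             run = function(zip, workdir) {
                               orderly::orderly_bundle_run(zip, workdir)$path
                             }) {
  dir.create(done, showWarnings = FALSE, recursive = TRUE)
  zips <- list.files(incoming, pattern = "\\.zip$", full.names = TRUE)
  # A bundle counts as finished if a zip with the same name exists in `done`
  todo <- zips[!file.exists(file.path(done, basename(zips)))]
  for (zip in todo) {
    result <- tryCatch(
      run(zip, tempfile("workdir-")),
      error = function(e) {
        message(sprintf("Bundle '%s' failed: %s",
                        basename(zip), conditionMessage(e)))
        NULL
      })
    if (!is.null(result)) {
      file.copy(result, file.path(done, basename(zip)))
    }
  }
  # Names of the bundles that were attempted in this pass
  invisible(basename(todo))
}
```

A failed bundle is reported and retried on the next pass, which may or may not be what you want for long-running reports.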