Get yourself running R jobs on the cluster in 10 minutes or so.
Assumptions that we make here:
you are using R
your task can be represented as running a function on some inputs to create an output (a file based output is OK)
you are working on a network share and have this mounted on your computer
you know what packages your code depends on
your package dependencies are all on CRAN, and are all available in Windows binary form.
If any of these do not apply to you, you’ll probably need to read the full vignette, which also contains a lot more information.
Install the packages using drat
On Windows, if you are using a domain machine, you should only need to select the cluster you want to use.
Otherwise (and on any other platform), you’ll also need to provide your username.
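As a minimal sketch, assuming didehpc picks up its defaults from R options named didehpc.cluster and didehpc.username (check vignette("didehpc") if these do not take effect), you could put something like this in your ~/.Rprofile. The cluster name is the one shown in the default configuration below; "yourusername" is a placeholder for your own DIDE username.

```r
# Assumption: didehpc reads these options when building its default config.
options(
  didehpc.cluster = "fi--dideclusthn",  # cluster shown in the default config
  didehpc.username = "yourusername"     # placeholder: your DIDE username
)

# Confirm the options are set for this session
getOption("didehpc.cluster")
getOption("didehpc.username")
```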
You can see the default configuration with
didehpc::didehpc_config()
#> <didehpc_config>
#> - cluster: fi--dideclusthn
#> - credentials:
#> - username: rfitzjoh
#> - password: *******************
#> - username: rfitzjoh
#> - resource:
#> - template: GeneralNodes
#> - parallel: FALSE
#> - count: 1
#> - type: Cores
#> - shares:
#> - home: (local) /home/rich/net/home => \\fi--san03.dide.ic.ac.uk\homes\rfitzjoh => Q: (remote)
#> - temp: (local) /home/rich/net/temp => \\fi--didef3.dide.ic.ac.uk\tmp => T: (remote)
#> - use_workers: FALSE
#> - use_rrq: FALSE
#> - worker_timeout: 600
#> - conan_bootstrap: TRUE
#> - r_version: 4.0.3
#> - use_java: FALSE
#> - redis_host: fi--dideclusthn.dide.ic.ac.uk
If this is the first time you have run this package, it’s best to try out the login procedure first, because this exposes a number of problems early on:
didehpc::web_login()
Make a vector of packages that you use in your project:
packages <- c("dplyr", "tidyr")
And a vector of files that define the functions you need to run:
sources <- "mysources.R"
If you have several source files, a vector of file names is fine too.
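The contents of mysources.R are not shown here. As a hypothetical example consistent with how random_walk is called later (random_walk(0, 10) returning ten numbers), the file might define a simple Gaussian random walk like this; the implementation is an illustration, not the original file.

```r
# mysources.R (hypothetical contents)
# Gaussian random walk: start at x and take n steps, each drawn from a
# normal distribution centred on the previous position.
random_walk <- function(x, n) {
  ret <- numeric(n)
  for (i in seq_len(n)) {
    x <- rnorm(1, mean = x)
    ret[i] <- x
  }
  ret
}

# Quick local check before sending anything to the cluster
random_walk(0, 10)
```

Running the function locally once (for example source("mysources.R") followed by a call) is a cheap way to catch errors before they reach the cluster.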
Then save these together to form a “context”:
ctx <- context::context_save("contexts", packages = packages, sources = sources)
#> [ open:db ] rds
#> [ save:id ] 9a70cec48c3108b80503144f9b88cc8d
#> [ save:name ] complexional_australiancurlew
If you have no packages or no sources, use NULL or omit them in the call to context_save() (NULL is the default anyway).
The first argument here, "contexts", is the name of a directory that we will use to hold a lot of information about your jobs. You don’t need (or particularly want) to know what is in here.
Next, create a queue. This will prompt you for your password, as it will try to log in. It also installs Windows versions of all packages within the contexts directory: both the packages required to get this whole system working and the packages required for your particular jobs.
obj <- didehpc::queue_didehpc(ctx)
#> Loading context 9a70cec48c3108b80503144f9b88cc8d
#> [ context ] 9a70cec48c3108b80503144f9b88cc8d
#> [ library ] dplyr, tidyr
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
#> [ namespace ]
#> [ source ] mysources.R
#> Running installation script on cluster
#> ,:\ /:.
#> // \_()_/ \\
#> || | | || CONAN THE LIBRARIAN
#> || | | || Library: Q:\didehpc\20210817-145020\contexts\lib\windows\4.0
#> || |____| || Bootstrap: T:\conan\bootstrap\4.0
#> \\ / || \ // Cache: Q:\didehpc\20210817-145020\contexts\conan\cache/pkg
#> `:/ || \;' Policy: lazy
#> || Repos:
#> || * https://mrc-ide.github.io/didehpc-pkgs
#> XX * https://cloud.r-project.org
#> XX Packages:
#> XX * dplyr
#> XX * tidyr
#> OO
#> `'
#> i Loading metadata database
#> v Loading metadata database ... done
#> i Getting 17 pkgs (9.49 MB)
#> v Got ellipsis 0.3.2 (windows) (49.19 kB)
#> v Got generics 0.1.0 (windows) (70.74 kB)
#> v Got glue 1.4.2 (windows) (155.50 kB)
#> v Got lifecycle 1.0.0 (windows) (111.22 kB)
#> v Got fansi 0.5.0 (windows) (248.45 kB)
#> v Got cli 3.0.1 (windows) (758.73 kB)
#> v Got pkgconfig 2.0.3 (windows) (22.31 kB)
#> v Got magrittr 2.0.1 (windows) (234.90 kB)
#> v Got dplyr 1.0.7 (windows) (1.35 MB)
#> v Got purrr 0.3.4 (windows) (430.04 kB)
#> v Got tidyselect 1.1.1 (windows) (204.19 kB)
#> v Got tibble 3.1.3 (windows) (835.59 kB)
#> v Got utf8 1.2.2 (windows) (209.88 kB)
#> v Got rlang 0.4.11 (windows) (1.21 MB)
#> v Got vctrs 0.3.8 (windows) (1.25 MB)
#> v Got tidyr 1.1.3 (windows) (1.06 MB)
#> v Got pillar 1.6.2 (windows) (1.07 MB)
#> v Installed generics 0.1.0 (532ms)
#> v Installed cli 3.0.1 (1.3s)
#> v Installed ellipsis 0.3.2 (1.1s)
#> v Installed fansi 0.5.0 (1.3s)
#> v Installed glue 1.4.2 (1.4s)
#> v Installed lifecycle 1.0.0 (1.6s)
#> v Installed magrittr 2.0.1 (1.7s)
#> v Installed dplyr 1.0.7 (2.3s)
#> v Installed pkgconfig 2.0.3 (1.3s)
#> v Installed pillar 1.6.2 (1.8s)
#> v Installed purrr 0.3.4 (1.5s)
#> v Installed rlang 0.4.11 (1.4s)
#> v Installed tidyselect 1.1.1 (1.2s)
#> v Installed tibble 3.1.3 (1.6s)
#> v Installed utf8 1.2.2 (1.3s)
#> v Installed vctrs 0.3.8 (1.2s)
#> v Installed tidyr 1.1.3 (1.2s)
#> v Summary: 17 new 2 kept in 23.7s
#> Done!
Once you get to this point, we’re ready to start running things on the cluster. Let’s fire off a test to make sure that everything works OK:
t <- obj$enqueue(sessionInfo())
We can poll the job for a while, which will print a progress bar. If the job is returned in time, it will return the result of running the function. Otherwise it will throw an error.
t$wait(120)
#> (-) waiting for 2fa8770...608, giving up in 119.5 s (\) waiting for
#> 2fa8770...608, giving up in 119.0 s (|) waiting for 2fa8770...608, giving up in
#> 118.5 s (/) waiting for 2fa8770...608, giving up in 117.9 s
#> R version 4.0.3 (2020-10-10)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows Server 2012 R2 x64 (build 9600)
#>
#> Matrix products: default
#>
#> locale:
#> [1] LC_COLLATE=English_United Kingdom.1252
#> [2] LC_CTYPE=English_United Kingdom.1252
#> [3] LC_MONETARY=English_United Kingdom.1252
#> [4] LC_NUMERIC=C
#> [5] LC_TIME=English_United Kingdom.1252
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] dplyr_1.0.7 tidyr_1.1.3
#>
#> loaded via a namespace (and not attached):
#> [1] fansi_0.5.0 digest_0.6.27 utf8_1.2.2 crayon_1.4.1
#> [5] context_0.3.0 R6_2.5.0 lifecycle_1.0.0 storr_1.2.5
#> [9] magrittr_2.0.1 pillar_1.6.2 rlang_0.4.11 vctrs_0.3.8
#> [13] generics_0.1.0 ellipsis_0.3.2 glue_1.4.2 purrr_0.3.4
#> [17] compiler_4.0.3 pkgconfig_2.0.3 tidyselect_1.1.1 tibble_3.1.3
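To make the polling behaviour concrete, here is a rough base-R sketch of the idea behind t$wait(timeout, time_poll): check the status every time_poll seconds and give up once timeout seconds have passed. This is not queuer’s implementation; wait_for, status_fn and result_fn are hypothetical names, and the fake task below stands in for a real one.

```r
# Hypothetical sketch of "poll until done or give up"; not queuer's code.
# status_fn() returns a status string; result_fn() fetches the result.
wait_for <- function(status_fn, result_fn, timeout, time_poll = 0.5) {
  deadline <- Sys.time() + timeout  # timeout = Inf means never give up
  repeat {
    status <- status_fn()
    if (!(status %in% c("PENDING", "RUNNING"))) {
      # No longer queued or running: return whatever the task produced
      return(result_fn())
    }
    if (Sys.time() >= deadline) {
      stop(sprintf("task still %s after %s seconds", status, timeout))
    }
    Sys.sleep(time_poll)
  }
}

# Usage with a stand-in task that reports RUNNING twice, then finishes
calls <- 0
fake_status <- function() {
  calls <<- calls + 1
  if (calls < 3) "RUNNING" else "COMPLETE"
}
wait_for(fake_status, function() 42, timeout = 5, time_poll = 0.1)
```

Here wait_for returns 42 after three status checks; a task stuck in "PENDING" would instead raise an error once the deadline passes, mirroring the "throw an error" behaviour described above.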
You can use t$result() to get the result straight away (throwing an error if it is not ready) or t$wait(Inf) to wait forever.
t$result()
#> R version 4.0.3 (2020-10-10)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows Server 2012 R2 x64 (build 9600)
#>
#> Matrix products: default
#>
#> locale:
#> [1] LC_COLLATE=English_United Kingdom.1252
#> [2] LC_CTYPE=English_United Kingdom.1252
#> [3] LC_MONETARY=English_United Kingdom.1252
#> [4] LC_NUMERIC=C
#> [5] LC_TIME=English_United Kingdom.1252
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] dplyr_1.0.7 tidyr_1.1.3
#>
#> loaded via a namespace (and not attached):
#> [1] fansi_0.5.0 digest_0.6.27 utf8_1.2.2 crayon_1.4.1
#> [5] context_0.3.0 R6_2.5.0 lifecycle_1.0.0 storr_1.2.5
#> [9] magrittr_2.0.1 pillar_1.6.2 rlang_0.4.11 vctrs_0.3.8
#> [13] generics_0.1.0 ellipsis_0.3.2 glue_1.4.2 purrr_0.3.4
#> [17] compiler_4.0.3 pkgconfig_2.0.3 tidyselect_1.1.1 tibble_3.1.3
The test above used enqueue() with sessionInfo(), which is always available. enqueue() also works with functions defined in the files passed in as sources, such as random_walk here:
t <- obj$enqueue(random_walk(0, 10))
res <- t$wait(120)
#> (-) waiting for 66cc979...4f0, giving up in 119.5 s (\) waiting for
#> 66cc979...4f0, giving up in 119.0 s (|) waiting for 66cc979...4f0, giving up in
#> 118.5 s (/) waiting for 66cc979...4f0, giving up in 118.0 s
res
#> [1] -1.973025 -2.823971 -2.880453 -2.392717 -1.782159 -2.923010 -2.981436
#> [8] -3.403000 -2.978410 -3.989555
The t object has a number of other methods you can use:
t
#> <queuer_task>
#> Public:
#> clone: function (deep = FALSE)
#> context_id: function ()
#> expr: function (locals = FALSE)
#> id: 66cc9796e23fa4489a41bb5cfdbef4f0
#> initialize: function (id, root, check_exists = TRUE)
#> log: function (parse = TRUE)
#> result: function (allow_incomplete = FALSE)
#> root: context_root
#> status: function ()
#> times: function (unit_elapsed = "secs")
#> wait: function (timeout, time_poll = 0.5, progress = NULL)
Get the result from running a task
t$result()
#> [1] -1.973025 -2.823971 -2.880453 -2.392717 -1.782159 -2.923010 -2.981436
#> [8] -3.403000 -2.978410 -3.989555
Get the status of the task (for example “PENDING”, “RUNNING” or “ERROR”, or a completed status once it has finished successfully):
t$status()
Get the original expression:
t$expr()
Find out how long everything took
t$times()
#> task_id submitted started
#> 1 66cc9796e23fa4489a41bb5cfdbef4f0 2021-08-17 14:53:04 2021-08-17 14:53:06
#> finished waiting running idle
#> 1 2021-08-17 14:53:06 2.134212 0.03126001 0.4418352
You may see negative numbers for “waiting” because the submitted time comes from your computer’s clock, while the started and finished times come from the cluster’s.
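A small worked example of why only “waiting” can go negative, assuming it is computed as started minus submitted (and “running” as finished minus started). The timestamps are illustrative, not real output.

```r
# Illustrative timestamps only (not real output).
# "submitted" is recorded by your computer's clock; "started" and
# "finished" are recorded by the cluster node's clock.
submitted <- as.POSIXct("2021-08-17 14:53:04", tz = "UTC")
started   <- as.POSIXct("2021-08-17 14:53:03", tz = "UTC")  # node clock lags
finished  <- as.POSIXct("2021-08-17 14:53:05", tz = "UTC")

# Assumption: waiting = started - submitted (mixes the two clocks)
waiting <- as.numeric(difftime(started, submitted, units = "secs"))
waiting  # -1: negative only because the two clocks disagree

# Assumption: running = finished - started (both from the node's clock)
running <- as.numeric(difftime(finished, started, units = "secs"))
running  # 2
```

Because “running” compares two times from the same clock, clock skew cancels out there; it only shows up in “waiting”.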
And get the log from running the task
t$log()
#> [ hello ] 2021-08-17 14:53:04
#> [ wd ] Q:/didehpc/20210817-145020
#> [ init ] 2021-08-17 14:53:05.042
#> [ hostname ] FI--DIDECLUST26
#> [ process ] 3800
#> [ version ] 0.3.0
#> [ open:db ] rds
#> [ context ] 9a70cec48c3108b80503144f9b88cc8d
#> [ library ] dplyr, tidyr
#>
#> Attaching package: 'dplyr'
#>
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#>
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
#>
#> [ namespace ]
#> [ source ] mysources.R
#> [ parallel ] running as single core job
#> [ root ] Q:\didehpc\20210817-145020\contexts
#> [ context ] 9a70cec48c3108b80503144f9b88cc8d
#> [ task ] 66cc9796e23fa4489a41bb5cfdbef4f0
#> [ expr ] random_walk(0, 10)
#> [ start ] 2021-08-17 14:53:06.199
#> [ ok ]
#> [ end ] 2021-08-17 14:53:06.261
#> Warning messages:
#> 1: package 'tidyr' was built under R version 4.0.5
#> 2: package 'dplyr' was built under R version 4.0.5
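Most task log lines follow a “[ key ] value” layout, ending with “[ ok ]” for a task that succeeded. If you want to check logs programmatically, here is a minimal sketch, assuming you have the log as a plain character vector (for example, lines copied from the output above). The helper name parse_log_lines is hypothetical, and this is not how the context package itself parses logs.

```r
# Hypothetical helper: split "[ key ] value" lines into a data frame,
# skipping lines (such as package startup messages) that lack that shape.
parse_log_lines <- function(lines) {
  pattern <- "^\\[ (.+?) +\\] ?(.*)$"
  hit <- grepl(pattern, lines, perl = TRUE)
  data.frame(
    key = sub(pattern, "\\1", lines[hit], perl = TRUE),
    value = sub(pattern, "\\2", lines[hit], perl = TRUE),
    stringsAsFactors = FALSE
  )
}

log_lines <- c(
  "[ hello ] 2021-08-17 14:53:04",
  "Attaching package: 'dplyr'",
  "[ expr ] random_walk(0, 10)",
  "[ ok ]"
)
parsed <- parse_log_lines(log_lines)
"ok" %in% parsed$key  # TRUE when the task reported success
```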
There is also some DIDE-specific logging that happens before the task itself starts; if the job fails inexplicably, the answer may be in:
obj$dide_log(t)
#> [1] "generated on host: kea"
#> [2] "generated on date: 2021-08-17"
#> [3] "didehpc version: 0.3.6"
#> [4] "context version: 0.3.0"
#> [5] "running on: FI--DIDECLUST26"
#> [6] "mapping Q: -> \\\\fi--san03.dide.ic.ac.uk\\homes\\rfitzjoh"
#> [7] "The command completed successfully."
#> [8] ""
#> [9] "mapping T: -> \\\\fi--didef3.dide.ic.ac.uk\\tmp"
#> [10] "The command completed successfully."
#> [11] ""
#> [12] "Using Rtools at T:\\Rtools\\Rtools40"
#> [13] "working directory: Q:\\didehpc\\20210817-145020"
#> [14] "this is a single task"
#> [15] "logfile: Q:\\didehpc\\20210817-145020\\contexts\\logs\\66cc9796e23fa4489a41bb5cfdbef4f0"
#> [16] ""
#> [17] "Q:\\didehpc\\20210817-145020>Rscript \"Q:\\didehpc\\20210817-145020\\contexts\\bin\\task_run\" \"Q:\\didehpc\\20210817-145020\\contexts\" 66cc9796e23fa4489a41bb5cfdbef4f0 1>\"Q:\\didehpc\\20210817-145020\\contexts\\logs\\66cc9796e23fa4489a41bb5cfdbef4f0\" 2>&1"
#> [18] "Removing mapping Q:"
#> [19] "Q: was deleted successfully."
#> [20] ""
#> [21] "Removing mapping T:"
#> [22] "T: was deleted successfully."
#> [23] ""
#> [24] "Quitting"
Want more information? See vignette("didehpc") for more details.