---
title: "Package installation"
output:
  rmarkdown::html_vignette:
    toc: true
    toc_depth: 2
vignette: >
  %\VignetteIndexEntry{Package installation}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

Often the most difficult part of configuring your cluster jobs is sorting out all the packages that you need and making sure that they are present on the cluster. There are several levels of difficulty here and this document will walk through them in turn.

## Everything is on CRAN

This is the most straightforward situation: all your packages are on CRAN. You typically don't need to do anything special; just create your context with a list of packages and create the queue:

```r
root <- "pkgs"
ctx <- context::context_save(root, packages = c("dplyr", "ggplot2"))
#> [ init:id   ]  a11d9597170a8cc72cc5b57c3ac3d7a0
#> [ init:db   ]  rds
#> [ init:path ]  pkgs
#> [ save:id   ]  2ab2a78d26df3a50b9beb45b52eba466
#> [ save:name ]  seaisland_mammal
obj <- didehpc::queue_didehpc(ctx)
#> Loading context 2ab2a78d26df3a50b9beb45b52eba466
#> [ context   ]  2ab2a78d26df3a50b9beb45b52eba466
#> [ library   ]  dplyr, ggplot2
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#>     filter, lag
#> The following objects are masked from 'package:base':
#>
#>     intersect, setdiff, setequal, union
#> [ namespace ]
#> [ source    ]
#> Running installation script on cluster
#>  ,:\      /:.
#> //  \_()_/  \\
#> ||   |  |   ||      CONAN THE LIBRARIAN
#> ||   |  |   ||      Library:   Q:\didehpc\20210817-145020\pkgs\lib\windows\4.0
#> ||  |____|  ||      Bootstrap: T:\conan\bootstrap\4.0
#> \\  / || \  //      Cache:     Q:\didehpc\20210817-145020\pkgs\conan\cache/pkg
#>  `:/  ||  \;'       Policy:    lazy
#>       ||            Repos:
#>       ||              * https://mrc-ide.github.io/didehpc-pkgs
#>       XX              * https://cloud.r-project.org
#>       XX            Packages:
#>       XX              * context
#>       XX              * dplyr
#>       OO              * ggplot2
#>       `'
#> i Loading metadata database
#> v Loading metadata database ... done
#> i Getting 36 pkgs (27.89 MB) and 1 pkg with unknown size
#> v Got ids 1.0.1 (windows) (123.89 kB)
#> v Got askpass 1.1 (windows) (243.58 kB)
#> v Got digest 0.6.27 (windows) (268.65 kB)
#> v Got R6 2.5.0 (windows) (84.09 kB)
#> v Got context 0.3.0 (source) (37.72 kB)
#> v Got sys 3.4 (windows) (59.83 kB)
#> v Got uuid 0.1-4 (windows) (33.77 kB)
#> v Got storr 1.2.5 (windows) (401.33 kB)
#> v Got crayon 1.4.1 (windows) (141.87 kB)
#> v Got ellipsis 0.3.2 (windows) (49.19 kB)
#> v Got generics 0.1.0 (windows) (70.74 kB)
#> v Got openssl 1.4.4 (windows) (4.10 MB)
#> v Got cli 3.0.1 (windows) (758.73 kB)
#> v Got glue 1.4.2 (windows) (155.50 kB)
#> v Got magrittr 2.0.1 (windows) (234.90 kB)
#> v Got lifecycle 1.0.0 (windows) (111.22 kB)
#> v Got pkgconfig 2.0.3 (windows) (22.31 kB)
#> v Got rlang 0.4.11 (windows) (1.21 MB)
#> v Got purrr 0.3.4 (windows) (430.04 kB)
#> v Got tidyselect 1.1.1 (windows) (204.19 kB)
#> v Got RColorBrewer 1.1-2 (windows) (55.55 kB)
#> v Got pillar 1.6.2 (windows) (1.07 MB)
#> v Got tibble 3.1.3 (windows) (835.59 kB)
#> v Got utf8 1.2.2 (windows) (209.88 kB)
#> v Got dplyr 1.0.7 (windows) (1.35 MB)
#> v Got vctrs 0.3.8 (windows) (1.25 MB)
#> v Got gtable 0.3.0 (windows) (434.23 kB)
#> v Got labeling 0.4.2 (windows) (62.73 kB)
#> v Got munsell 0.5.0 (windows) (245.14 kB)
#> v Got scales 1.1.1 (windows) (558.34 kB)
#> v Got farver 2.1.0 (windows) (1.75 MB)
#> v Got isoband 0.2.5 (windows) (2.73 MB)
#> v Got fansi 0.5.0 (windows) (248.45 kB)
#> v Got colorspace 2.0-2 (windows) (2.65 MB)
#> v Got withr 2.4.2 (windows) (212.63 kB)
#> v Got viridisLite 0.4.0 (windows) (1.30 MB)
#> v Got ggplot2 3.3.5 (windows) (4.13 MB)
#> v Installed R6 2.5.0 (735ms)
#> v Installed crayon 1.4.1 (797ms)
#> v Installed ids 1.0.1 (922ms)
#> v Installed askpass 1.1 (1.3s)
#> v Installed sys 3.4 (1.2s)
#> v Installed digest 0.6.27 (1.5s)
#> v Installed storr 1.2.5 (1.7s)
#> v Installed uuid 0.1-4 (1.4s)
#> v Installed openssl 1.4.4 (2s)
#> i Building context 0.3.0
#> v Installed cli 3.0.1 (6.5s)
#> v Installed dplyr 1.0.7 (7s)
#> v Installed ellipsis 0.3.2 (7.1s)
#> v Installed fansi 0.5.0 (7.1s)
#> v Installed generics 0.1.0 (7.2s)
#> v Installed glue 1.4.2 (7.2s)
#> v Installed lifecycle 1.0.0 (7.2s)
#> v Installed pkgconfig 2.0.3 (1.2s)
#> v Installed purrr 0.3.4 (1.4s)
#> v Installed magrittr 2.0.1 (2s)
#> v Built context 0.3.0 (4.5s)
#> v Installed pillar 1.6.2 (2.3s)
#> v Installed rlang 0.4.11 (2s)
#> v Installed tibble 3.1.3 (2.1s)
#> v Installed tidyselect 1.1.1 (2.2s)
#> v Installed utf8 1.2.2 (1.5s)
#> v Installed RColorBrewer 1.1-2 (1.3s)
#> v Installed vctrs 0.3.8 (1.8s)
#> v Installed context 0.3.0 (1.8s)
#> v Installed farver 2.1.0 (907ms)
#> v Installed gtable 0.3.0 (876ms)
#> v Installed labeling 0.4.2 (797ms)
#> v Installed ggplot2 3.3.5 (1.6s)
#> v Installed munsell 0.5.0 (1.1s)
#> v Installed isoband 0.2.5 (1.6s)
#> v Installed viridisLite 0.4.0 (1.2s)
#> v Installed scales 1.1.1 (1.4s)
#> v Installed withr 2.4.2 (1.3s)
#> v Installed colorspace 2.0-2 (3s)
#> v Summary: 37 new 5 kept in 1m 38.6s
#> Done!
```

When the queue started up, it checked which packages were already available (none were) and then installed everything needed to run your jobs. That includes the two packages listed above, all their dependencies, and [`context`](https://mrc-ide.github.io/context/), which `didehpc` uses to send the jobs back and forth.
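
The "check what is available, then install what is missing" idea can be illustrated in a few lines of base R. This is only a sketch of the concept, not how `didehpc` or `conan` implement it: the helper `missing_packages()` is hypothetical, and the empty temporary directory stands in for a freshly created library.

```r
# Hypothetical helper (not part of didehpc or conan): report which of the
# requested packages are absent from a given library directory.
missing_packages <- function(packages, lib) {
  installed <- rownames(utils::installed.packages(lib.loc = lib))
  setdiff(packages, installed)
}

# An empty directory plays the role of a brand-new library, so every
# requested package is reported as missing.
lib <- file.path(tempdir(), "empty-lib")
dir.create(lib, showWarnings = FALSE)
missing_packages(c("dplyr", "ggplot2"), lib)
```

With a populated library the same call would return only the packages that still need installing.
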
All these packages are installed into a special directory within the context root:

```r
dir(file.path(root, "lib/windows", as.character(getRversion()[1, 1:2])))
#>  [1] "R6"           "RColorBrewer" "_cache"       "askpass"      "cli"
#>  [6] "colorspace"   "context"      "crayon"       "digest"       "dplyr"
#> [11] "ellipsis"     "fansi"        "farver"       "generics"     "ggplot2"
#> [16] "glue"         "gtable"       "ids"          "isoband"      "labeling"
#> [21] "lifecycle"    "magrittr"     "munsell"      "openssl"      "pillar"
#> [26] "pkgconfig"    "purrr"        "rlang"        "scales"       "storr"
#> [31] "sys"          "tibble"       "tidyselect"   "utf8"         "uuid"
#> [36] "vctrs"        "viridisLite"  "withr"
```

Everything in this library will be available to your R jobs when they run.

## Everything is available in a CRAN-like repo

We keep many often-used packages in a semi-stable repository (see the [mrc-ide drat](https://mrc-ide.github.io/drat/), the [ncov drat](https://ncov-ic.github.io/drat/) and the more experimental [R-universe](https://mrc-ide.r-universe.dev/ui#builds) system that is being developed to support this sort of workflow in future).

To tell `didehpc` to look in one of these repositories when installing, create a `conan::conan_sources` object, list the additional repositories as its `repos` argument, and pass this object as the `package_sources` argument to `context_save`. Here, we add the mrc-ide drat repository and install the `dde` package; this will use the development version, which is often ahead of the CRAN version.

```r
src <- conan::conan_sources(NULL, repos = "https://mrc-ide.github.io/drat/")
ctx <- context::context_save(root, packages = "dde", package_sources = src)
#> [ open:db   ]  rds
#> [ save:id   ]  04dc80e37df65719c6b2ebfd79acfba8
#> [ save:name ]  electrometrical_weasel
```

Create the queue as before, and `dde` will be installed:

```r
obj <- didehpc::queue_didehpc(ctx)
#> Loading context 04dc80e37df65719c6b2ebfd79acfba8
#> [ context   ]  04dc80e37df65719c6b2ebfd79acfba8
#> [ library   ]  dde
#> [ namespace ]
#> [ source    ]
#> Running installation script on cluster
#>  ,:\      /:.
#> //  \_()_/  \\
#> ||   |  |   ||      CONAN THE LIBRARIAN
#> ||   |  |   ||      Library:   Q:\didehpc\20210817-145020\pkgs\lib\windows\4.0
#> ||  |____|  ||      Bootstrap: T:\conan\bootstrap\4.0
#> \\  / || \  //      Cache:     Q:\didehpc\20210817-145020\pkgs\conan\cache/pkg
#>  `:/  ||  \;'       Policy:    lazy
#>       ||            Repos:
#>       ||              * https://mrc-ide.github.io/drat/
#>       XX              * https://cloud.r-project.org
#>       XX              * https://mrc-ide.github.io/didehpc-pkgs
#>       XX            Packages:
#>       XX              * dde
#>       OO
#>       `'
#> i Loading metadata database
#> v Loading metadata database ... done
#> i Getting 1 pkg (446.12 kB), 1 cached
#> v Got dde 1.0.3 (source) (180.79 kB)
#> v Got ring 1.0.3 (windows) (446.12 kB)
#> v Installed ring 1.0.3 (563ms)
#> i Building dde 1.0.3
#> v Built dde 1.0.3 (26.6s)
#> v Installed dde 1.0.3 (391ms)
#> v Summary: 2 new 1 kept in 27.6s
#> Done!
```

If you want to add your packages to one of these repositories, please talk to Rich. You will need to increase your version number at each change (typically each merge into main/master) for the installation to notice that you have made changes.

## Install packages directly from GitHub (or similar)

We use [`pkgdepends`](https://r-lib.github.io/pkgdepends/) as the engine for installing packages from exotic locations. This problem is slightly more complicated than it seems because dependency resolution is not always unambiguous, particularly with networks of dependent packages.

The basic idea is this. Suppose we want to install the [`rfiglet`](https://github.com/richfitz/rfiglet) package, which is not on CRAN.
We use the "Remotes"-style reference `richfitz/rfiglet` as an entry to `conan_sources` so that `didehpc` knows where to install `rfiglet` from:

```r
src <- conan::conan_sources("richfitz/rfiglet")
ctx <- context::context_save(root, packages = "rfiglet", package_sources = src)
#> [ open:db   ]  rds
#> [ save:id   ]  e53cee7b36f20b6339f9ce2b92d9f0d8
#> [ save:name ]  enharmonic_nautilus
```

Note that we still list `rfiglet` in the `packages` argument of `context::context_save`, as that is what is used to load the package. If you want to be even more explicit you can use `github::richfitz/rfiglet` as the reference, and you can add references such as `richfitz/rfiglet@d713c1b8` to point at a particular commit, branch or tag.

```r
obj <- didehpc::queue_didehpc(ctx)
#> Loading context e53cee7b36f20b6339f9ce2b92d9f0d8
#> [ context   ]  e53cee7b36f20b6339f9ce2b92d9f0d8
#> [ library   ]  rfiglet
#> [ namespace ]
#> [ source    ]
#> Running installation script on cluster
#>  ,:\      /:.
#> //  \_()_/  \\
#> ||   |  |   ||      CONAN THE LIBRARIAN
#> ||   |  |   ||      Library:   Q:\didehpc\20210817-145020\pkgs\lib\windows\4.0
#> ||  |____|  ||      Bootstrap: T:\conan\bootstrap\4.0
#> \\  / || \  //      Cache:     Q:\didehpc\20210817-145020\pkgs\conan\cache/pkg
#>  `:/  ||  \;'       Policy:    lazy
#>       ||            Repos:
#>       ||              * https://cloud.r-project.org
#>       XX              * https://mrc-ide.github.io/didehpc-pkgs
#>       XX            Packages:
#>       XX              * rfiglet
#>       XX              * richfitz/rfiglet
#>       OO
#>       `'
#> ! Using bundled GitHub PAT. Please add your own PAT using `gitcreds::gitcreds_set()`.
#> i No downloads are needed, 1 pkg is cached
#> v Got rfiglet 0.2.0 (source) (144.05 kB)
#> i Packaging rfiglet 0.2.0
#> v Packaged rfiglet 0.2.0 (3.4s)
#> i Building rfiglet 0.2.0
#> v Built rfiglet 0.2.0 (2.7s)
#> v Installed rfiglet 0.2.0 (github::richfitz/rfiglet@d713c1b) (532ms)
#> v Summary: 1 new in 3.2s
#> Done!
```

## Install private packages

To install a private package, first make a local copy of the package somewhere on your system.
Then you need to build a _source_ copy of this package (this will have a file extension of `tar.gz`). For example, if the path `~/Documents/src/defer` contains a copy of the sources that you want to install, you could write:

```r
path <- pkgbuild::build("~/Documents/src/defer", ".")
#> checking for file ‘/home/rich/Documents/src/defer/DESCRIPTION’ ... ✔ checking for file ‘/home/rich/Documents/src/defer/DESCRIPTION’
#> ─ preparing ‘defer’:
#> checking DESCRIPTION meta-information ... ✔ checking DESCRIPTION meta-information
#> ─ checking for LF line-endings in source and make files and shell scripts
#> ─ checking for empty or unneeded directories
#> ─ building ‘defer_0.1.0.tar.gz’
#>
#>
```

The second argument (`.`) is the directory that the built package will be created in. This must be within your working directory. You might find that using something like `pkgs` as a destination helps keep things tidy. (You may want to use the `vignettes = FALSE` argument to speed this process up if your package includes slow-to-run vignettes, as they will be of no use on the cluster.)

```r
file.info(path)
#>                      size isdir mode               mtime               ctime
#> ./defer_0.1.0.tar.gz 3813 FALSE  755 2021-08-17 14:52:34 2021-08-17 14:52:34
#>                                    atime  uid  gid uname grname
#> ./defer_0.1.0.tar.gz 2021-08-17 14:52:34 1000 1000  rich   rich
```

Then construct your package sources, passing in the **relative** path to your package. We can use the `path` variable here, or you could write `./defer_0.1.0.tar.gz` directly, or something like `local::defer_0.1.0.tar.gz`. If you have multiple packages you can pass a vector in.
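
For the multiple-package case, a vector of references might be assembled as in the sketch below. The `pkgs` directory layout and the second tarball name are assumptions made for illustration; only the `local::` prefix and the relative paths follow the description above. The call to `conan::conan_sources()` is left commented out so that the snippet runs without the package installed.

```r
# Hypothetical layout: built tarballs collected in a "pkgs" directory
# inside the working directory (the file names are illustrative).
tarballs <- file.path("pkgs", c("defer_0.1.0.tar.gz", "mypkg_0.1.2.tar.gz"))

# Prefix each relative path with "local::" to make the reference explicit.
refs <- paste0("local::", tarballs)
refs

# The whole vector would then be passed in one go (not run here):
# src <- conan::conan_sources(refs)
```

For the single package built above, this looks like:
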
```r
src <- conan::conan_sources(path)
ctx <- context::context_save(root, packages = "defer", package_sources = src)
#> [ open:db   ]  rds
#> [ save:id   ]  b1ee3dfcbfc8e8c455707746f13564cd
#> [ save:name ]  nonpoisonous_vulpesvulpes
```

When you construct the queue, this package will be installed for you:

```r
obj <- didehpc::queue_didehpc(ctx)
#> Loading context b1ee3dfcbfc8e8c455707746f13564cd
#> [ context   ]  b1ee3dfcbfc8e8c455707746f13564cd
#> [ library   ]  defer
#> [ namespace ]
#> [ source    ]
#> Running installation script on cluster
#>  ,:\      /:.
#> //  \_()_/  \\
#> ||   |  |   ||      CONAN THE LIBRARIAN
#> ||   |  |   ||      Library:   Q:\didehpc\20210817-145020\pkgs\lib\windows\4.0
#> ||  |____|  ||      Bootstrap: T:\conan\bootstrap\4.0
#> \\  / || \  //      Cache:     Q:\didehpc\20210817-145020\pkgs\conan\cache/pkg
#>  `:/  ||  \;'       Policy:    lazy
#>       ||            Repos:
#>       ||              * https://cloud.r-project.org
#>       XX              * https://mrc-ide.github.io/didehpc-pkgs
#>       XX            Packages:
#>       XX              * defer
#>       XX              * local::./defer_0.1.0.tar.gz
#>       OO
#>       `'
#> i No downloads are needed, 1 pkg is cached
#> v Got defer 0.1.0 (source) (3.81 kB)
#> i Building defer 0.1.0
#> v Built defer 0.1.0 (1.8s)
#> v Installed defer 0.1.0 (local) (313ms)
#> v Summary: 1 new in 2.1s
#> Done!
```

## Troubleshooting package installation

### Local copies

You must have local copies of all packages installed (i.e., on the machine that is submitting the jobs). This is because we use some information about the packages to work out what can be run on the cluster. If you see a message like this when creating the queue object:

```
Loading context d1b3973bef7762b8d4d4ff5cbe090b2c
[ context   ]  d1b3973bef7762b8d4d4ff5cbe090b2c
[ library   ]  rfiglet
Error in library(p, character.only = TRUE) :
  there is no package called ‘rfiglet’
```

it means that you do not have the package installed *locally* and you should install it before continuing.

### File locking

You cannot upgrade packages while you have cluster jobs running.
The reason for this is [file locking](https://en.wikipedia.org/wiki/File_locking); any running cluster job has a copy of the package loaded and will prevent its deletion. Unfortunately the installation will delete quite a lot of the package before it realises that it is locked, which causes all sorts of problems. Typically if you hit this you will see a "permission denied" error concerning a dll. Once this has happened you should be prepared for any queued jobs to fail. To avoid this, use a new context root when upgrading packages.

## More control over the process

The package installation may seem a bit magic but you can tame it a little.

When constructing your queue object, you can control how provisioning will occur with the `provision` argument. The default is to check whether any packages listed in your context's `packages` argument are missing, and to install only if so. If you pass `provision = "fake"` it will leave your library alone no matter what. Alternatively pass `provision = "upgrade"` to try to upgrade packages, or `provision = "later"` to skip this step for now. You can't submit jobs while your package installation looks incomplete.

If you want to add additional things into the library without running the full provisioning (which might upgrade all sorts of things) you can use the `install_packages()` method on the object. This ignores the contents of your `conan_sources`; instead you pass in `pkgdepends`-style references directly. See [the `pkgdepends` documentation](https://r-lib.github.io/pkgdepends/reference/pkg_refs.html) for the myriad options here.
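
Returning to the `provision` argument described above, the calls below sketch how each non-default option might be selected. This assumes that `ctx` is a context created with `context::context_save()` as in the earlier sections, and each call needs access to the cluster, so in practice you would run only the one you need.

```r
# Assumes `ctx` was created by context::context_save() as shown earlier.

# Leave the library alone no matter what
obj <- didehpc::queue_didehpc(ctx, provision = "fake")

# Try to upgrade packages
obj <- didehpc::queue_didehpc(ctx, provision = "upgrade")

# Skip provisioning for now; jobs cannot be submitted while the
# installation looks incomplete
obj <- didehpc::queue_didehpc(ctx, provision = "later")
```
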
Examples of `install_packages()` usage include:

Install the latest version of a CRAN package

```r
obj$install_packages("data.table")
```

Install a GitHub package

```r
obj$install_packages("richfitz/stegasaur")
```

Install some local package from a `.tar.gz` file

```r
obj$install_packages("local::mypkg_0.1.2.tar.gz")
```

You can use this interface (along with `provision = "fake"`) to manipulate your package installation fairly flexibly.

## Installation failure / the wrong versions have been selected

It is possible to end up in a situation where `pkgdepends` can't resolve your dependencies, or where in resolving dependencies an unwanted version of a package was installed. Please let Rich know, with enough detail for him to reproduce the example himself:

* A copy of the code that runs up to `didehpc::queue_didehpc(...)`, covering things like `context::context_save()` and `conan::conan_sources()`
* Copies of any manually built `.tar.gz` files that you are using
* A full copy of the log