---
title: "Translating with traduire"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{traduire}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  error = FALSE)

lang_output <- function(language, text) {
  writeLines(c(paste0("```", language), text, "```"))
}

tree <- function(path, header = path) {
  paste1 <- function(a, b) {
    paste(rep_len(a, length(b)), b)
  }
  indent <- function(x, files) {
    paste0(if (files) "|   " else "    ", x)
  }
  is_directory <- function(x) {
    unname(file.info(x)[, "isdir"])
  }
  sort_files <- function(x) {
    i <- grepl("^[A-Z]", x)
    c(x[i], x[!i])
  }

  prefix_file <- "|--="
  prefix_dir <- "|-+="

  files <- sort_files(dir(path))
  files_full <- file.path(path, files)
  isdir <- is_directory(files_full)

  ret <- as.list(c(paste1(prefix_dir, files[isdir]),
                   paste1(prefix_file, files[!isdir])))
  files_full <- c(files_full[isdir], files_full[!isdir])
  isdir <- c(isdir[isdir], isdir[!isdir])

  n <- length(ret)
  if (n > 0) {
    ret[[n]] <- sub("|", "\\", ret[[n]], fixed = TRUE)
    tmp <- lapply(which(isdir), function(i)
      c(ret[[i]], indent(tree(files_full[[i]], NULL), !all(isdir))))
    ret[isdir] <- tmp
  }

  c(header, unlist(ret))
}
```

The `traduire` R package provides a wrapper around the [`i18next` JavaScript library](https://i18next.com). It presents an alternative interface to R's built-in internationalisation functions, with a focus on the ability to change the target language within a session. Currently the package presents only a stripped down interface to the underlying library, though this may expand in future.

First, prepare a `json` file with your translations.
For example, the included file `examples/simple.json` contains:

```{r, results = "asis", echo = FALSE}
path <- system.file("examples/simple.json", package = "traduire", mustWork = TRUE)
lang_output("json", readLines(path))
```

We can create a translator, setting the default language to English (`en`), as:

```{r}
tr <- traduire::i18n(path, language = "en")
tr
```

With this object we can perform translations with the `t` method by passing in a key from within our translations:

```{r}
tr$t("hello")
```

Specify the `language` argument to change language:

```{r}
tr$t("hello", language = "fr")
```

## Interpolation

String interpolation is done using a syntax very similar to [`glue`](https://cran.r-project.org/package=glue) (see the [`i18next` documentation](https://www.i18next.com/translation-function/interpolation)):

```{r}
tr$t("interpolate", list(what = "i18next", how = "easy"), language = "en")
tr$t("interpolate", list(what = "i18next", how = "facile"), language = "fr")
```

## Pluralisation

The example here is derived from a [web API](https://github.com/mrc-ide/hintr) that we developed the package to support. We wanted the service to validate incoming data and tell the user what to fix; if the data is missing one or more columns, we report the columns that are missing. This requires different translations for the singular case ("Data missing column X") and the plural case ("Data missing **columns** X, Y"). The translation file looks like:

```{r, results = "asis", echo = FALSE}
path_validation <- system.file("examples/validation.json", package = "traduire", mustWork = TRUE)
lang_output("json", readLines(path_validation))
```

where the `_plural` suffix tells `i18next` which string to return for the singular or plural case, and the `count` element determines whether the string is singular or plural.
Then we can use this as:

```{r}
tr <- traduire::i18n(path_validation)
```

Pluralisation of results is supported using keys that include a `_plural` suffix (see the [`i18next` documentation](https://www.i18next.com/translation-function/plurals)) and by passing a `count` argument in to the translation:

```{r}
tr$t("nocols", list(missing = "A"), count = 1)
tr$t("nocols", list(missing = "A, B"), count = 2)
```

or, changing the language:

```{r}
tr$t("nocols", list(missing = "A"), count = 1, language = "fr")
tr$t("nocols", list(missing = "A, B"), count = 2, language = "fr")
```

## Fallback language

If no translation is available in the requested language, the translator can fall back to another language. To illustrate this, we use a list of translations of `Hello world!` which includes many languages.

```{r}
path_hello <- system.file("hello/inst/traduire.json", package = "traduire")
```

Most simply, if we want to fall back onto a single language for all translations, we can provide a fallback language as a string:

```{r}
tr <- traduire::i18n(path_hello, fallback = "it")
tr$t("hello", language = "unknown")
```

Alternatively, a chain of languages to try can be provided:

```{r}
tr <- traduire::i18n(path_hello, fallback = c("a", "b", "de"))
tr$t("hello", language = "unknown")
```

If you want to have different fallback languages for different target languages, provide a named list of mappings (each of which can be a scalar or vector of fallback languages as above):

```{r}
tr <- traduire::i18n(path_hello, fallback = list(co = "fr", "default" = "en"))
tr$t("hello", language = "co")
tr$t("hello", language = "unknown")
```

## Translating multiple keys in a block of text

The motivating use case we had was translating a json file for use in an [upstream web application](http://github.com/mrc-ide/hint), so the text to translate might contain data like:

```json
{
  "id": "area_scope",
  "label": "element_label",
  "type": "multiselect",
  "description": "element_description"
}
```

where the json contains a mix of elements to be internationalised (such as the values of `label` and
`description`) and elements to be left as-is (such as the values of `id` and `type`). The snippet above is a simplified version of the [full data](https://github.com/mrc-ide/naomi/blob/178fc4a955dd37e5bc39783afa68cc02e818ce51/inst/metadata/general_run_options.json) where the values to translate might occur at any depth within the json.

To support this, the `i18n` object has a `replace` method, which performs string replacement of text wrapped in `t_(...)`. So we rewrite our json:

```{r}
string <- '{
  "id": "area_scope",
  "label": "t_(element_label)",
  "type": "multiselect",
  "description": "t_(element_description)"
}'
```

and we provide a set of translations:

```{r}
translations <- '{
    "en": {
        "translation": {
            "element_label": "Country",
            "element_description": "Select your countries"
        }
    },
    "fr": {
        "translation": {
            "element_label": "Pays",
            "element_description": "Sélectionnez vos pays"
        }
    }
}'
```

We construct a translator object with these translations:

```{r}
tr <- traduire::i18n(translations)
```

We can then use the `replace` method to translate all strings (wrapped here in `writeLines` to make the output easier to read, given all the json quotes):

```{r}
writeLines(tr$replace(string))
```

or, into French:

```{r}
writeLines(tr$replace(string, language = "fr"))
```

Note that while the input text here is json, it could be anything at all, and will not be parsed as json.

## Use within a package

We provide an optional workflow for using translations within a package, or some other piece of code where the translations will be fairly invasive to add, allowing you to write essentially:

```r
traduire::t_(...)
```

and have all the `...` arguments forwarded to the appropriate translator object. There are several details here:

* how do we determine what is the "appropriate" translator object?
* how do we determine what language is active for this translation?

To do this, we allow packages (or other similar code) to "register" a translator, like:

```r
traduire::translator_register(resources)
```

where `resources` is passed to `traduire::i18n`.

Here we show a complete example package that implements "hello-world-as-a-service" - i.e., a small web service that will reply with a version of "Hello world!" translated into the language of the client's choice. The full package is included as an example within `traduire` at `system.file("hello", package = "traduire")` and is laid out as:

```{r, results = "asis", echo = FALSE}
path_hello <- system.file("hello", package = "traduire", mustWork = TRUE)
lang_output("plain", tree(path_hello, "hello"))
```

Below is the code in `hello.R`, which can say rough translations of "hello world" in a number of languages:

```{r, results = "asis", echo = FALSE}
content <- readLines(file.path(path_hello, "R", "hello.R"))
lang_output("r", content[!grepl("^#", content)])
```

Here,

* `hello` is a simple function that does no translation
* `world` is a function that translates with an explicit language argument, but finds the translations automagically
* `monde` is a function that translates and finds both the translations and the language automagically

The `.onLoad` function contains a call to `traduire::translator_register` which registers a translator database for the package. All calls to `t_` that come from this package will use this registered translator.

Why would we want to do this? If we were using [plumber](https://www.rplumber.io/) to build an API we might want to allow the requests to come in with a header indicating the language. Our plumber API might look like:

```{r, results = "asis", echo = FALSE}
content <- readLines(file.path(path_hello, "inst", "plumber.R"))
lang_output("r", content)
```

The first endpoint inspects the endpoint's `req` object to get the requested language, but the second gets it automagically.
This can be understood by looking at the code used to run the API:

```{r, results = "asis", echo = FALSE}
content <- readLines(file.path(path_hello, "R", "api.R"))
lang_output("r", content[!grepl("^#", content)])
```

So at the beginning of each API request we call `traduire::translator_set_language` (which affects only this package) from a ["preroute" hook](https://www.rplumber.io/docs/programmatic-usage.html#router-hooks), and reset the language in the "postserialize" hook.

The full package is available at `system.file("hello", package = "traduire")`. If you run the API, it can be used like:

```
$ curl -H "Accept-Language: fr" http://localhost:8888
 -----
Salut le monde !
 ------
    \   ^__^
     \  (oo)\ ________
        (__)\         )\ /\
             ||------w|
             ||      ||
$ curl -H "Accept-Language: en" http://localhost:8888
 -----
Hello world!
 ------
    \   ^__^
     \  (oo)\ ________
        (__)\         )\ /\
             ||------w|
             ||      ||
$ curl -H "Accept-Language: ko" http://localhost:8888/hello/cat
 --------------
반갑다 세상아
 --------------
    \
      \
        \
            |\___/|
          ==) ^Y^ (==
            \  ^  /
             )=*=(
            /     \
            |     |
           /| | | |\
           \| | |_|/\
      jgs  //_// ___/
               \_)
```

## Namespaces, and the structure of translation files

This section outlines how to write the translation (json) files, alongside a discussion of using namespaces.

Consider again the first example:

```{r, results = "asis", echo = FALSE}
path <- system.file("examples/simple.json", package = "traduire", mustWork = TRUE)
lang_output("json", readLines(path))
```

In this format, the top level keys (`en`, `fr`) represent _languages_ and the next level key (`translation`), which appears redundant, represents a [_namespace_](https://www.i18next.com/principles/namespaces). A translation set can have multiple namespaces, which can help with organising a large set of strings, and can be used to split the file up over smaller files that might be easier to work with (see below).

Below, we have a file with two namespaces, `common` and `login`.
These might represent strings used throughout the application and in a login component, for example.

```{r, results = "asis", echo = FALSE}
path <- system.file("examples/namespaces.json", package = "traduire", mustWork = TRUE)
lang_output("json", readLines(path))
```

When constructing the translator object we can provide a default namespace (it defaults to `translation`):

```{r}
tr <- traduire::i18n(path, default_namespace = "common")
```

Keys that are provided without an explicit namespace will be looked up in the default namespace:

```{r}
tr$t("hello")
```

Alternatively, provide a namespace when looking up keys:

```{r}
tr$t("common:hello", language = "fr")
tr$t("login:username", language = "fr")
```

So far, this brings relatively little advantage: our file, while structured, will still become very large because all the strings end up in it. So we might want to break it up like so:

```{r, results = "asis", echo = FALSE}
path <- system.file("examples/structured", package = "traduire", mustWork = TRUE)
lang_output("plain", tree(path, "structured"))
```

where each file is organised like:

```{r, results = "asis", echo = FALSE}
lang_output("json", readLines(file.path(path, "en-login.json")))
```

To allow this, we need to load the files one by one into the translation object, rather than as a single resource bundle.
To do this, we can use the `add_resource_bundle` method:

```{r}
obj <- traduire::i18n(NULL)
obj$add_resource_bundle("en", "common", file.path(path, "en-common.json"))
obj$add_resource_bundle("en", "login", file.path(path, "en-login.json"))
obj$add_resource_bundle("fr", "common", file.path(path, "fr-common.json"))
obj$add_resource_bundle("fr", "login", file.path(path, "fr-login.json"))
obj$t("login:password", language = "fr")
```

This is clearly going to be error-prone with a large number of translation files, though a loop could help:

```{r}
obj <- traduire::i18n(NULL)
for (language in c("en", "fr")) {
  for (namespace in c("common", "login")) {
    bundle <- file.path(path, sprintf("%s-%s.json", language, namespace))
    obj$add_resource_bundle(language, namespace, bundle)
  }
}
obj$t("login:password", language = "fr")
```

An alternative is to pass in the pattern used to locate these files, though this approach works best if you also declare your namespaces and languages up front. The pattern uses glue's syntax, and must include the placeholders `language` and `namespace` (and no others):

```{r}
pattern <- file.path(path, "{language}-{namespace}.json")
obj <- traduire::i18n(NULL, resource_pattern = pattern,
                      languages = c("en", "fr"),
                      namespaces = c("common", "login"))
obj$t("login:password", language = "fr")
```
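
To make the idea of a resource pattern concrete, the sketch below shows in base R how a pattern with `{language}` and `{namespace}` placeholders maps onto one file path per language/namespace pair. It is only an illustration: `expand_pattern` is a hypothetical helper that is not part of `traduire`, the `structured/` prefix is a made-up relative path, and plain string substitution stands in for glue. How `traduire` resolves the pattern internally may well differ.

```r
# Hypothetical helper (not part of traduire): expand a pattern containing
# the placeholders {language} and {namespace} into one path per combination.
expand_pattern <- function(pattern, languages, namespaces) {
  # All language/namespace pairs; the first column varies fastest
  grid <- expand.grid(language = languages, namespace = namespaces,
                      stringsAsFactors = FALSE)
  # Literal (fixed = TRUE) substitution of each placeholder
  grid$path <- mapply(function(language, namespace) {
    out <- gsub("{language}", language, pattern, fixed = TRUE)
    gsub("{namespace}", namespace, out, fixed = TRUE)
  }, grid$language, grid$namespace, USE.NAMES = FALSE)
  grid
}

# Assumed example pattern, mirroring the file layout shown above
files <- expand_pattern("structured/{language}-{namespace}.json",
                        languages = c("en", "fr"),
                        namespaces = c("common", "login"))
files$path
```

Each row of `files` pairs a language and namespace with the file that would hold its strings, which is the same information passed by hand to `add_resource_bundle` in the loop above.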