Translating with traduire

The traduire R package provides a wrapper around the i18next JavaScript library. It presents an alternative interface to R’s built-in internationalisation functions, with a focus on the ability to change the target language within a session. Currently the package exposes only a stripped-down interface to the underlying library, though this may expand in future.

First, prepare a json file with your translations. For example, the included file examples/simple.json contains:

{
    "en": {
        "translation": {
            "hello": "hello world",
            "query": "how are you?",
            "interpolate": "{{what}} is {{how}}"
        }
    },
    "fr": {
        "translation": {
            "hello": "bonjour le monde",
            "query": "ça va ?",
            "interpolate": "{{what}} est {{how}}"
        }
    }
}

We can create a translator, setting the default language to English (en) as:

path <- system.file("examples/simple.json", package = "traduire", mustWork = TRUE)
tr <- traduire::i18n(path, language = "en")
tr
## <i18n>
##   Public:
##     add_resource_bundle: function (language, namespace, resources, deep = FALSE, overwrite = FALSE) 
##     default_namespace: function () 
##     exists: function (string, data = NULL, language = NULL, count = NULL, 
##     get_resource: function (language, namespace, key, sep = ".") 
##     has_resource_bundle: function (language, namespace) 
##     initialize: function (resources, options) 
##     language: function () 
##     languages: function () 
##     load_languages: function (languages) 
##     load_namespaces: function (namespaces) 
##     options: function () 
##     replace: function (text, ...) 
##     set_default_namespace: function (namespace) 
##     set_language: function (language) 
##     t: function (string, data = NULL, language = NULL, count = NULL, 
##   Private:
##     context: V8, environment

With this object we can perform translations with the t method by passing in a key from within our translations:

tr$t("hello")
## [1] "hello world"

Specify the language argument to change language:

tr$t("hello", language = "fr")
## [1] "bonjour le monde"
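Passing language on every call gets repetitive. The printed object above also lists set_language() and language() methods. Below is a minimal sketch that assumes, as the method names suggest, that set_language() changes the default used by later t() calls. The translations are passed inline as a JSON string, which i18n() accepts in place of a path (as in the replace example further on).

```r
# Translations supplied as a JSON string instead of a path to a file
json <- '{
  "en": {"translation": {"hello": "hello world"}},
  "fr": {"translation": {"hello": "bonjour le monde"}}
}'
tr <- traduire::i18n(json, language = "en")

tr$t("hello")                   # default language ("en")
tr$t("hello", language = "fr")  # one-off override; the default is unchanged
tr$t("hello")                   # still English

# Assumed behaviour: change the default language for all later calls
tr$set_language("fr")
tr$t("hello")
tr$language()
```

The per-call language argument suits one-off translations. Changing the default suits a session that works in a single language throughout.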

Interpolation

String interpolation uses {{name}} placeholders, a syntax very similar to glue's (see the i18next documentation). Values for the placeholders are passed as a named list in the data argument:

tr$t("interpolate", list(what = "i18next", how = "easy"),
     language = "en")
## [1] "i18next is easy"
tr$t("interpolate", list(what = "i18next", how = "facile"),
     language = "fr")
## [1] "i18next est facile"
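Conceptually, each {{name}} placeholder is replaced by the element of the data list with the same name. The base-R function below illustrates that idea only. It is a hypothetical helper, not how traduire or i18next implement interpolation, and it ignores i18next features such as escaping, nesting and formatting.

```r
# Hypothetical illustration, not part of traduire: replace each {{name}}
# in `template` with the value of data[[name]]
interpolate_sketch <- function(template, data) {
  for (name in names(data)) {
    placeholder <- paste0("{{", name, "}}")
    # fixed = TRUE treats the braces literally rather than as a regex
    template <- gsub(placeholder, as.character(data[[name]]), template,
                     fixed = TRUE)
  }
  template
}

interpolate_sketch("{{what}} is {{how}}", list(what = "i18next", how = "easy"))
## [1] "i18next is easy"
```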

Pluralisation

The example here is derived from a web API that we developed the package to support. As a service, we wanted to validate incoming data and tell the user what to fix; if the data is missing one or more columns, we report back which columns are missing. This requires different translations for the singular case (“Data missing column X”) and the plural case (“Data missing columns X, Y”).

The translation file looks like:

{
    "en": {
        "translation": {
            "nocols": "Data missing column {{missing}}",
            "nocols_plural": "Data missing columns {{missing}}"
        }
    },

    "fr": {
        "translation": {
            "nocols": "Les données sont manquantes colonne {{missing}}",
            "nocols_plural": "Les données sont manquantes colonnes {{missing}}"
        }
    }
}

Here the _plural suffix tells i18next which string to use in the plural case. At translation time, the count argument determines whether the singular or the plural form is returned.

Then we can use this as:

tr <- traduire::i18n(path_validation)

Pluralised strings are requested by passing a count argument to t (see the i18next documentation for details of the _plural convention):

tr$t("nocols", list(missing = "A"), count = 1)
## [1] "Data missing column A"
tr$t("nocols", list(missing = "A, B"), count = 2)
## [1] "Data missing columns A, B"

or, changing the language:

tr$t("nocols", list(missing = "A"), count = 1, language = "fr")
## [1] "Les données sont manquantes colonne A"
tr$t("nocols", list(missing = "A, B"), count = 2, language = "fr")
## [1] "Les données sont manquantes colonnes A, B"
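In the validation use case, both the list of missing columns and the count can be derived from the same vector. Below is a sketch of a hypothetical helper, report_missing (our name, not part of traduire), using an inline English-only copy of the translations above:

```r
translations <- '{
  "en": {"translation": {
    "nocols": "Data missing column {{missing}}",
    "nocols_plural": "Data missing columns {{missing}}"
  }}
}'
tr <- traduire::i18n(translations, language = "en")

# Hypothetical helper: returns NULL if nothing is missing, otherwise a
# translated message whose singular/plural form follows the number of
# missing columns
report_missing <- function(tr, required, present, language = NULL) {
  missing <- setdiff(required, present)
  if (length(missing) == 0) {
    return(NULL)
  }
  tr$t("nocols", list(missing = paste(missing, collapse = ", ")),
       count = length(missing), language = language)
}

report_missing(tr, c("A", "B", "C"), c("B", "C"))
## [1] "Data missing column A"
report_missing(tr, c("A", "B", "C"), "C")
## [1] "Data missing columns A, B"
```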

Fallback language

To illustrate this feature, we use a set of translations of “Hello world!” into many languages:

path_hello <- system.file("hello/inst/traduire.json", package = "traduire")

Most simply, if we want to fall back onto a single language for all translations, we can provide a fallback language as a string:

tr <- traduire::i18n(path_hello, fallback = "it")
tr$t("hello", language = "unknown")
## [1] "Ciao Mondo!"

Alternatively, a chain of languages can be provided; these are tried in order. Here neither a nor b provides a translation, so German is used:

tr <- traduire::i18n(path_hello, fallback = c("a", "b", "de"))
tr$t("hello", language = "unknown")
## [1] "Hallo Welt!"

If you want to have different fallback languages for different target languages, provide a named list of mappings (each of which can be a scalar or vector of fallback languages as above):

tr <- traduire::i18n(path_hello, fallback = list(co = "fr", "default" = "en"))
tr$t("hello", language = "co")
## [1] "Bonjour le Monde!"
tr$t("hello", language = "unknown")
## [1] "Hello World!"
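Fallback also applies key by key. When the requested language exists but lacks a particular key, i18next looks the key up in the fallback language. Below is a minimal sketch using inline translations (made up for this example) and the single-string form of fallback shown above:

```r
json <- '{
  "en": {"translation": {"hello": "hello world", "query": "how are you?"}},
  "fr": {"translation": {"hello": "bonjour le monde"}}
}'
tr <- traduire::i18n(json, language = "fr", fallback = "en")

tr$t("hello")  # present in French
tr$t("query")  # missing from French, so the English string is used
```

This means a partially translated language can be shipped safely: untranslated keys show the fallback text rather than a bare key.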

Translating multiple keys in a block of text

The motivating use case we had was translating a json file for use in an upstream web application, so the text to translate might contain data like:

{
  "id": "area_scope",
  "label": "element_label",
  "type": "multiselect",
  "description": "element_description"
}

where the json contains a mix of elements to be internationalised (such as the values of label and description) and elements to be left as-is (such as the values of id and type). The snippet above is a simplified version of the full data where the values to translate might occur at any depth within the json.

To support this, the i18n object has a replace method, which performs string replacement of text wrapped in t_(...). So we rewrite our json:

string <- '{
  "id": "area_scope",
  "label": "t_(element_label)",
  "type": "multiselect",
  "description": "t_(element_description)"
}'

and we provide a set of translations:

translations <- '{
    "en": {
        "translation": {
            "element_label": "Country",
            "element_description": "Select your countries"
        }
    },
    "fr": {
        "translation": {
            "element_label": "Pays",
            "element_description": "Sélectionnez vos pays"
        }
    }
}'

We construct a translator object with these translations:

tr <- traduire::i18n(translations)

We can then use the replace method to translate all strings (wrapped here in writeLines so that the output, with all its json quotes, is easier to read):

writeLines(tr$replace(string))
## {
##   "id": "area_scope",
##   "label": "Country",
##   "type": "multiselect",
##   "description": "Select your countries"
## }

or, into French:

writeLines(tr$replace(string, language = "fr"))
## {
##   "id": "area_scope",
##   "label": "Pays",
##   "type": "multiselect",
##   "description": "Sélectionnez vos pays"
## }

Note that although the input text here is json, replace does not parse it. The text could be in any format, and only the t_(...) markers are substituted.
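To illustrate, here is a minimal sketch that applies replace to Markdown-like plain text. The keys title and status and their translations are made up for this example:

```r
translations <- '{
  "en": {"translation": {"title": "Report", "status": "No errors found"}},
  "fr": {"translation": {"title": "Rapport", "status": "Aucune erreur"}}
}'
tr <- traduire::i18n(translations, language = "en")

# Plain text rather than json: only the t_(...) markers are touched
text <- "t_(title)\n------\n* t_(status)"
writeLines(tr$replace(text))
writeLines(tr$replace(text, language = "fr"))
```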

Use within a package

We provide an optional workflow for using translations within a package, or in any other body of code where adding translations would otherwise be fairly invasive. It allows you to write essentially:

traduire::t_(...)

and have all the ... arguments forwarded to the appropriate translator object. This raises two questions:

  • how do we determine which translator object is “appropriate”?
  • how do we determine which language is active for this translation?

To handle this, we allow packages (or other similar code) to “register” a translator, with:

traduire::translator_register(resources)

where resources is passed to traduire::i18n.

Here we show a complete example package that implements “hello-world-as-a-service”, i.e., a small web service that replies with a version of “Hello world!” translated into the client’s choice of language.

The full package is included as an example within traduire at system.file("hello", package = "traduire") and has this layout:

hello
|-+= R
| |--= api.R
| \--= hello.R
|-+= inst
| |--= README.md
| |--= plumber.R
| \--= traduire.json
|-+= man
| |--= api.Rd
| \--= hello.Rd
|--= DESCRIPTION
|--= LICENSE
\--= NAMESPACE

Below is the code in hello.R, which produces rough translations of “hello world” in a number of languages:

hello <- function(...) {
  cowsay::say("Hello", "cow", ...)
}

world <- function(language = "en", ...) {
  cowsay::say(t_("hello", language = language), "cow", ...)
}

monde <- function(...) {
  cowsay::say(t_("hello"), ...)
}

.onLoad <- function(...) {
  path <- system.file("traduire.json", package = "hello", mustWork = TRUE)
  traduire::translator_register(path, "en")
}

Here,

  • hello is a simple function that does no translation
  • world is a function that translates with an explicit language argument, but finds the translations automagically
  • monde is a function that translates and finds both the translations and the language automagically

The .onLoad function contains a call to traduire::translator_register which registers a translator database for the package. All calls to t_ that come from this package will use this registered translator.

Why would we want to do this? If we were using plumber to build an API, we might want to allow requests to carry a header indicating the language. Our plumber API might look like:

#' @get /
#' @html
function(res, req) {
  language <- as.list(req$HEADERS)[["accept-language"]]
  paste0(hello::world(language, type = "string"), "\n")
}

#' @get /hello/<animal>
#' @html
function(res, req, animal) {
  paste0(hello::monde(by = animal, type = "string"), "\n")
}

The first endpoint reads the requested language from the request object (req) and passes it explicitly, but the second gets it automagically. This can be understood by looking at the code used to run the API:

api <- function(port = 8888) {
  path <- system.file("plumber.R", package = "hello", mustWork = TRUE)
  pr <- plumber::plumb(path)
  pr$registerHook("preroute", api_set_language)
  pr$registerHook("postserialize", api_reset_language)
  pr$run(port = port)
}

api_set_language <- function(data, req, res) {
  if ("accept-language" %in% names(req$HEADERS)) {
    language <- req$HEADERS[["accept-language"]]
    data$reset_language <- traduire::translator_set_language(language)
  }
}

api_reset_language <- function(data, req, res, value) {
  if (!is.null(data$reset_language)) {
    data$reset_language()
  }
  value
}

So at the start of each API request, the “preroute” hook calls traduire::translator_set_language, which affects only this package’s translator. The “postserialize” hook then resets the language once the response has been serialised.
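Outside plumber, the reset function returned by translator_set_language pairs naturally with on.exit. Below is a sketch of a hypothetical helper, with_language (our name, not part of traduire). It would live in the hello package, on the assumption (as in api_set_language above) that calls made from package code act on that package's registered translator.

```r
# Hypothetical helper, defined inside the package so that
# translator_set_language() acts on the package's registered translator
with_language <- function(language, code) {
  reset <- traduire::translator_set_language(language)
  # Restore the previous language however `code` exits, including on error
  on.exit(reset())
  # `code` is a lazily evaluated argument, so it runs only here,
  # after the language has been changed
  force(code)
}
```

Within the package, with_language("fr", monde(type = "string")) would then say hello in French without passing a language to monde.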

The full package is available at system.file("hello", package = "traduire"). If you run the API, it can be used like:

$ curl -H "Accept-Language: fr" http://localhost:8888

 -----
Salut le monde !
 ------
    \   ^__^
     \  (oo)\ ________
        (__)\         )\ /\
             ||------w|
             ||      ||

$ curl -H "Accept-Language: en" http://localhost:8888

 -----
Hello world!
 ------
    \   ^__^
     \  (oo)\ ________
        (__)\         )\ /\
             ||------w|
             ||      ||

$ curl -H "Accept-Language: ko" http://localhost:8888/hello/cat

 --------------
반갑다 세상아
 --------------
    \
      \
        \
            |\___/|
          ==) ^Y^ (==
            \  ^  /
             )=*=(
            /     \
            |     |
           /| | | |\
           \| | |_|/\
      jgs  //_// ___/
               \_)

Namespaces, and the structure of translation files

This section outlines how to write the translation (json) files, alongside a discussion of using namespaces. Consider again the first example:

{
    "en": {
        "translation": {
            "hello": "hello world",
            "query": "how are you?",
            "interpolate": "{{what}} is {{how}}"
        }
    },
    "fr": {
        "translation": {
            "hello": "bonjour le monde",
            "query": "ça va ?",
            "interpolate": "{{what}} est {{how}}"
        }
    }
}

In this format, the top-level keys (en, fr) represent languages, and the next-level key (translation), which appears redundant, represents a namespace. A translation set can have multiple namespaces. These help to organise a large set of strings, and they allow the translations to be split across several smaller files that may be easier to work with (see below).

Below is a file with two namespaces, common and login. These might represent strings used throughout the application and strings used in a login component, respectively.

{
    "en": {
        "common": {
            "hello": "hello world"
        },
        "login": {
            "username": "Username",
            "password": "Password"
        }
    },
    "fr": {
        "common": {
            "hello": "salut le monde"
        },
        "login": {
            "username": "Nom d'utilisateur",
            "password": "Mot de passe"
        }
    }
}

When constructing the translator object we can provide a default namespace (it defaults to translation):

tr <- traduire::i18n(path, default_namespace = "common")

Keys provided without an explicit namespace will be looked up in the default namespace:

tr$t("hello")
## [1] "hello world"

Alternatively, provide a namespace when looking up a key, as a prefix separated from the key by a colon:

tr$t("common:hello", language = "fr")
## [1] "salut le monde"
tr$t("login:username", language = "fr")
## [1] "Nom d'utilisateur"
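The exists method, listed in the printed object at the start, can be used to check how a key resolves against the default namespace. Below is a minimal sketch with an inline English-only copy of the file above, assuming exists returns TRUE or FALSE as i18next's exists does:

```r
json <- '{
  "en": {
    "common": {"hello": "hello world"},
    "login": {"username": "Username", "password": "Password"}
  }
}'
tr <- traduire::i18n(json, language = "en", default_namespace = "common")

tr$exists("hello")           # found in the default namespace, "common"
tr$exists("login:username")  # found via an explicit namespace prefix
tr$exists("username")        # not in "common", so not found
```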

So far, this brings relatively little advantage: our file, while structured, will still become very large because every string ends up in it. So we might want to break it up like so:

structured
|--= en-common.json
|--= en-login.json
|--= fr-common.json
\--= fr-login.json

where each file holds the keys for a single language and namespace; for example, en-login.json is organised like:

{
    "username": "Username",
    "password": "Password"
}

To allow this, we need to load the files one by one into the translator object, rather than as a single resource bundle, using the add_resource_bundle method:

obj <- traduire::i18n(NULL)
obj$add_resource_bundle("en", "common", file.path(path, "en-common.json"))
obj$add_resource_bundle("en", "login", file.path(path, "en-login.json"))
obj$add_resource_bundle("fr", "common", file.path(path, "fr-common.json"))
obj$add_resource_bundle("fr", "login", file.path(path, "fr-login.json"))
obj$t("login:password", language = "fr")
## [1] "Mot de passe"

This is clearly going to be error-prone with a large number of translation files, though a loop can help:

obj <- traduire::i18n(NULL)
for (language in c("en", "fr")) {
  for (namespace in c("common", "login")) {
    bundle <- file.path(path, sprintf("%s-%s.json", language, namespace))
    obj$add_resource_bundle(language, namespace, bundle)
  }
}
obj$t("login:password", language = "fr")
## [1] "Mot de passe"

An alternative is to pass in the pattern used to locate these files; this approach works best if you also declare your languages and namespaces up front. The pattern uses glue’s syntax and must include the placeholders language and namespace (and no others):

pattern <- file.path(path, "{language}-{namespace}.json")
obj <- traduire::i18n(NULL, resource_pattern = pattern,
                      languages = c("en", "fr"),
                      namespaces = c("common", "login"))
obj$t("login:password", language = "fr")
## [1] "Mot de passe"
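If your translations currently live in a single combined file, a small script can produce the per-language, per-namespace layout. Below is a sketch of a hypothetical helper, split_translations (our name, not part of traduire), built on jsonlite::read_json and jsonlite::write_json. It writes to a temporary directory and then loads the files with resource_pattern as above. The combined file here is a made-up, cut-down example.

```r
# Sketch: write each language/namespace block of a combined translation
# file to its own "{language}-{namespace}.json" file in `dest`
split_translations <- function(combined_path, dest) {
  # simplifyVector = FALSE (the default) keeps the nested list structure
  dat <- jsonlite::read_json(combined_path)
  dir.create(dest, showWarnings = FALSE, recursive = TRUE)
  for (language in names(dat)) {
    for (namespace in names(dat[[language]])) {
      out <- file.path(dest, sprintf("%s-%s.json", language, namespace))
      # auto_unbox = TRUE writes "Password" rather than ["Password"]
      jsonlite::write_json(dat[[language]][[namespace]], out,
                           auto_unbox = TRUE, pretty = TRUE)
    }
  }
  invisible(dest)
}

combined <- tempfile(fileext = ".json")
writeLines('{"en": {"login": {"password": "Password"}},
             "fr": {"login": {"password": "Mot de passe"}}}', combined)

dest <- file.path(tempdir(), "structured")
split_translations(combined, dest)
list.files(dest)
## [1] "en-login.json" "fr-login.json"

obj <- traduire::i18n(NULL,
                      resource_pattern = file.path(dest, "{language}-{namespace}.json"),
                      languages = c("en", "fr"), namespaces = "login")
obj$t("login:password", language = "fr")
```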