
Interactively provide translations for a package's messages
Source:R/translate_package.R
translate_package.RdThis function handles the "grunt work" of building and updating translation libraries. In addition to providing a friendly interface for supplying translations, some internal logic is built to help make your package more translation-friendly.
To get started, the package developer should run translate_package() on
your package's source to produce a template .pot file (or files, if your
package has both R and C/C++ messages to translated), e.g.
To add translations in your desired language, include the target language:
in the translate_package(languages = "es") call.
Usage
translate_package(
dir = ".",
languages = NULL,
diagnostics = list(check_cracked_messages, check_untranslated_cat,
check_untranslated_src),
custom_translation_functions = list(R = NULL, src = NULL),
max_translations = Inf,
use_base_rules = package %chin% .potools$base_package_names,
copyright = NULL,
bugs = "",
verbose = !is_testing()
)Arguments
- dir
Character, default the present directory; a directory in which an R package is stored.
- languages
Character vector; locale codes to which to translate. Must be a valid language accepted by gettext. This almost always takes the form of (1) an ISO 639 2-letter language code; or (2)
ll_CC, wherellis an ISO 639 2-letter language code andCCis an ISO 3166 2-letter country code e.g.esfor Spanish,es_ARfor Argentinian Spanish,rofor Romanian, etc. Seebase::Sys.getlocale()for some helpful tips about how to tell which locales are currently available on your machine, and see the References below for some web resources listing more locales.- diagnostics
A
listof diagnostic functions to be run on the package's message data. See Details.- custom_translation_functions
A
listwith either/both of two components,Randsrc, together governing how to extract any non-standard strings from the package. See Details.- max_translations
Numeric; used for setting a cap on the number of translations to be done for each language. Defaults to
Inf, meaning all messages in the package.- use_base_rules
Logical; Should internal behavior match base behavior as strictly as possible?
TRUEif being run on a base package (i.e.,baseor one of the default packages likeutils,graphics, etc.). See Details.- copyright
Character; passed on to
write_po_file().- bugs
Character; passed on to
write_po_file().- verbose
Logical, default
TRUE(except during testing). Should extra information about progress, etc. be reported?
Value
This function returns nothing invisibly. As a side effect, a
.pot file is written to the package's po directory (updated if
one does not yet exist, or created from scratch otherwise), and a .po
file is written in the same directory for each element of languages.
Phases
translate_package() goes through roughly three "phases" of translation.
Setup --
diris checked for existing translations (toggling between "update" and "new" modes), and R files are parsed and combed for user-facing messages.Diagnostics: see the Diagnostics section below. Any diagnostic detecting "unhealthy" messages will result in a yes/no prompt to exit translation to address the issues before continuing.
Translation. All of the messages found in phase one are iterated over -- the user is shown a message in English and prompted for the translation in the target language. This process is repeated for each domain in
languages.
An attempt is made to provide hints for some translations that require
special care (e.g. that have escape sequences or use templates). For
templated messages (e.g., that use %s), the user-provided message
must match the templates of the English message. The templates don't
have to be in the same order -- R understands template reordering, e.g.
%2$s says "interpret the second input as a string". See
sprintf() for more details.
After each language is completed, a corresponding .po file is written
to the package's po directory (which is created if it does not yet
exist).
There are some discrepancies in the default behavior of
translate_package and the translation workflow used to generate the
.po/.pot files for R itself (mainly, the suite of functions
from tools, tools::update_pkg_po(),
tools::xgettext2pot(), tools::xgettext(), and
tools::xngettext()). They should only be superficial (e.g.,
whitespace or comments), but nevertheless may represent a barrier to
smoothly submitting patchings to R Core. To make the process of translating
base R and the default packages (tools, utils, stats,
etc.) as smooth as possible, set the use_base_rules argument to
TRUE and your resulting .po/.pot/.mo file will
match base's.
Custom translation functions
base R provides several functions for messaging that are natively equipped
for translation (they all have a domain argument): stop(), warning(),
message(), gettext(), gettextf(), ngettext(), and
packageStartupMessage().
While handy, some developers may prefer to write their own functions, or to
write wrappers of the provided functions that provide some enhanced
functionality (e.g., templating or automatic wrapping). In this case,
the default R tooling for translation (xgettext(), xngettext()
xgettext2pot(), update_pkg_po() from tools) will not work, but
translate_package() and its workhorse get_message_data() provide an
interface to continue building translations for your workflow.
Suppose you wrote a function stopf() that is a wrapper of
stop(gettextf()) used to build templated error messages in R, which makes
translation easier for translators (see below), e.g.:
stopf = function(fmt, ..., domain = NULL) {
stop(gettextf(fmt, ...), domain = domain, call. = FALSE)
}Note that potools itself uses just such a wrapper internally to build
error messages! To extract strings from calls in your package to stopf()
and mark them for translation, use the argument
custom_translation_functions:
get_message_data(
'/path/to/my_package',
custom_translation_functions = list(R = 'stopf:fmt|1')
)This invocation tells get_message_data() to look for strings in the
fmt argument in calls to stopf(). 1 indicates that fmt is the
first argument.
This interface is inspired by the --keyword argument to the
xgettext command-line tool. This argument consists of a list with two
components, R and src (either can be excluded), owing to
differences between R and C/C++. Both components, if present, should consist
of a character vector.
For R, there are two types of input: one for named arguments, the other for unnamed arguments.
Entries for named arguments will look like
"fname:arg|num"(singular string) or"fname:arg1|num1,arg2|num2"(plural string).fnamegives the name of the function/call to be extracted from the R source,arg/arg1/arg2specify the name of the argument tofnamefrom which strings should be extracted, andnum/num1/num2specify the order of the named argument within the signature offname.Entries for unnamed arguments will look like
"fname:...\xarg1,...,xargn", i.e.,fname, followed by:, followed by...(three dots), followed by a backslash (\), followed by a comma-separated list of argument names. All strings within calls tofnameexcept those supplied to the arguments named amongxarg1, ...,xargnwill be extracted.
To clarify, consider the how we would (redundantly) specify
custom_translation_functions for some of the default messagers,
gettext, gettextf, and ngettext:
custom_translation_functions = list(R = c("gettext:...\domain", "gettextf:fmt|1", "ngettext:msg1|2,msg2|3")).
For src, there is only one type of input, which looks like
"fname:num", which says to look at the num argument of calls
to fname for char arrays.
Note that there is a difference in how translation works for src vs. R -- in
R, all strings passed to certain functions are considered marked for
translations, but in src, all translatable strings must be explicitly marked
as such. So for src translations, custom_translation_functions
is not used to customize which strings are marked for translation, but
rather, to expand the set of calls which are searched for potentially
untranslated arrays (i.e., arrays passed to the specified calls that
are not explicitly marked for translation). These can then be reported in
the check_untranslated_src() diagnostic, for example.
Diagnostics
Cracked messages
A cracked message is one like:
stop("There are ", n, " good things and ", m, " bad things.")In its current state, translators will be asked to translate three messages independently:
"There are"
"good things and"
"bad things."
The message has been cracked; it might not be possible to translate a string as generic as "There are" into many languages -- context is key!
To keep the context, the error message should instead be build with
gettextf like so:
Now there is only one string to translate! Note that this also allows the translator to change the word order as they see fit -- for example, in Japanese, the grammatical order usually puts the verb last (where in English it usually comes right after the subject).
translate_package detects such cracked messages and suggests a
gettextf-based approach to fix them.
Untranslated R messages produced by cat()
Only strings which are passed to certain base functions are eligible for
translation, namely stop, warning, message, packageStartupMessage,
gettext, gettextf, and ngettext (all of which have a domain argument
that is key for translation).
However, it is common to also produce some user-facing messages using
cat -- if your package does so, it must first use gettext or gettextf
to translate the message before sending it to the user with cat.
translate_package detects strings produced with cat and suggests a
gettext- or gettextf-based fix.
Untranslated C/C++ messages
This diagnostic detects any literal char arrays provided to common
messaging functions in C/C++, namely ngettext(), Rprintf(), REprintf(),
Rvprintf(), REvprintf(), R_ShowMessage(), R_Suicide(), warning(),
Rf_warning(), error(), Rf_error(), dgettext(), and snprintf().
To actually translate these strings, pass them through the translation
macro _.
NB: Translation in C/C++ requires some additional #includes and
declarations, including defining the _ macro.
See the Internationalization section of Writing R Extensions for details.
Custom diagnostics
A diagnostic is a function which takes as input a data.table
summarizing the translatable strings in a package (e.g. as generated by
get_message_data()), evaluates whether these messages are
"healthy" in some sense, and produces a digest of "unhealthy" strings and
(optionally) suggested replacements.
The diagnostic function must have an attribute named diagnostic_tag
that describes what the diagnostic does; it is reproduced in the format
Found {nrow(result)} {diagnostic_tag}:. For example,
check_untranslated_cat() has diagnostic_tag = "untranslated messaging calls passed through cat()".
The output diagnostic result has the following schema:
call:character, the call identified as problematicfile:character, the file wherecallwas foundline_number:integer, the line infilewherecallwas foundreplacement:character, optional, a suggested fix to make the call "healthy"
See check_cracked_messages(),
check_untranslated_cat(), and
check_untranslated_src() for examples of diagnostics.
References
https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Internationalization
https://cran.r-project.org/doc/manuals/r-release/R-admin.html#Internationalization
https://cran.r-project.org/doc/manuals/r-release/R-ints.html#Internationalization-in-the-R-sources
https://developer.r-project.org/Translations30.html
https://web.archive.org/web/20230108213934/https://www.isi-web.org/resources/glossary-of-statistical-terms
https://www.gnu.org/software/gettext/
https://www.gnu.org/software/gettext/manual/html_node/Usual-Language-Codes.html#Usual-Language-Codes
https://www.gnu.org/software/gettext/manual/html_node/Country-Codes.html#Country-Codes
https://www.stats.ox.ac.uk/pub/Rtools/goodies/gettext-tools.zip
https://saimana.com/list-of-country-locale-code/
Examples
pkg <- system.file('pkg', package = 'potools')
# copy to a temporary location to be able to read/write/update below
tmp_pkg <- file.path(tempdir(), "pkg")
dir.create(tmp_pkg)
file.copy(pkg, dirname(tmp_pkg), recursive = TRUE)
#> [1] TRUE
# run translate_package() without any languages
# this will generate a .pot template file and en@quot translations (in UTF-8 locales)
# we can also pass empty 'diagnostics' to skip the diagnostic step
# (skip if gettext isn't available to avoid an error)
if (isTRUE(check_potools_sys_reqs)) {
translate_package(tmp_pkg, diagnostics = NULL)
}
if (FALSE) {
# launches the interactive translation dialog for translations into Estonian:
translate_package(tmp_pkg, "et_EE", diagnostics = NULL, verbose = TRUE)
}
# cleanup
unlink(tmp_pkg, recursive = TRUE)
rm(pkg, tmp_pkg)