potools
provides several “diagnostic” functions used to check the “health” of the messaging corpus available in a given package. These are check_cracked_messages
, which looks for messages split into chunks which are hard to translate; check_untranslated_cat
, which looks for messages displayed via cat()
which are not marked for translation; and check_untranslated_src
, which looks for messages in the src
directory which are not marked for translation.
These just crack the surface of the types of diagnostics that are possible for improving the quality of messaging to users – not only in the process of translation, but also for bettering the experience in English!
In this vignette we’ll demonstrate just such a use case by writing a custom diagnostic function that checks for typos in your messages by applying the function utils::aspell()
.
We’ll call our function check_spelling
; it will take as input a data.table
like that produced by get_message_data()
, and give as output a data.table
indexing any issues found. Specifically, it should have three or four columns: call
, file
, line_number
, and replacement
. The first three come directly from the input; the last one is optional and suggests to the user a way to repair any “unhealthy” messages.
check_spelling = function(message_data) {
# if aspell isn't installed, this won't work; be sure to return an object with the right schema anyway
if (!nzchar(Sys.which("aspell"))) {
warning("'aspell' is not installed; returning nothing")
return(message_data[0, .(call, file, line_number)])
}
# aspell() works on files, so we'll write the msgid to files
aspell_dir <- file.path(tempdir(), 'aspell')
dir.create(aspell_dir)
original_dir <- setwd(aspell_dir)
on.exit({
unlink(aspell_dir, recursive = TRUE)
setwd(original_dir)
})
# (!is_repeat) makes sure we only check duplicate messages once
# plural messages are in a list, so handle them separately
message_data[(!is_repeat), by = .(file, type), {
if (.BY$type == "singular") {
cat(msgid, file = .BY$file, sep = "\n")
# aspell() results has 5 columns: Original, File, Line, Column, Suggestions; we only need 1 & 5
results = utils::aspell(.BY$file)
unlink(.BY$file)
typo_idx <- sapply(results$Original, grep, msgid)
# take the first suggestion
replacement = sapply(
seq_along(results$Suggestions),
function(typo_i) {
# take the identified typo & replace it with aspell's 1st suggestion in the original `call`
gsub(
results$Original[typo_i], results$Suggestions[[typo_i]][1L],
call[typo_idx[typo_i]], fixed = TRUE
)
}
)
.(
call = call[typo_idx],
file = file[typo_idx],
line_number = line_number[typo_idx],
replacement = replacement
)
} else {
# unlist() to write both the n=1 and n!=1 messages to the file side-by-side
all_msgid <- unlist(msgid_plural)
cat(all_msgid, file = .BY$file, sep = "\n")
results = utils::aspell(.BY$file)
unlink(.BY$file)
# odd numbers in grep output --> first entry for each plural_msgid; even numbers --> second entry.
# do this arithmetic trick to re-map that to the original entry number in msgid_plural
typo_idx <- ((sapply(results$Original, grep, all_msgid) - 1L) %/% 2L) + 1L
# potentially overwrite each call >1 time if both messages have a typo
replacement = call
for (typo_i in seq_along(results$Suggestions)) {
replacement[typo_idx[typo_i]] <- gsub(
results$Original[typo_i], results$Suggestions[[typo_i]][1L],
replacement[typo_idx[typo_i]], fixed = TRUE
)
}
typo_idx <- unique(typo_idx)
.(
call = call[typo_idx],
file = file[typo_idx],
line_number = line_number[typo_idx],
replacement = replacement[typo_idx]
)
}
}]
}
In a package, we would probably use a few more helper functions to clean up & simplify the body of this diagnostic; we’re piling everything in sequence for illustration to have everything in one place.
We can check how the diagnostic works on a simple test package GreatSpelling
created for this vignette.
## Getting R-level messages...
## message_source type file msgid
## 1: R singular hazel.R These dark arts are forbiddden!
## 2: R singular spellman.R This is byond my abilities!
## 3: R plural merlin.R <NA>
## msgid_plural call
## 1: stop("These dark arts are forbiddden!")
## 2: warning("This is byond my abilities!")
## 3: %d lyfe left,%d lyves left ngettext(n, "%d lyfe left", "%d lyves left")
## line_number is_repeat is_marked_for_translation is_templated
## 1: 2 FALSE TRUE FALSE
## 2: 2 FALSE TRUE FALSE
## 3: 2 FALSE TRUE FALSE
## file type call file
## 1: hazel.R singular stop("These dark arts are forbiddden!") hazel.R
## 2: spellman.R singular warning("This is byond my abilities!") spellman.R
## 3: merlin.R plural ngettext(n, "%d lyfe left", "%d lyves left") merlin.R
## line_number replacement
## 1: 2 stop("These dark arts are forbidden!")
## 2: 2 warning("This is beyond my abilities!")
## 3: 2 ngettext(n, "%d lye left", "%d Yves left")
That should covers the basics – I look forward to seeing all the great uses you more creative developers can devise. Thanks for reading!