Hash tables are among the most useful data structures for efficient coding. An hash table can be abstractly thought as a map from a set of unique keys (which may be strings, numbers, or even more complicated objects) to a set of values. From a computational viewpoint, its most distinctive feature is that its read/write operations (i.e. storing, retrieving or deleting a particular key or key-value pair) have average \(O(1)\) time complexity, independent of the table size.
Many programming languages come with their own native implementation
of hash tables (for instance std::unordered_map/set
s in
C++, or dict
s and set
s in Python); in base R,
the objects which come closest to hash tables are
environment
s. These, however, can be somewhat cumbersome to
handle from the user point of view, and only support string type keys.
The purpose of {r2r}
is to provide a more flexible
implementation of hash tables in R, building on top of base R
environment
s.
In particular, r2r
hash tables support:
This document provides a quick hands-on introduction to
r2r
hash tables.
We create an empty hash map with:
We can insert key-value pairs in m
in several different
ways:
m[["key"]] <- "value"
m[c(1, 2, 3)] <- c("a", "b", "c") # Vectorized over keys and values
m[[c(4, 5, 6)]] <- c("d", "e", "f") # Not vectorized
The following queries explain the differences between the
[[
and [
operator mentioned in the comments
above:
m[["key"]]
#> [1] "value"
m[c(1, 2, 3)]
#> [[1]]
#> [1] "a"
#>
#> [[2]]
#> [1] "b"
#>
#> [[3]]
#> [1] "c"
m[[c(1, 2, 3)]]
#> NULL
m[c(4, 5, 6)]
#> [[1]]
#> NULL
#>
#> [[2]]
#> NULL
#>
#> [[3]]
#> NULL
m[[c(4, 5, 6)]]
#> [1] "d" "e" "f"
Single element insertions and queries can also be performed through
the generics insert()
and query()
In addition to hash maps, we can also create hash sets, which simply store keys:
There is no restriction on the type of object you can use as keys and values. For instance:
You can set default values for missing keys. For instance:
which is useful for creating a counter:
objects <- list(1, 1, "1", FALSE, "1", 1)
for (object in objects)
m[[object]] <- m[[object]] + 1
m[["1"]]
#> [1] 2
Alternatively, you may throw an exception upon querying a missing key:
hashmap
s and hashmap
s use by default
base::identical()
to compare keys. For instance:
This behavior can be changed by explicitly providing a key comparison function. For this to work correctly, one must also explicitly provide an hash function which produces the same hashes for equivalent keys. A simple way to do this is to apply a preprocessing function to keys, as illustrated by the following example.
We assume that keys are length one complex numbers, and consider two
keys equivalent when they have the same direction in the complex plane.
The direction of a complex vector can be found applying the R function
Arg()
, which is thus a sensible key preprocessing function.
We can instruct an hashmap to preprocess its keys in this way through
the constructor’s key_preproc_fn
argument:
Let us check that everything works as intended: