The package offers a suite of align_* functions designed to give you precise control over plot layout. These functions let you reorder the observations or partition them into multiple groups.
Currently, four key align_* functions are available for layout customization:
align_group: Group and align plots based on categorical factors.
align_order: Reorder layout observations based on statistical weights, or manually based on a user-defined ordering index.
align_kmeans: Group observations by k-means clustering results.
align_dendro: Align plots according to hierarchical clustering or dendrograms.
set.seed(123)
small_mat <- matrix(rnorm(81), nrow = 9)
rownames(small_mat) <- paste0("row", seq_len(nrow(small_mat)))
colnames(small_mat) <- paste0("column", seq_len(ncol(small_mat)))
align_group
The align_group() function allows you to group rows/columns into separate panels. It doesn’t add any plot area.
ggheatmap(small_mat) +
anno_top() +
align_group(sample(letters[1:4], ncol(small_mat), replace = TRUE))
#> → heatmap built with `geom_tile()`
By default, the facet strip text is removed. You can override this behavior with theme(strip.text = element_text()). Since align_group() does not create a new plot, the panel title can only be added to the heatmap plot.
align_order
The align_order() function orders the rows/columns based on the summary weights. Like align_group(), it doesn’t add a plot area.
Here, we order the rows based on the means.
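To make the idea concrete, here is a minimal base R sketch of the ordering index that a summary function such as rowMeans implies. It only illustrates the arithmetic and is not the package's internal code; the variable names row_means and ord are ours.

```r
# Rebuild the example matrix used throughout this document
set.seed(123)
small_mat <- matrix(rnorm(81), nrow = 9)
rownames(small_mat) <- paste0("row", seq_len(nrow(small_mat)))

row_means <- rowMeans(small_mat)  # one summary weight per row
ord <- order(row_means)           # ascending order of the weights
rownames(small_mat)[ord]          # row names, lowest mean first
```

Passing the summary function lets the layout compute these weights for you, so the index never has to be built by hand.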
In addition, we can provide the ordering integer index directly in the order argument.
my_order <- sample(nrow(small_mat))
print(rownames(small_mat)[my_order])
#> [1] "row3" "row1" "row7" "row6" "row2" "row8" "row9" "row5" "row4"
We can also provide the ordering character index.
ggheatmap(small_mat) +
anno_left() +
align_order(rownames(small_mat)[my_order])
#> → heatmap built with `geom_tile()`
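The integer and character forms describe the same ordering. As a small base R check (an illustration only, with my_names and recovered as names we made up), match() recovers the integer index from the row names:

```r
set.seed(123)
small_mat <- matrix(rnorm(81), nrow = 9)
rownames(small_mat) <- paste0("row", seq_len(nrow(small_mat)))

my_order <- sample(nrow(small_mat))        # integer ordering index
my_names <- rownames(small_mat)[my_order]  # the same ordering as row names

# Map each name back to its position in the original matrix
recovered <- match(my_names, rownames(small_mat))
identical(recovered, my_order)
```

The character form only works when the data carries unique row (or column) names, since names are the only link back to positions.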
By default, align_order() reorders the rows or columns in ascending order of the summary function’s output (from bottom to top for rows, or from left to right for columns). To reverse this order, you can set reverse = TRUE:
ggheatmap(small_mat) +
anno_left() +
align_order(rowMeans, reverse = TRUE)
#> → heatmap built with `geom_tile()`
Some align_* functions accept a data argument. This can be a matrix, a data frame, or even a simple vector. The data argument can also accept a function (purrr-like lambda syntax is supported), which will be applied to the layout data.
It is important to note that all align_* functions treat the rows as the observations. This means NROW(data) must equal the number of observations along the axis used for alignment.
quad_layout()/ggheatmap(): for column annotations, the layout data will be transposed before use (if data is a function, it will be applied to the transposed matrix). This is necessary because column annotations use heatmap columns as observations, but we need rows.
stack_layout()/ggstack(): the layout data will be used as is, since all plots are placed along a single axis.
Because of this transposition, even for top and bottom annotations you can use rowMeans() to calculate one mean value per observation (here, per heatmap column).
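The effect of the transposition can be checked in base R. This sketch only mirrors the behavior described above; transposed and col_summary are illustrative names.

```r
set.seed(123)
small_mat <- matrix(rnorm(81), nrow = 9)
colnames(small_mat) <- paste0("column", seq_len(ncol(small_mat)))

transposed <- t(small_mat)           # heatmap columns become rows (observations)
NROW(transposed) == ncol(small_mat)  # one row per heatmap column

# rowMeans() on the transposed data gives one value per heatmap column
col_summary <- rowMeans(transposed)
all.equal(col_summary, colMeans(small_mat))
```

This is why the same summary function can be reused for row and column annotations: the function always sees observations as rows.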
align_kmeans
The align_kmeans() function groups heatmap rows or columns based on k-means clustering. Like the previous functions, it does not add a plot area.
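As a rough picture of the grouping step, the following base R sketch runs stats::kmeans() on the rows of the example matrix. That the package relies on stats::kmeans() in exactly this way is an assumption on our part; the choice of three centers is arbitrary.

```r
set.seed(123)
small_mat <- matrix(rnorm(81), nrow = 9)
rownames(small_mat) <- paste0("row", seq_len(nrow(small_mat)))

# Cluster the rows (observations) into 3 groups
km <- kmeans(small_mat, centers = 3L)
groups <- km$cluster          # named integer vector: cluster id per row
split(names(groups), groups)  # row names in each cluster
```

Each cluster id would then correspond to one panel in the layout.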
Note that all align_* functions that define groups must not break previously established groups. This means the new groups must be nested within the old groups, so these functions usually cannot be used if groups already exist.
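The nesting requirement can be stated as a small check. The helper below, is_nested(), is hypothetical and not part of the package; it simply tests whether every new group falls entirely inside one old group.

```r
# Hypothetical helper: TRUE when each new group lies within a single old group
is_nested <- function(new, old) {
  stopifnot(length(new) == length(old))
  all(tapply(old, new, function(x) length(unique(x)) == 1L))
}

old <- c("a", "a", "a", "b", "b", "b")
is_nested(c(1, 1, 2, 3, 3, 3), old)  # TRUE: new groups refine the old ones
is_nested(c(1, 1, 2, 2, 3, 3), old)  # FALSE: group 2 straddles "a" and "b"
```

An independent grouping, such as a fresh k-means run, rarely satisfies this condition, which is why such functions usually cannot follow an existing grouping.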
align_dendro
The align_dendro() function adds a dendrogram to the layout and can also reorder or split the layout based on hierarchical clustering. This is particularly useful for working with heatmap plots.
Hierarchical clustering is performed in two steps: calculate the distance matrix and apply clustering. You can use the distance and method arguments to control the dendrogram building process.
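The two steps correspond to two base R calls. This is a minimal sketch using the defaults of dist() and hclust(); it shows the general procedure, not the package's internal code.

```r
set.seed(123)
small_mat <- matrix(rnorm(81), nrow = 9)
rownames(small_mat) <- paste0("row", seq_len(nrow(small_mat)))

d  <- dist(small_mat, method = "euclidean")  # step 1: distances between rows
hc <- hclust(d, method = "complete")         # step 2: hierarchical clustering
hc$order                                     # leaf order of the resulting tree
```

The leaf order in hc$order is what determines how observations are arranged along the axis.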
There are two ways to specify the distance metric for clustering:
distance as a pre-defined option. The valid values are the methods supported by the dist() function and the correlation coefficients "pearson", "spearman" and "kendall". The correlation distance is defined as 1 - cor(x, y, method = distance).
ggheatmap(small_mat) +
anno_top() +
align_dendro(distance = "pearson") +
patch_titles(top = "pre-defined distance method (1 - pearson)")
#> → heatmap built with `geom_tile()`
distance as a function that calculates the distance matrix from the layout data.
ggheatmap(small_mat) +
anno_top() +
align_dendro(distance = function(m) dist(m)) +
patch_titles(top = "a function that calculates distance matrix")
#> → heatmap built with `geom_tile()`
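The correlation distance defined above can be written out in base R. The function cor_dist() is a hypothetical name of ours; the transposition is needed because cor() correlates columns, while the observations are rows.

```r
set.seed(123)
small_mat <- matrix(rnorm(81), nrow = 9)
rownames(small_mat) <- paste0("row", seq_len(nrow(small_mat)))

# Hypothetical helper: 1 - correlation between rows, as a "dist" object
cor_dist <- function(m, method = "pearson") {
  as.dist(1 - cor(t(m), method = method))
}

d <- cor_dist(small_mat)
range(d)  # correlation distances lie between 0 and 2
```

A function of this shape, taking the data and returning a distance object, is the kind of value the function form of distance expects.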
The method used to perform hierarchical clustering can be specified with method. Possible values are those supported by the hclust() function. You can also provide a self-defined function that accepts the distance object and returns an hclust object.
ggheatmap(small_mat) +
anno_top() +
align_dendro(method = "ward.D2")
#> → heatmap built with `geom_tile()`
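A self-defined clustering function only needs to map a distance object to an hclust object. The sketch below defines one such function, average_linkage() (a name chosen for illustration), and checks its result in base R.

```r
set.seed(123)
small_mat <- matrix(rnorm(81), nrow = 9)

# Hypothetical custom method: takes a "dist" object, returns an "hclust" object
average_linkage <- function(d) hclust(d, method = "average")

hc <- average_linkage(dist(small_mat))
class(hc)
hc$method
```

Any function with this contract could be supplied as method, for example one that wraps a different clustering package and converts its result with as.hclust().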
The dendrogram can also be used to cut the columns/rows into groups. You can specify k or h, which work similarly to cutree():
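For reference, this is how k and h behave in cutree() itself. The sketch uses base R only, and the cut height of 4 is an arbitrary value chosen for illustration.

```r
set.seed(123)
small_mat <- matrix(rnorm(81), nrow = 9)
rownames(small_mat) <- paste0("row", seq_len(nrow(small_mat)))

hc <- hclust(dist(small_mat))
by_k <- cutree(hc, k = 3)  # request exactly 3 groups
by_h <- cutree(hc, h = 4)  # cut the tree at height 4; group count depends on the data
table(by_k)
```

With k you fix the number of panels in advance, while with h the number of panels follows from the tree.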
In contrast to align_group(), align_kmeans(), and align_order(), align_dendro() is capable of drawing plot components. So it has a default set_context value of TRUE, meaning it will set the active context of the annotation stack layout. In this way, we can add any ggplot elements to this plot area.
ggheatmap(small_mat) +
anno_top() +
align_dendro() +
geom_point(aes(y = y))
#> → heatmap built with `geom_tile()`
The align_dendro() function creates default node data for the ggplot. See ggplot2 specification in ?align_dendro for details. Additionally, edge data is added to the ggplot2::geom_segment() layer directly, used to draw the dendrogram tree. One useful variable in both node and edge data is the branch column, corresponding to the cutree result:
ggheatmap(small_mat) +
anno_top() +
align_dendro(aes(color = branch), k = 3) +
geom_point(aes(color = branch, y = y))
#> → heatmap built with `geom_tile()`
You can reorder the dendrogram based on the mean values of the observations by setting reorder_dendrogram = TRUE.
h1 <- ggheatmap(small_mat) +
anno_top() +
align_dendro(aes(color = branch), k = 3, reorder_dendrogram = TRUE) +
ggtitle("reorder_dendrogram = TRUE")
h2 <- ggheatmap(small_mat) +
anno_top() +
align_dendro(aes(color = branch), k = 3) +
ggtitle("reorder_dendrogram = FALSE")
align_plots(h1, h2)
#> → heatmap built with `geom_tile()`
#> → heatmap built with `geom_tile()`
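Reordering a dendrogram by observation means can be reproduced in base R with reorder() on a dendrogram object. Whether the package does exactly this internally is an assumption; the sketch only shows that reordering rotates branches without changing the tree itself.

```r
set.seed(123)
small_mat <- matrix(rnorm(81), nrow = 9)
rownames(small_mat) <- paste0("row", seq_len(nrow(small_mat)))

dend <- as.dendrogram(hclust(dist(small_mat)))
# Rotate branches so that subtrees with lower mean weights come first
reordered <- reorder(dend, wts = rowMeans(small_mat), agglo.FUN = mean)

order.dendrogram(dend)       # original leaf order
order.dendrogram(reordered)  # leaf order after reordering
```

Both trees contain the same merges at the same heights; only the left-to-right arrangement of the leaves differs.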
align_dendro() can also perform clustering between groups, so it can be used even when groups already exist in the layout. In that case, you cannot specify k or h:
set.seed(3L)
column_groups <- sample(letters[1:3], ncol(small_mat), replace = TRUE)
ggheatmap(small_mat) +
anno_top() +
align_group(column_groups) +
align_dendro(aes(color = branch))
#> → heatmap built with `geom_tile()`
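One way to picture clustering in the presence of existing groups is to build a separate tree inside each group. The base R sketch below does this for the columns; it is an illustration under our own assumptions, and it uses a fixed grouping instead of the random one above so that every group has three members.

```r
set.seed(123)
small_mat <- matrix(rnorm(81), nrow = 9)
colnames(small_mat) <- paste0("column", seq_len(ncol(small_mat)))

# Fixed, illustrative grouping (not the sampled groups used in the document)
column_groups <- rep(c("a", "b", "c"), each = 3)

obs <- t(small_mat)  # columns as observations
sub_trees <- lapply(split(seq_len(nrow(obs)), column_groups), function(idx) {
  hclust(dist(obs[idx, , drop = FALSE]))
})

# Leaf order inside each group, expressed as column names
leaf_order <- lapply(sub_trees, function(hc) hc$labels[hc$order])
leaf_order
```

Because each tree only sees the members of its own group, the existing grouping is never broken.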
You can reorder the groups by setting reorder_group = TRUE.
ggheatmap(small_mat) +
anno_top() +
align_group(column_groups) +
align_dendro(aes(color = branch), reorder_group = TRUE)
#> → heatmap built with `geom_tile()`
You can merge the sub-trees of the groups by setting merge_dendrogram = TRUE.
ggheatmap(small_mat) +
anno_top() +
align_group(column_groups) +
align_dendro(aes(color = branch), merge_dendrogram = TRUE)
#> → heatmap built with `geom_tile()`
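Merging sub-trees has a direct analogue in base R: merge() on dendrogram objects joins two trees under a new root. The sketch below is an illustration with made-up data and an arbitrary root height; it does not claim to be how the package merges trees.

```r
set.seed(123)
obs <- matrix(rnorm(54), nrow = 6,
              dimnames = list(paste0("obs", 1:6), NULL))

# One tree per group of three observations
d1 <- as.dendrogram(hclust(dist(obs[1:3, ])))
d2 <- as.dendrogram(hclust(dist(obs[4:6, ])))

# The new root must sit above both sub-trees; "+ 1" is an arbitrary margin
top <- max(attr(d1, "height"), attr(d2, "height")) + 1
merged <- merge(d1, d2, height = top)
labels(merged)
```

The merged tree keeps each sub-tree intact, so observations from different groups are never interleaved.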
You can reorder the groups and merge the dendrogram simultaneously.
ggheatmap(small_mat) +
anno_top() +
align_group(column_groups) +
align_dendro(aes(color = branch),
reorder_group = TRUE,
merge_dendrogram = TRUE
) +
anno_bottom() +
align_dendro(aes(color = branch),
reorder_group = FALSE,
merge_dendrogram = TRUE
)
#> → heatmap built with `geom_tile()`
If you specify k or h, this will always turn off sub-clustering. The same principle applies to align_dendro(), where new groups must be nested within the previously established groups.
sessionInfo()
#> R version 4.4.0 (2024-04-24)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04 LTS
#>
#> Matrix products: default
#> BLAS/LAPACK: /usr/lib/x86_64-linux-gnu/libmkl_rt.so; LAPACK version 3.8.0
#>
#> locale:
#> [1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
#> [4] LC_COLLATE=C LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
#> [7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
#> [10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: Asia/Shanghai
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] ggalign_0.0.5 ggplot2_3.5.1
#>
#> loaded via a namespace (and not attached):
#> [1] gtable_0.3.5 jsonlite_1.8.8 dplyr_1.1.4 compiler_4.4.0
#> [5] highr_0.11 tidyselect_1.2.1 ggbeeswarm_0.7.2 jquerylib_0.1.4
#> [9] textshaping_0.4.0 systemfonts_1.1.0 scales_1.3.0 yaml_2.3.8
#> [13] fastmap_1.2.0 R6_2.5.1 labeling_0.4.3 generics_0.1.3
#> [17] knitr_1.47 tibble_3.2.1 bookdown_0.39 desc_1.4.3
#> [21] munsell_0.5.1 RColorBrewer_1.1-3 bslib_0.7.0 pillar_1.9.0
#> [25] rlang_1.1.4 utf8_1.2.4 cachem_1.1.0 xfun_0.45
#> [29] sass_0.4.9 viridisLite_0.4.2 cli_3.6.3 withr_3.0.0
#> [33] magrittr_2.0.3 digest_0.6.36 grid_4.4.0 beeswarm_0.4.0
#> [37] lifecycle_1.0.4 vipor_0.4.7 ggrastr_1.0.2 vctrs_0.6.5
#> [41] evaluate_0.24.0 glue_1.7.0 farver_2.1.2 ragg_1.3.2
#> [45] fansi_1.0.6 colorspace_2.1-0 rmarkdown_2.27 tools_4.4.0
#> [49] pkgconfig_2.0.3 htmltools_0.5.8.1