afex_plot()
visualizes results from factorial
experiments combining estimated marginal means and uncertainties
associated with the estimated means in the foreground with a depiction
of the raw data in the background. Currently, afex_plot()
supports the following models:
aov_car()
, aov_ez()
,
or aov_4()
(i.e., objects of class
"afex_aov"
)mixed()
(i.e.,
objects of class "mixed"
)lme4::lmer
(i.e.,
objects of class "merMod"
)emmeans
support. For some examples see
vignette("afex_plot_supported_models", package = "afex")
This document provides an overview of the plots possible with
afex_plot()
. It does so mostly using the
afex_plot()
examples, see ?afex_plot
. We begin
by loading afex
and ggplot2
which is
the package afex_plot()
uses for plotting. Loading
ggplot2
explicitly is not strictly necessary, but makes the
following code nicer. Otherwise, we would need to prepend each call to a
function from ggplot2
needed for customization with
ggplot2::
(as is done in the examples in
?afex_plot
).
We also load the cowplot
package (introduction)
which makes combining plots (with functions plot_grid()
and
legend()
) very easy. However, loading cowplot
sets a different theme for ggplot2
plots than the default
grey one. Although I am not a big fan of the default theme with its grey
background, we reset the theme globally using
theme_set(theme_grey())
to start with the default behavior
if cowplot
is not attached. Note that cowplot
also has the cool draw_plot()
function which allows
embedding plots within other plots.
We furthermore will need the following packages, however, we will not
attach them directly, but only call a few selected functions using the
package::function
notation.
jtools
for theme_apa()
ggpubr
for theme_pubr()
ggbeeswarm
for producing bee swarm plots with geom_beeswarm
ggpol
for producing combined box plots and jitter plots using
geom_boxjitter
We begin with a two-way within-subjects ANOVA using synthetic data
from Maxwell and Delaney (2004, p. 547). The data are hypothetical
reaction times from a 2 x 3 Perceptual Experiment with factors
angle
with 3 levels and factor noise
with 2
levels (see ?md_12.1
for a longer description). We first
load the data and then fit the corresponding ANOVA.
## Anova Table (Type 3 tests)
##
## Response: rt
## Effect df MSE F ges p.value
## 1 angle 1.92, 17.31 3702.02 40.72 *** .390 <.001
## 2 noise 1, 9 8460.00 33.77 *** .387 <.001
## 3 angle:noise 1.81, 16.27 1283.22 45.31 *** .188 <.001
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '+' 0.1 ' ' 1
##
## Sphericity correction method: GG
The ANOVA shows that both, the two main effect as well as the
interaction, are significant. We therefore inspect the pattern
underlying the interaction. There exist two different ways of plotting a
2-way interaction. Either of the two variables can be depicted on the
x-axis. And before having looked at both cases, it is often not clear
which visualization of the interaction is more instructive.
Consequently, we plot both next to each other. For this we simply need
to exchange which variable is the x
factor and which is the
trace
factor. We then use plot_grid()
to plot
them next to each other.
## Warning: Panel(s) show within-subjects factors, but not within-subjects error bars.
## For within-subjects error bars use: error = "within"
## Warning: Panel(s) show within-subjects factors, but not within-subjects error bars.
## For within-subjects error bars use: error = "within"
Before we can even take a look at the plot, we notice that creating
the plots has produced two warnings. These warnings complain that the
plots depict within-subject factors, but do not use within-subject error
bars. However, the warnings also tell us the solution (i.e., adding
error = "within"
), which we will do in the following. The
help page ?afex_plot
contains more information on which
type of error bars are appropriate in which situation and how to
interpret different type of error bars. For ANOVAs,
afex_plot()
will emit warnings if it thinks the error bars
are not appropriate for the chosen factors.
Comparing both plots, my impression is that the plot with
angle
on the x
-axis tells the clearer story.
We can see that when noise
is absent
there is
hardly any effect of the increase of angle
. However, if
noise
is present
an increasing
angle
clearly leads to increased RTs. We therefore use this
plot in the following.
We now produce a new variant of the left plot using more appropriate
error bars and change several other graphical details which make the
plot publication ready. We use the factor_levels
argument
to afex_plot()
for renaming the factor levels (for
technical reasons the ANOVA functions in afex
transform all
factor levels to proper R
variable names using
make.names()
which changed the labels from e.g.,
4
to X4
) and the legend_title
argument for changing the title of the legend. We also change the labels
on the x
and y
axis.
p_an <- afex_plot(aw, x = "angle", trace = "noise", error = "within",
factor_levels = list(angle = c("0°", "4°", "8°"),
noise = c("Absent", "Present")),
legend_title = "Noise") +
labs(y = "RTs (in ms)", x = "Angle (in degrees)")
## Renaming/reordering factor levels of 'angle':
## X0 -> 0°
## X4 -> 4°
## X8 -> 8°
## Renaming/reordering factor levels of 'noise':
## absent -> Absent
## present -> Present
As the additional output shows, changing the factor levels via
factor_levels
emits a message
detailing old
and new factor levels in the form old -> new
. This
message can be suppressed by wrapping the afex_plot()
call
into a suppressMessages()
call or via
RMarkdown
settings. Note that we could have also used the
factor_levels
argument for changing the order of the factor
levels by passing a named character vector (e.g.,
factor_levels = list(angle = c(X8 = "8°", X4 = "4°", X0 = "0°"))
).
This would change the order either on the x-axis or in the legend.
As said above, I am not a big fan of the default grey theme of
ggplot2
plots. Consequently, we compare a number of
different themes for this plot in the following. For all but
ggpubr::theme_pubr()
, we also move the legend to the bottom
as this better allows the plot to cover only a single column in a
two-column layout. ggpubr::theme_pubr()
automatically plots
the legend on top.
plot_grid(
p_an + theme_bw() + theme(legend.position="bottom"),
p_an + theme_light() + theme(legend.position="bottom"),
p_an + theme_minimal() + theme(legend.position="bottom"),
p_an + jtools::theme_nice() + theme(legend.position="bottom"),
p_an + ggpubr::theme_pubr(),
p_an + theme_cowplot() + theme(legend.position="bottom"),
labels = "AUTO"
)
The first row, panels A to C, shows themes coming with
ggplot2
and the second row, panels D to F, shows themes
from additional packages. In my opinion all of these plots are an
improvement above the default grey theme. For the themes coming with
ggplot2
, I really like that those shown here have a
reference grid in the background. This often makes it easier to judge
the actual values the shown data points have. I know that many people
find this distracting, so many of the contributed themes do not have
this grid. One thing I really like about the last two themes is that
they per default use larger font sizes for the axes labels. One way to
achieve something similar for most themes is to change
base_size
(see example below).
One general criticism I have with the current plots is that they show too many values on the y-axis. In the following I plot one more variant of this plot in which we change this to three values on the y-axis. We also increase the axes labels and remove the vertical grid lines.
p_an +
scale_y_continuous(breaks=seq(400, 900, length.out = 3)) +
theme_bw(base_size = 15) +
theme(legend.position="bottom",
panel.grid.major.x = element_blank())
We can also set this theme for the remainder of the R
session with theme_set()
.
To get our plot into a publication, we need to export it as a
graphics file. I would generally advise against exporting plots via the
RStudio
interface as this is not reproducible. Instead I
would use some of the following functions which save the document in the
current working directory. Note that following Elsevier
guidelines, a single column figure should have a width of 9 cm (~ 3
inch) and a two column figure should have a width of 19 cm (~ 7.5
inch).
For Word or similar documents I would export the plot as a
png
(never jpg
):
ggsave("my_plot.png", device = "png",
width = 9, height = 8, units = "cm",
dpi = 600) ## the larger the dpi, the better the resolution
For LaTeX
I would export as pdf
:
afex_plot()
per default plots the raw data in the
background. It does so using an alpha
blending of 0.5
. Thus, overlapping points appear
darker. Examples of this can be seen in the previous graphs where some
data points in the background appear clearly darker than others. The
darker points indicate values for which several data points lie exactly
on top of each other.
afex_plot()
provides the possibility to change or alter
the graphical primitive, called geom
in
ggplot2
parlance, used for plotting the points in the
background. This offers a vast array of options for handling overlapping
points or, more generally, how to display the raw data in the
background. Let’s take a look at some of these examples in the
following.
The simplest way for showing the data in the background is using a single geom, like in the default setting.
The following figure shows eight different variants of a single geom.
The plot in the top left shows the default variant. The first
alternative variant adds vertical jitter to the points to make the
overplotting clearer. The other alternatives use other geoms. Note that
depending on the specific variant we change a few further plot options
to obtain a visually pleasing result. For example, the
dodge
argument controls the spread of points belonging to
different levels of the trace
factor at each x-axis
position.
geom_count
. For this geom, adding a call to
scale_size_area()
can sometimes be beneficial.geom_violin
geom_boxplot
. Note that for this plot we have
added linetype = 1
to data_arg
, which avoids
that the outline of the box plots is affected by the
linetype
mapping (compare this with the violin plot where
the outline of the violin differs across levels of the angle
factor).ggbeeswarm::geom_beeswarm
ggbeeswarm::geom_quasirandom
ggpol::geom_boxjitter
p0 <- afex_plot(aw, x = "noise", trace = "angle", error = "within")
p1 <- afex_plot(aw, x = "noise", trace = "angle", error = "within", dodge = 0.3,
data_arg = list(
position =
ggplot2::position_jitterdodge(
jitter.width = 0,
jitter.height = 25,
dodge.width = 0.3 ## needs to be same as dodge
)))
p2 <- afex_plot(aw, x = "noise", trace = "angle", error = "within", dodge = 0.5,
data_geom = geom_count)
p3 <- afex_plot(aw, x = "noise", trace = "angle", error = "within",
data_geom = geom_violin,
data_arg = list(width = 0.5))
p4 <- afex_plot(aw, x = "noise", trace = "angle", error = "within",
data_geom = geom_boxplot,
data_arg = list(width = 0.3, linetype = 1))
p5 <- afex_plot(aw, x = "noise", trace = "angle", error = "within", dodge = 0.5,
data_geom = ggbeeswarm::geom_beeswarm,
data_arg = list(
dodge.width = 0.5, ## needs to be same as dodge
cex = 0.8))
p6 <- afex_plot(aw, x = "noise", trace = "angle", error = "within", dodge = 0.5,
data_geom = ggbeeswarm::geom_quasirandom,
data_arg = list(
dodge.width = 0.5, ## needs to be same as dodge
cex = 0.8,
width = 0.05 ## choose small value so data points are not overlapping
))
p7 <- afex_plot(aw, x = "noise", trace = "angle", error = "within", dodge = 0.7,
data_geom = ggpol::geom_boxjitter,
data_arg = list(
width = 0.5,
jitter.params = list(width = 0, height = 10),
outlier.intersect = TRUE),
point_arg = list(size = 2.5),
error_arg = list(linewidth = 1.5, width = 0))
plot_grid(p0, p1, p2, p3, p4, p5, p6, p7, ncol = 2, labels = 1:8)
We can also use multiple geoms to plot the data in the background. To
do so, we need to pass a list of geoms to data_geom
. We can
then also set by-geom additional arguments by passing a list of
arguments to data_arg
.
For example, we can combine a violin plot, drawing the outline of the
shape of the distribution, with geom_quasirandom
, showing
each individual data point in the same shape.
So far, all plots were shown in black and white only. However, it is
easy to include color. We do so for some of the plots from above. To
achieve this, we have to change the value of the mapping
argument.
p2 <- afex_plot(aw, x = "noise", trace = "angle", error = "within", dodge = 0.5,
mapping = c("shape", "color"),
data_geom = ggbeeswarm::geom_beeswarm,
data_arg = list(
dodge.width = 0.5, ## needs to be same as dodge
cex = 0.8))
p3 <- afex_plot(aw, x = "noise", trace = "angle", error = "within",
mapping = c("linetype", "shape", "fill"),
data_geom = ggplot2::geom_violin,
data_arg = list(width = 0.5))
p4 <- afex_plot(aw, x = "noise", trace = "angle", error = "within",
mapping = c("shape", "fill"),
data_geom = ggplot2::geom_boxplot,
data_arg = list(width = 0.3))
p5 <- afex_plot(aw, x = "noise", trace = "angle", error = "within", dodge = 0.7,
mapping = c("shape", "fill"),
data_geom = ggpol::geom_boxjitter,
data_arg = list(
width = 0.5,
jitter.params = list(width = 0, height = 10),
outlier.intersect = TRUE),
point_arg = list(size = 2.5),
line_arg = list(linetype = 0),
error_arg = list(linewidth = 1.5, width = 0))
plot_grid(p2, p3, p4, p5, ncol = 2)
For graphical elements in the foreground, afex_plot()
first plots all graphical elements belonging to the same factor level
before plotting graphical elements belonging to different factor levels.
This provides a consistent graphical impression for each factor level
that is particularly relevant in case color is mapped.
In case we have overlapping lines and error bars or use thick lines,
we sometimes do not want that the error bars also receive different line
types. In this case, we can simply pass linetype = 1
to
error_arg
to overwrite the corresponding mapping. This is
shown in the right plot.
p1 <- afex_plot(aw, x = "noise", trace = "angle", mapping = c("color"),
error = "within",
point_arg = list(size = 5), line_arg = list(size = 2),
error_arg = list(linewidth = 2))
p2 <- afex_plot(aw, x = "noise", trace = "angle",
mapping = c("color", "shape", "linetype"),
error = "within",
point_arg = list(size = 5), line_arg = list(size = 2),
error_arg = list(linewidth = 2, width = 0, linetype = 1))
plot_grid(p1, p2, ncol = 2)
If afex_plot()
is called without a trace factor, a
one-way plot is created. We can customize this plot in very much the
same way. Per default a one-way plot contains a legend if
mapping
is not empty (i.e., ""
). We show this
legend for the left plot, but suppress it for the right one via
theme(legend.position="none")
.
po1 <- afex_plot(aw, x = "angle", mapping = "color", error = "within",
point_arg = list(size = 2.5),
error_arg = list(linewidth = 1.5, width = 0.05))
po2 <- afex_plot(aw, x = "angle", error = "within",
data_geom = ggpol::geom_boxjitter,
mapping = "fill", data_alpha = 0.7,
data_arg = list(
width = 0.6,
jitter.params = list(width = 0.05, height = 10),
outlier.intersect = TRUE
),
point_arg = list(size = 2.5),
error_arg = list(linewidth = 1.5, width = 0.05)) +
theme(legend.position="none")
plot_grid(po1, po2)
One-way plots can also be split across different panels by specifying
a panel
factor:
afex_plot(aw, x = "angle", panel = "noise", error = "within",
data_geom = ggpol::geom_boxjitter,
mapping = "fill", data_alpha = 0.7,
data_arg = list(
width = 0.6,
jitter.params = list(width = 0.05, height = 10),
outlier.intersect = TRUE
),
point_arg = list(size = 2.5),
error_arg = list(linewidth = 1.5, width = 0.05)) +
theme(legend.position="none")
Sometimes we still want to add a line connecting the estimated
marginal means. As afex_plot()
returns a
ggplot2
object, we can do this easily by adding a
geom_line
object to the call. As we want to add a line
through all of the shown points in the foreground, we need to add the
corresponding groups aesthetics to this call:
geom_line(aes(group = 1))
. We can add further arguments to
this call, as shown in the left panel below.
plot_grid(
po1 + geom_line(aes(group = 1), color = "darkgrey", size = 1.5),
po2 + geom_line(aes(group = 1))
)
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was generated.
To exemplify the support for linear mixed models, we will use the
data from Freeman and colleagues also discussed in the mixed
model vignette. These data are lexical decision and word naming
latencies for 300 words and 300 nonwords from 45 participants presented
in Freeman et al. (2010). The dependent variable we are interested in is
log
RTs.
We look at the same model also discussed in the vignette, with
factors task
(between participants, but within items),
stimulus
(within participants, but between items),
density
(within participants, but between items), and
frequency
(within participants, but between items), for a
total of almost 13,000 observations. We fit the model with
crossed-random effects for participants (id
) and
item
s using the final model, m9s
, as discussed
in the mixed model vignette.
data("fhch2010") # load
fhch <- droplevels(fhch2010[ fhch2010$correct,]) # remove errors
m9s <- mixed(log_rt ~ task*stimulus*density*frequency +
(stimulus+frequency||id)+
(task|item), fhch, method = "S",
control = lmerControl(optCtrl = list(maxfun = 1e6)),
expand_re = TRUE)
Note that going forward, we disable calculation of degrees of freedom
for emmeans
as this speeds up computation and/or avoids
messages we are currently not interested in.
The ANOVA table of the mixed model indicates that the three-way
interaction task:stimulus:frequency
is significant on which
we focus in the following.
## Mixed Model Anova Table (Type 3 tests, S-method)
##
## Model: log_rt ~ task * stimulus * density * frequency + (stimulus +
## Model: frequency || id) + (task | item)
## Data: fhch
## Effect df F p.value
## 1 task 1, 43.51 13.68 *** <.001
## 2 stimulus 1, 50.57 151.38 *** <.001
## 3 density 1, 584.58 0.36 .547
## 4 frequency 1, 70.27 0.56 .456
## 5 task:stimulus 1, 51.50 71.32 *** <.001
## 6 task:density 1, 578.72 17.89 *** <.001
## 7 stimulus:density 1, 584.60 1.19 .275
## 8 task:frequency 1, 74.09 82.77 *** <.001
## 9 stimulus:frequency 1, 584.78 63.29 *** <.001
## 10 density:frequency 1, 584.62 0.11 .742
## 11 task:stimulus:density 1, 578.74 14.87 *** <.001
## 12 task:stimulus:frequency 1, 578.91 124.16 *** <.001
## 13 task:density:frequency 1, 578.75 5.93 * .015
## 14 stimulus:density:frequency 1, 584.64 4.62 * .032
## 15 task:stimulus:density:frequency 1, 578.77 11.72 *** <.001
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '+' 0.1 ' ' 1
For mixed models, one important decision is the random-effects
grouping factor(s) based on which the raw data plotted in the background
is aggregated. This decision is necessary, because without such a
factor, there would only be one observation for each cell of the design
(unless the full design is considered). In the default setting, with
id
missing, the combination of all random-effects grouping
factors is used.
## Aggregating data over: item, id
## NOTE: Results may be misleading due to involvement in interactions
In the present case, a message informs us that the data is aggregated over both random-effects grouping factors. However, this leads to way too many data points in the background. Let us compare this plot with plots in which we use each of the two random-effects grouping factors in turn.
plot_grid(
afex_plot(m9s, x = "stimulus", trace = "frequency", panel = "task",
id = "id"),
afex_plot(m9s, x = "stimulus", trace = "frequency", panel = "task",
id = "item"),
labels = c("ID", "Item")
)
## NOTE: Results may be misleading due to involvement in interactions
## NOTE: Results may be misleading due to involvement in interactions
The by-id plot looks usable. However, the by-item plot has still way too many data-points to be informative. Some other ways of displaying the raw data, such as violin plots or box plots, seems preferable for it.
We compare violin plots or box plots for the by-item data in the next plot. For the box plot, we increase the width of the error bars and use a consistent line type to distinguish them more easily from the graphical elements of the box plot. We could probably further improve these plots by, for example, adding colors or using some of the other customizations discussed above for the ANOVA example.
plot_grid(
afex_plot(m9s, x = "stimulus", trace = "frequency", panel = "task",
id = "item", dodge = 0.8,
data_geom = geom_violin,
data_arg = list(width = 0.5)),
afex_plot(m9s, x = "stimulus", trace = "frequency", panel = "task",
id = "item", dodge = 0.8,
data_geom = geom_boxplot,
data_arg = list(width = 0.5),
error_arg = list(linewidth = 1.5, width = 0, linetype = 1))
)
## NOTE: Results may be misleading due to involvement in interactions
## NOTE: Results may be misleading due to involvement in interactions
The default error bars for afex_plot()
are based on the
statistical model (i.e., the mixed model in the present case). These
error bars can only be used to judge whether or not two means differ
from each other, if the corresponding factor (or factors) are
independent samples factors (i.e., not repeated-measures factors for any
of the random-effects grouping factors). Of course, in addition to this
the requirement of approximately equal sample size and variance also
needs to hold. In the present case, all of the factors are
repeated-measures factors with respect to one of the random-effects
grouping factors. Consequently, the default error bars cannot be used
for “inference by eye” for any of the factors.
This is also easy to see when looking at all pairwise comparisons
between means for each of the panels/tasks. This shows that for the
naming
task all comparisons are significant. In visual
contrast with that, the two error bars for the low
versus
high
word
s are overlapping very strongly.
## task = naming:
## contrast estimate SE df z.ratio p.value
## word low - nonword low -0.1804 0.0236 Inf -7.637 <.0001
## word low - word high 0.0588 0.0149 Inf 3.941 0.0005
## word low - nonword high -0.3874 0.0250 Inf -15.515 <.0001
## nonword low - word high 0.2392 0.0250 Inf 9.583 <.0001
## nonword low - nonword high -0.2070 0.0150 Inf -13.823 <.0001
## word high - nonword high -0.4462 0.0236 Inf -18.880 <.0001
##
## task = lexdec:
## contrast estimate SE df z.ratio p.value
## word low - nonword low -0.0812 0.0235 Inf -3.461 0.0030
## word low - word high 0.0635 0.0167 Inf 3.807 0.0008
## word low - nonword high 0.0297 0.0245 Inf 1.213 0.6188
## nonword low - word high 0.1447 0.0245 Inf 5.910 <.0001
## nonword low - nonword high 0.1109 0.0167 Inf 6.619 <.0001
## word high - nonword high -0.0339 0.0233 Inf -1.451 0.4676
##
## Results are averaged over the levels of: density
## Degrees-of-freedom method: asymptotic
## P value adjustment: tukey method for comparing a family of 4 estimates
An alternative in the present situation would be using
within-subjects error bars and aggregating the data by-id (i.e.,
error = "within"
), as done in the left panel below. This is
somewhat appropriate here as the factors within each panel are all
within-subject factors. In contrast, using by-item within-subjects error
bars, as done in the right panel below, seems not appropriate as the
only within-item factor, task
, is spread across panels.
Unfortunately, it is not immediately clear if these error bars allow one
to correctly detect which means do not differ from each other.
plot_grid(
afex_plot(m9s, x = "stimulus", trace = "frequency", panel = "task",
id = "id", error = "within"),
afex_plot(m9s, x = "stimulus", trace = "frequency", panel = "task",
id = "item", dodge = 0.8, error = "within",
data_geom = geom_violin,
data_arg = list(width = 0.5))
)
In sum, using error bars for performing “inference by eye” - that is, using overlap or non-overlap of error bars to judge which means differ or do not differ from each other - is highly problematic for mixed models, due to the potentially complex dependency structures between the means. It would be best to avoid comparisons between means altogether. Instead, it is perhaps a good idea to plot the model-based error bars (which is the default) and use them for their intended purpose; judging which values of the estimated means are likely given what we have learned from the model (however, note that one cannot interpret a 95% confidence interval as having a 95% probability of containing the population mean).
The help page ?afex_plot
contains further information
and references on how to interpret confidence intervals and other error
bars.