library(bumbl)
library(car)
#> Loading required package: carData
Bumblebee colony growth is characterized by an initial exponential growth period when workers are being produced, followed by a switch to production of reproductive individuals (gynes and drones). Reproductive individuals leave the nest, resulting in a decline in colony size. The initial growth rate, the rate or decline, and the time to switching to reproduction may vary in response to environmental conditions (Crone and Williams 2016).
The bumbl()
function fits a model, described below and
in Crone & Williams (2016), that finds
a growth rate (\(\lambda\)), a decline
rate (\(\delta\)) , and a switchpoint
(\(\tau\)). When supplied with a
dataset with multiple colonies, a model is fit separately for each
colony and these three parameters (as well as an estimated starting
colony size and maximum colony size) are returned. In this example,
we’ll use the built-in bombus
dataset. For more information
on this dataset see the help file with ?bombus
.
Before the switch point, growth (colony weight, \(W_t\)) is defined as:
\[ W_t = \lambda^tW_0 \] After the switchpoint (\(t > \tau\)), it’s defined as:
\[ W_t = \lambda^{\tau}W_{0}\delta^{t-\tau} \]
Where \(\delta\) is a rate of decline. Therefore:
\[ W_t = \begin{cases} \lambda^tW_0 & t\leq\tau \\ \lambda^{\tau}W_{0}\delta^{t-\tau} & t > \tau \end{cases} \]
After log-linearizing this model, it it looks like this:
For \(t \leq \tau\):
\[ \ln(W_t) = \ln(W_0) + t\ln(\lambda) \]
For \(t>\tau\)
\[ \ln(W_t) = \ln(W_0) + \tau\ln(\lambda) + (t-\tau)\ln(\delta) \]
bumbl()
works by treating this as a generalized linear
model with a log-link after creating a new variable, .post
which is 0 before \(\tau\) and \(t-\tau\) after \(\tau\). Then the value of \(\tau\) that maximizes likelihood is found
and used to fit a final model.
\[ ln(W_t) = \beta_0 + \beta_1t + \beta_2 \textrm{ .post} \]
To clarify, in the above equation:
head(bombus)
#> # A tibble: 6 × 10
#> site colony wild habitat date week mass d.mass floral_reso…¹ cum_f…²
#> <fct> <fct> <dbl> <fct> <date> <int> <dbl> <dbl> <dbl> <dbl>
#> 1 PUT2 9 0.98 W 2003-04-03 0 1910. 0.1 27.8 27.8
#> 2 PUT2 9 0.98 W 2003-04-09 1 1940 30.6 27.8 55.5
#> 3 PUT2 9 0.98 W 2003-04-15 2 1938 28.6 27.8 83.3
#> 4 PUT2 9 0.98 W 2003-04-22 3 1976. 67.1 27.8 111.
#> 5 PUT2 9 0.98 W 2003-05-01 4 2010. 101. 7.96 119.
#> 6 PUT2 9 0.98 W 2003-05-07 5 2143 234. 7.96 127.
#> # … with abbreviated variable names ¹floral_resources, ²cum_floral
The bumbl()
function expects a tidy dataset with a
column for some measure of time, and a column for some measure of colony
size at minimum. A formula is required, which at minimum should have
colony size on the left-hand side, and time on the right-hand side.
<- bumbl(bombus, colonyID = colony, t = week, formula = d.mass ~ week)
out #> Warning: Search for optimal switchpoint did not converge for colonyID '32'. Omitting from results.
#> Warning: Search for optimal switchpoint did not converge for colonyID '71'. Omitting from results.
head(out)
#> # A tibble: 6 × 7
#> colony converged tau logN0 logLam decay logNmax
#> <chr> <lgl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 104 TRUE 6.42 3.44 0.380 -0.541 5.78
#> 2 14 TRUE 6.40 4.04 0.295 -0.458 5.83
#> 3 17 TRUE 6.36 3.39 0.407 -0.662 5.83
#> 4 20 TRUE 7.27 2.79 0.194 -0.345 4.14
#> 5 22 TRUE 6.92 2.65 0.280 -0.439 4.57
#> 6 24 TRUE 6.23 4.06 0.167 -0.391 5.06
For each colony, we get the maximum likelihood estimate for \(\tau\) (tau
), the estimated
initial colony weight (\(\ln({W_0})\),
logN0
) on a log scale, the average colony growth rate
(\(\ln(\lambda)\), logLam
)
on a log scale, the rate of decline after \(\tau\) (\(ln(\delta - \lambda)\),
decay
), and the estimated maximum weight of each colony, on
a log scale (logNmax
).
The supplied formula can also include covariates for colony growth.
The model coefficients for any additional covariates are included in the
output of bumbl()
. Here, I’ve included cumulative floral
resources as a covariate. Accounting for some covariates, such as time
of day, might give better estimates of the switchpoint or growth and
decay rates.
<- bumbl(bombus, colonyID = colony, t = week,
out2 formula = d.mass ~ week + cum_floral)
#> Warning: Search for optimal switchpoint did not converge for colonyID '20'. Omitting from results.
#> Warning: Search for optimal switchpoint did not converge for colonyID '67'. Omitting from results.
#> Warning: Search for optimal switchpoint did not converge for colonyID '86'. Omitting from results.
head(out2)
#> # A tibble: 6 × 8
#> colony converged tau logN0 logLam decay logNmax beta_cum_floral
#> <chr> <lgl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 104 TRUE 5.56 3.12 0.361 -0.633 5.73 0.0142
#> 2 14 TRUE 5.57 -4.63 -0.381 -0.534 5.78 0.154
#> 3 17 TRUE 8.47 3.99 0.567 -0.545 5.82 -0.0267
#> 4 22 TRUE 7.11 2.87 0.409 -0.491 4.59 -0.00960
#> 5 24 TRUE 6.40 4.04 0.185 -0.386 5.04 -0.00174
#> 6 26 TRUE 7.28 3.34 -0.158 -0.111 5.49 0.0223
You may also use count data, such as number of workers entering or
exiting a hive, with either a Poisson or negative binomial GLM by
supplying the family
argument.
As is the case with any GLM, some model diagnostics should be
performed before interpreting the results. One way to check that results
are sensible, is to plot them. The plot()
method for data
frames produced by bumbl()
(plot.bumbldf()
)
plots each colony’s observed and estimated growth trajectory as a
separate plot. If you only want to display certain results, you can
supply a vector of indices or colony ID’s to the colony
argument.
plot(out, colony = c("17", "104", "20", "24"))
#> Creating plots for 4 colonies...
Observed values for colony 20 don’t follow a neat growth and decline
shape, and the colony mass was consistently very low. Because of this,
we might not trust the value for tau
as representing a true
switch from workers to reproductives.
There is also an autoplot()
method that can be used if
you have the ggplot2
package installed. It returns a
ggplot
object which can be modified.
library(ggplot2)
<- autoplot(out, colony = c("17", "104", "20", "24")) p
+ theme_bw() p
If you’d rather get these data and produce your own plots, run
bumbl()
with augment = TRUE
to get fitted
values to plot.
Other model diagnostics (plots or statistics) can be obtained from
the GLMs fit to each colony. Running bumbl()
with
keep.model = TRUE
produces an output with a list-column
containing the GLMs.
<- bumbl(bombus, colonyID = colony, t = week, formula = d.mass ~ week, keep.model = TRUE)
out3 #> Warning: Search for optimal switchpoint did not converge for colonyID '32'. Omitting from results.
#> Warning: Search for optimal switchpoint did not converge for colonyID '71'. Omitting from results.
head(out3)
#> # A tibble: 6 × 8
#> colony model converged tau logN0 logLam decay logNmax
#> <chr> <list> <lgl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 104 <glm> TRUE 6.42 3.44 0.380 -0.541 5.78
#> 2 14 <glm> TRUE 6.40 4.04 0.295 -0.458 5.83
#> 3 17 <glm> TRUE 6.36 3.39 0.407 -0.662 5.83
#> 4 20 <glm> TRUE 7.27 2.79 0.194 -0.345 4.14
#> 5 22 <glm> TRUE 6.92 2.65 0.280 -0.439 4.57
#> 6 24 <glm> TRUE 6.23 4.06 0.167 -0.391 5.06
Say, for example, you want to compute an R2 value using
the rsq
package.
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following object is masked from 'package:car':
#>
#> recode
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(purrr)
#>
#> Attaching package: 'purrr'
#> The following object is masked from 'package:car':
#>
#> some
library(rsq)
#for a single colony
# m <- out3$model[[1]]
# rsq(m)
#for all colonies
.1 <-
out3%>%
out3 mutate(r2 = purrr::map_dbl(model, rsq), .after = model)
.1 %>% filter(colony %in% c("104", "17", "20", "24"))
out3#> # A tibble: 4 × 9
#> colony model r2 converged tau logN0 logLam decay logNmax
#> <chr> <list> <dbl> <lgl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 104 <glm> 0.977 TRUE 6.42 3.44 0.380 -0.541 5.78
#> 2 17 <glm> 0.956 TRUE 6.36 3.39 0.407 -0.662 5.83
#> 3 20 <glm> 0.473 TRUE 7.27 2.79 0.194 -0.345 4.14
#> 4 24 <glm> 0.729 TRUE 6.23 4.06 0.167 -0.391 5.06
We can see that colony 20 has a much lower R2 value than colonies 104, 17, and 24 which matches our expectations from plotting the observed and fitted values.
See vignette("nest", package = "tidyr")
for more about
working with list-columns containing models.