Simulating taxonomy

Joelle Barido-Sottani, Rachel C. M. Warnock

2024-10-02

This vignette provides information about how the package stores and outputs information about species taxonomy via the taxonomy object and available options for simulating species taxonomy.

Contents

The taxonomy object

A phylogenetic tree object, in the format used by most software packages, does not typically contain information about species over time. A timescaled phylogeny is informative about the minimum number of co-existing lineages at a given moment but not about the start and end point of species or the mode of speciation. Three modes of speciation that have been discussed widely in palaeobiological literature are shown in Fig. 1. and can be modelled using FossilSim:

**Fig. 1**. Three different modes of speciation that can be simulated using FossilSim. Budding and bifurcation are also referred to as asymmetric and symmetric speciation.

Fig. 1. Three different modes of speciation that can be simulated using FossilSim. Budding and bifurcation are also referred to as asymmetric and symmetric speciation.

The distinction between these processes is important because they have consequences on parameter estimates using different methods.

In molecular phylogenetics, we typically draw trees as fully bifurcating structures (i.e. each node gives rise to two descendant branches). This is also how trees are stored in computer memory by most phylogenetic software packages, including ape. However, it may not be the case that each internal edge represents a unique species. Instead, some speciation events may be budding rather than bifurcating or anagenesis may have occurred at some point along internal edges. To explore the impact of different speciation modes it is valuable to know the relationship between species through time (the taxonomy) and the corresponding tree object.

FossilSim contains options for simulating species evolution under different speciation modes, described in the section Simulating taxonomy. Information about the true species taxonomy is stored in the taxonomy object, which will be associated with the corresponding phylo object used to simulate species evolution.

The object contains a dataframe detailing species taxonomy, which contains the following 7 fields for each edge and species in the corresponding phylo object:

A new taxonomy object can be created by passing the above information to the taxonomy() function.

Cryptic species can also be simulated. Under cryptic speciation the descendant species will be indistinguishable from the ancestor and the taxonomy object can incorporate this information using the additional optional fields:

The relationship between information provided by the taxonomy object and the corresponding phylo object will become clear when you start simulating species and exploring the output.

The taxonomy object can be used as the starting point for simulating fossils or other downstream analyses that require information about discrete species units.

Special Note The birth and death rates used as parameters in the birth-death process typically do not correspond to the rates of appearance and extinction of species under the above processes, unless speciation occurs entirely via budding.

Simulating taxonomy

Species evolution or taxonomy can be simulated in FossilSim under a mixed model of speciation that can incorporate three modes of speciation – budding (or asymmetric), bifurcating (or symmetric) and anagenetic – in addition to cryptic speciation (Stadler et al. 2018). This model of mixed speciation was described previously in (Bapst 2013) and is also available as part of the paleotree package (Bapst 2012).

In a birth-death process \(\lambda\) is the rate at which branching events occur, while \(\mu\) is the rate at which lineage termination occurs. These are the parameters typically used to simulate birth-death trees (see the code snippet below for an example). The mixed model of speciation includes three additional parameters, \(\beta, \lambda_a\) and \(\kappa\), which are defined as follows:

The relationship between these processes and the underlying tree structure in the corresponding phylo object is described in section The taxonomy object. Simulating species evolution or taxonomy for any user specified tree in FossilSim is straightforward.

# set the random number generator seed to generate the same results using the same code
set.seed(123)

# simulate a tree using TreeSim conditioned on tip number
lambda = 1
mu = 0.2
tips = 8
t = TreeSim::sim.bd.taxa(n = tips, numbsim = 1, lambda = lambda, mu = mu)[[1]]
# t is an object of class phylo
t
#> 
#> Phylogenetic tree with 10 tips and 9 internal nodes.
#> 
#> Tip labels:
#>   t2, t5, t8, t4, t9, t10, ...
#> 
#> Rooted; includes branch lengths.
# use t$edge, t$edge.length, t$root.edge to see the tree attributes

# simulate under complete budding speciation 
s = sim.taxonomy(tree = t) # this is equivalent to using the default parameters beta = 0, lambda_a = 0, kappa = 0
# s is an object of class taxonomy
s
#>    sp edge parent      start        end mode cryptic cryptic.id
#> 1   1    1      8 0.38710924 0.00000000    b       0          1
#> 2   2    2      4 0.14461369 0.00000000    b       0          2
#> 3   3    3      6 0.08786013 0.00000000    b       0          3
#> 4   4   14     10 0.84708717 0.52482777    b       0          4
#> 5   4   17     10 0.52482777 0.14461369    b       0          4
#> 6   4    4     10 0.14461369 0.00000000    b       0          4
#> 7   5    5      8 0.10233497 0.00000000    b       0          5
#> 8   6   19      8 0.44503437 0.08786013    b       0          6
#> 9   6    6      8 0.08786013 0.00000000    b       0          6
#> 10  7    7      4 0.52482777 0.00000000    b       0          7
#> 11  8   15     10 1.53730579 0.44503437    b       0          8
#> 12  8   16     10 0.44503437 0.38710924    b       0          8
#> 13  8   18     10 0.38710924 0.10233497    b       0          8
#> 14  8    8     10 0.10233497 0.00000000    b       0          8
#> 15  9    9     10 1.17167856 0.09161934    b       0          9
#> 16 10   11      0 3.34383891 1.53730579    o       0         10
#> 17 10   12      0 1.53730579 1.17167856    o       0         10
#> 18 10   13      0 1.17167856 0.84708717    o       0         10
#> 19 10   10      0 0.84708717 0.34120366    o       0         10
#> Taxonomy representing 10 species across 19 edges.

Note that the way sim.taxonomy assigns budding species is deterministic – at each branching event, the “left” descendant edge will always be assigned the ancestral species label and the “right” descendant edge will always be the new species. This means even without setting the random number seed sim.taxonomy will always produce the same output when \(\beta = 0\).

FossilSim contains various options for plotting the package output. The plot.taxonomy function will produce a plot highlighting species taxonomy along with the corresponding bifurcating tree object.

Note that R detects what type of object you are trying to plot, in this case taxonomy, and will automatically call plot.taxonomy when you apply the function plot to an object of class taxonomy. Use ?plot.taxonomy to view further options that can be used to change the appearance of the figure.

plot(s, tree = t, legend.position = "topleft")

# simulate under complete bifurcating speciation
s = sim.taxonomy(tree = t, beta = 1)
plot(s, tree = t, legend.position = "topleft")

# simulate under mixed speciation
s = sim.taxonomy(tree = t, beta = 0.5, lambda.a = 1, kappa = 0.1)
plot(s, tree = t, legend.position = "topleft")

Note that plot.taxonomy only plots the true species, and ignores any information about cryptic speciation (i.e. the function uses the sp labels and ignores cryptic.id).

Given an existing taxonomy object you can also add anagenetic or cryptic species downstream using the functions sim.anagenetic.species and sim.cryptic.species. These functions also return a taxonomy object.

# simulate taxonomy without anagenetic or cryptic species
s1 = sim.taxonomy(tree = t, beta = 0.5)

# simulate anagenetic species 
# note this function also requires the corresponding tree object
s2 = sim.anagenetic.species(tree = t, species = s1, lambda.a = 1)

# simulate cryptic species
s3 = sim.cryptic.species(species = s2, kappa = 0.1)

These functions can not be used with taxonomy objects that already contain anagenetic or cryptic species.

See also

See the paleotree vignette to see how FossilSim objects can be converted into paleotree objects.

References

Bapst, D.W. (2012). Paleotree: An R package for paleontological and phylogenetic analyses of evolution. Methods in Ecology and Evolution, 3, 803–807.
Bapst, D.W. (2013). When can clades be potentially resolved with morphology? Plos One, 8, e62312.
Stadler, T., Gavryushkina, A., Warnock, R.C.M., Drummond, A.J. & Heath, T.A. (2018). The fossilized birth-death model for the analysis of stratigraphic range data under different speciation modes. Journal of Theoretical Biology, 447, 41–55.