In this vignette we describe how the linkage scenario functionality can be used in reverse, to calculate the necessary sample size to achieve a prespecified false discovery rate. For example, this calculation would be useful if a researcher is conducting a study for which specimens have yet to be collected, but they want to ensure at least 75% confidence in identified links. To calculate the proportion of links that represent true transmission pairs, the user needs to provide the sensitivity and specificity of the linkage criteria used, the final size of the outbreak, and the desired minimum true discovery rate (\(1-\text{False Discovery Rate}\)):
Param | Variable Name | Description |
---|---|---|
\(\eta\) | sensitivity | the sensitivity of the linkage criteria for identifying transmission links |
\(\chi\) | specificity | the specificity of the linkage criteria for identifying transmission links |
\(N\) | N | the final size of the outbreak (total number of infections) |
\(R\) | R | the average reproductive number (also denoted \(R_\text{pop}\), see below) |
\(\phi\) | tdr | the desired minimum true discovery rate |
Imagine we are interested in collecting enough samples from an
outbreak of 100 cases to ensure that the identified links are correct at
least 75% of the time. The phylogenetic criteria we are using to
identify links has a sensitivity of 99% and a specificity of 99.5%, and
we assume multiple transmissions and multiple links are possible (i.e.,
we use the default assumption
argument,
mtml
).
library(phylosamp)
translink_samplesize(sensitivity=0.99, specificity=0.995,
N=100, R=1, tdr=0.75)
## [1] 10
Although 10 samples will ensure a false discovery rate of no more
than 25%, 10 samples may not provide enough data for analysis. In this
case, we can use the optional min_pairs
parameter to
require that the expected number of links (calculated using the
translink_expected_links_obs()
function) is at least some
minimum value:
translink_samplesize(sensitivity=0.99, specificity=0.995,
N=100, R=1, tdr=0.75, min_pairs=30)
## [1] 50
In another example, it may be crucial that links are identified with high certainty. So we increase the minimum true discovery rate to 95%:
translink_samplesize(sensitivity=0.99, specificity=0.995,
N=100, R=1, tdr=0.95)
## Error in translink_samplesize(sensitivity = 0.99, specificity = 0.995, : Input values do no produce a viable solution
This result suggests it is not possible to obtain such high certainty
with this linkage criteria. We can confirm this with the
translink_tdr
function, which shows that even with 100%
sampling of this outbreak, we will correctly identify transmission links
no more than 80% of the time.
translink_tdr(sensitivity=0.99, specificity=0.995, rho=1, M=100, R=1)
## Calculating true discovery rate assuming multiple-transmission and multiple-linkage
## [1] 0.8032454
Further exploration of the translink_samplesize()
function reveals that there are limited combinations of sensitivity and
specificity that produce reasonable sample sizes, and that specificity
in particular affects the minimum false discovery rate that can be
obtained. Therefore, correctly estimating sensitivity and specificity
are of key importance when using these functions to understand
transmission.