This is an example of running longdat_cont()
. Note that
the time variable (proxy of treatment) here should be continuous. If the
time variable is discrete, please apply longdat_disc()
instead.
The input data frame (called master table) should have the same
format as the example data “LongDat_cont_master_table”. If you have
metadata and feature (eg. microbiome, immunome) data stored in separate
tables, you can go to the section Preparing
the input data frame with make_master_table() below. The function
make_master_table()
helps you to create master table from
metadata and feature tables.
Now let’s have a look at the required format for the input master table. The example below is a dummy longitudinal data set with 2 time points (day 0 and 7). Here we want to see if the treatment has a significant effect on gut microbial abundance or not.
# Read in the data frame. LongDat_cont_master_table is already lazily loaded.
master <- LongDat_cont_master_table
master %>%
kableExtra::kbl() %>%
kableExtra::kable_paper(bootstrap_options = "responsive", font_size = 12) %>%
kableExtra::scroll_box(width = "700px", height = "200px")
Individual | Day | sex | age | DrugA | DrugB | BacteriumA | BacteriumB | BacteriumC |
---|---|---|---|---|---|---|---|---|
1 | 0 | 0 | 61 | 0.0 | 10 | 11 | 4 | 26 |
1 | 7 | 0 | 61 | 0.0 | 10 | 13 | 2 | 22 |
2 | 0 | 0 | 66 | 0.0 | 640 | 344 | 0 | 6 |
2 | 7 | 0 | 66 | 0.0 | 320 | 3 | 0 | 670 |
3 | 0 | 0 | 63 | 7.5 | 100 | 55 | 0 | 10 |
3 | 7 | 0 | 63 | 7.5 | 0 | 5 | 0 | 111 |
4 | 0 | 0 | 47 | 0.0 | 300 | 60 | 0 | 7 |
4 | 7 | 0 | 47 | 0.0 | 200 | 4 | 0 | 100 |
5 | 0 | 1 | 51 | 0.0 | 160 | 100 | 20 | 5 |
5 | 7 | 1 | 51 | 0.0 | 130 | 3 | 64 | 200 |
6 | 0 | 1 | 53 | 10.0 | 0 | 32 | 138 | 4 |
6 | 7 | 1 | 53 | 10.0 | 0 | 2 | 0 | 54 |
7 | 0 | 0 | 50 | 0.0 | 40 | 22 | 105 | 180 |
7 | 7 | 0 | 50 | 0.0 | 20 | 27 | 158 | 49 |
8 | 0 | 1 | 54 | 0.0 | 100 | 24 | 0 | 0 |
8 | 7 | 1 | 54 | 0.0 | 80 | 0 | 0 | 48 |
9 | 0 | 0 | 44 | 0.0 | 160 | 65 | 0 | 20 |
9 | 7 | 0 | 44 | 0.0 | 80 | 1 | 0 | 130 |
10 | 0 | 0 | 60 | 0.0 | 100 | 19 | 163 | 0 |
10 | 7 | 0 | 60 | 0.0 | 25 | 0 | 41 | 38 |
As you can see, the “Individual” is at the first column, and the features (dependent variables), which are gut microbial abundances in this case, are at the end of the table. Any column apart from individual, test_var (e.g. Day) and dependent variables will be taken as potential covariates (could be confounder or mediator). For example, here the potential covariates are sex, age, drug A and drug B. Please avoid using characters that don’t belong to ASCII printable characters for the column names in the input data frame.
If you have your input master table prepared already, you can skip
this section and go to Run
longdat_cont() directly. If your metadata and feature (eg.
microbiome, immunome) data are stored in two tables, you can create a
master table out of them easily with the function
make_master_table()
.
First, let’s take a look at an example of the metadata table. Metadata table should be a data frame whose columns consist of sample identifiers (sample_ID, unique for each sample), individual, time point and other meta data. Each row corresponds to one sample_ID.
# Read in the data frame. LongDat_cont_metadata_table is already lazily loaded.
metadata <- LongDat_cont_metadata_table
metadata %>%
kableExtra::kbl() %>%
kableExtra::kable_paper(bootstrap_options = "responsive", font_size = 12) %>%
kableExtra::scroll_box(width = "700px", height = "200px")
Sample_ID | Individual | Day | sex | age | DrugA | DrugB |
---|---|---|---|---|---|---|
1_0 | 1 | 0 | 0 | 61 | 0.0 | 10 |
1_7 | 1 | 7 | 0 | 61 | 0.0 | 10 |
2_0 | 2 | 0 | 0 | 66 | 0.0 | 640 |
2_7 | 2 | 7 | 0 | 66 | 0.0 | 320 |
3_0 | 3 | 0 | 0 | 63 | 7.5 | 100 |
3_7 | 3 | 7 | 0 | 63 | 7.5 | 0 |
4_0 | 4 | 0 | 0 | 47 | 0.0 | 300 |
4_7 | 4 | 7 | 0 | 47 | 0.0 | 200 |
5_0 | 5 | 0 | 1 | 51 | 0.0 | 160 |
5_7 | 5 | 7 | 1 | 51 | 0.0 | 130 |
6_0 | 6 | 0 | 1 | 53 | 10.0 | 0 |
6_7 | 6 | 7 | 1 | 53 | 10.0 | 0 |
7_0 | 7 | 0 | 0 | 50 | 0.0 | 40 |
7_7 | 7 | 7 | 0 | 50 | 0.0 | 20 |
8_0 | 8 | 0 | 1 | 54 | 0.0 | 100 |
8_7 | 8 | 7 | 1 | 54 | 0.0 | 80 |
9_0 | 9 | 0 | 0 | 44 | 0.0 | 160 |
9_7 | 9 | 7 | 0 | 44 | 0.0 | 80 |
10_0 | 10 | 0 | 0 | 60 | 0.0 | 100 |
10_7 | 10 | 7 | 0 | 60 | 0.0 | 25 |
This example is a dummy longitudinal meatadata with 2 time points for each individual. Besides sample_ID, individual, day columns, there are also information of sex, age and drugs that individuals take. Here we want to see if the treatment has a significant effect on gut microbial abundance or not.
Then, let’s see how a feature table looks like. Feature table should be a data frame whose columns only consist of sample identifiers (sample_ID) and features (dependent variables, e.g. microbiome). Each row corresponds to one sample_ID. Please do not include any columns other than sample_ID and features in the feature table.
# Read in the data frame. LongDat_cont_feature_table is already lazily loaded.
feature <- LongDat_cont_feature_table
feature %>%
kableExtra::kbl() %>%
kableExtra::kable_paper(bootstrap_options = "responsive", font_size = 12) %>%
kableExtra::scroll_box(width = "700px", height = "200px")
Sample_ID | BacteriumA | BacteriumB | BacteriumC |
---|---|---|---|
1_0 | 11 | 4 | 26 |
1_7 | 13 | 2 | 22 |
2_0 | 344 | 0 | 6 |
2_7 | 3 | 0 | 670 |
3_0 | 55 | 0 | 10 |
3_7 | 5 | 0 | 111 |
4_0 | 60 | 0 | 7 |
4_7 | 4 | 0 | 100 |
5_0 | 100 | 20 | 5 |
5_7 | 3 | 64 | 200 |
6_0 | 32 | 138 | 4 |
6_7 | 2 | 0 | 54 |
7_0 | 22 | 105 | 180 |
7_7 | 27 | 158 | 49 |
8_0 | 24 | 0 | 0 |
8_7 | 0 | 0 | 48 |
9_0 | 65 | 0 | 20 |
9_7 | 1 | 0 | 130 |
10_0 | 19 | 163 | 0 |
10_7 | 0 | 41 | 38 |
This example is a dummy longitudinal feature data. It stores the gut microbial abundance of each sample.
To enable the joining process of metadata and feature tables, please pay attention to the following rules.
Now let’s create a master table and take a look at the result!
master_created <- make_master_table(metadata_table = LongDat_cont_metadata_table,
feature_table = LongDat_cont_feature_table, sample_ID = "Sample_ID", individual = "Individual")
#> [1] "Finished creating master table successfully!"
master_created %>%
kableExtra::kbl() %>%
kableExtra::kable_paper(bootstrap_options = "responsive", font_size = 12) %>%
kableExtra::scroll_box(width = "700px", height = "200px")
Individual | Day | sex | age | DrugA | DrugB | BacteriumA | BacteriumB | BacteriumC |
---|---|---|---|---|---|---|---|---|
1 | 0 | 0 | 61 | 0.0 | 10 | 11 | 4 | 26 |
1 | 7 | 0 | 61 | 0.0 | 10 | 13 | 2 | 22 |
2 | 0 | 0 | 66 | 0.0 | 640 | 344 | 0 | 6 |
2 | 7 | 0 | 66 | 0.0 | 320 | 3 | 0 | 670 |
3 | 0 | 0 | 63 | 7.5 | 100 | 55 | 0 | 10 |
3 | 7 | 0 | 63 | 7.5 | 0 | 5 | 0 | 111 |
4 | 0 | 0 | 47 | 0.0 | 300 | 60 | 0 | 7 |
4 | 7 | 0 | 47 | 0.0 | 200 | 4 | 0 | 100 |
5 | 0 | 1 | 51 | 0.0 | 160 | 100 | 20 | 5 |
5 | 7 | 1 | 51 | 0.0 | 130 | 3 | 64 | 200 |
6 | 0 | 1 | 53 | 10.0 | 0 | 32 | 138 | 4 |
6 | 7 | 1 | 53 | 10.0 | 0 | 2 | 0 | 54 |
7 | 0 | 0 | 50 | 0.0 | 40 | 22 | 105 | 180 |
7 | 7 | 0 | 50 | 0.0 | 20 | 27 | 158 | 49 |
8 | 0 | 1 | 54 | 0.0 | 100 | 24 | 0 | 0 |
8 | 7 | 1 | 54 | 0.0 | 80 | 0 | 0 | 48 |
9 | 0 | 0 | 44 | 0.0 | 160 | 65 | 0 | 20 |
9 | 7 | 0 | 44 | 0.0 | 80 | 1 | 0 | 130 |
10 | 0 | 0 | 60 | 0.0 | 100 | 19 | 163 | 0 |
10 | 7 | 0 | 60 | 0.0 | 25 | 0 | 41 | 38 |
The table “master_created” is just the same as the table “master” or
“LongDat_cont_master_table” in the previous section, with the
“Individual” as the first column, and the features (dependent
variables), which are gut microbial abundances in this case, are at the
end of the table. Any column apart from individual, test_var (e.g. Day)
and dependent variables will be taken as potential covariates (could be
confounder or mediator). For the details of the arguments, please read
the help page of this function by using
?make_master_table
.
OK, now we’re ready to run longdat_cont()
!
The input is the example data frame LongDat_cont_master_table (same
as “master” or “master_created” in the previous sections), and the
data_type is “count” since the dependent variables (features, in this
case they’re gut microbial abundance) are count data. The “test_var” is
the independent variable you’re testing, and here we’re testing “Day”
(time as the proxy for treatment). The variable_col is 7 because the
dependent variables start at column 7. And the fac_var mark the columns
that aren’t numerical. For the details of the arguments, please read the
help page of this function by using ?longdat_cont
.
The run below takes less than a minute to complete. When data_type equals to “count”, please remember to set seed (as shown below) so that you’ll get reproducible randomized control test.
# Run longdat_cont() on LongDat_cont_master_table
set.seed(100)
test_cont <- longdat_cont(input = LongDat_cont_master_table, data_type = "count",
test_var = "Day", variable_col = 7, fac_var = c(1, 3))
#> [1] "Start data preprocessing."
#> [1] "Finish data preprocessing."
#> [1] "Start selecting potential covariates."
#> [1] 1
#> [1] 2
#> [1] 3
#> [1] 1
#> [1] 2
#> [1] 3
#> [1] "Finished selecting potential covariates."
#> [1] "Start null model test."
#> [1] 1
#> [1] 2
#> [1] 3
#> [1] "Finish null model test."
#> [1] "Start covariate model test."
#> [1] 1
#> [1] 2
#> [1] 3
#> [1] "Finish covariate model test."
#> [1] "Start unlisting tables from covariate model result."
#> [1] "Finish unlisting tables from covariate model result."
#> [1] "Finished post-hoc correlation test."
#> [1] 1
#> [1] 2
#> [1] 3
#> [1] "Finished post-hoc correlation test."
#> [1] "Start randomized negative control model test."
#> [1] 1
#> [1] 2
#> [1] 3
#> [1] 1
#> [1] 2
#> [1] 3
#> [1] "Finish randomized negative control model test."
#> [1] "Start removing the dependent variables to be exlcuded."
#> [1] "Finish removing the dependent variables to be exlcuded."
#> [1] "Start generating result tables."
#> [1] "Finished successfully!"
If you have completed running the function successfully, you’ll see the message “Finished successfully!” at the end. The results are stored in list format.
The major output from longdat_cont()
include a result
table and a covariate table. If you have count data (data_type equals to
“count”), then there are chances that you get a third table “randomized
control table”. If you specify data_type as either “measurement” or
“others”, then you’ll get a “Normalize_method” table. For more details
about the “randomized control table” and “Normalize_method” table,
please read the help page of this function by using
?longdat_cont
.
Let’s have a look at the result table first.
# The first dataframe in the list is the result table
result_table <- test_cont[[1]]
result_table %>%
kableExtra::kbl() %>%
kableExtra::kable_paper(bootstrap_options = "responsive", font_size = 12, position = "center") %>%
kableExtra::scroll_box(width = "700px")
Feature | Prevalence_percentage | Mean_abundance | Signal | Effect | EffectSize | Null_time_model_q | Post-hoc_q |
---|---|---|---|---|---|---|---|
BacteriumA | 90 | 39.50 | OK_nc | Decreased | -0.7809864 | 0.0000011 | 0.0000482 |
BacteriumB | 45 | 34.75 | NS | NS | -0.1328821 | 0.4496104 | 0.5765105 |
BacteriumC | 90 | 84.00 | OK_nc | Enriched | 0.7112976 | 0.0012725 | 0.0004375 |
The second and third columns show the prevalence and mean abundance of each feature According to the “Signal” column, treatment is a significant predictor for BacteriumA and BacteriumC as they show “OK_nc” (which represents OK and no covariate), meaning that the abundance of BacteriumA and BacteriumC alter significantly through time (proxy of treatment), and that there is no potential covariate. If there is covariate effect in the result, please see the covariate table to find out what the covariates are. As for BacteriumB, time (proxy of treatment) has no effect on its abundance.
The following column “Effect” describes the trend of dependent variables change along time. Here we can tell that BacteriumA and BacteriumC have decreasing and increasing patterns, respectively. From the next column “EffectSize”, we know that the effect sizes are -0.78 and 0.71, respectively. The important and the most relevant information for users ends here, which are listed from the first column to “EffectSize”.
Then the following columns contain the details of model test p values
(“Null_time_model_q”), the post-hoc test p values (Post.hoc_q). For more
detailed information of the columns in the result table, please refer to
the help page by using ?longdat_cont
.
The explanation of each type of “Signal” is listed below.
Signal | Meaning | Explanation |
---|---|---|
NS | Non-significant | There’s no effect of time. |
OK_nc | OK and no covariate | There’s an effect of time and there’s no potential covariate. |
OK_d | OK but doubtful | There’s an effect of time and there’s no potential covariate, however the confidence interval of the test_var estimate in the model test includes zero, and thus it is doubtful. Please check the raw data (e.g., plot feature against time) to confirm if there is real effect of time. |
OK_nrc | OK and not reducible to covariate | There are potential covariates, however there’s an effect of time and it is independent of those of covariates. |
EC | Entangled with covariate | There are potential covariates, and it isn’t possible to conclude whether the effect is resulted from time or covariates. |
RC | Effect reducible to covariate | There’s an effect of time, but it can be reduced to the covariate effects. |
Next, let’s take a look at the covariate table.
# The second dataframe in the list is the covariate table
covariate_table <- test_cont[[2]]
covariate_table %>%
kableExtra::kbl() %>%
kableExtra::kable_paper(bootstrap_options = "responsive", font_size = 12, position = "center") %>%
kableExtra::scroll_box(width = "700px")
Feature | Covariate1 | Covariate_type1 | Effect_size1 |
---|---|---|---|
The columns of this covariate table are grouped every three columns.
“Covariate1” is the name of the covariate, while “Covariate_type1” is
the covariate type of covariate1, that is, if the effect of time is
reducible to covariate1. “Effect_size1” is the effect size of the
dependent variable values between different levels of covariate1. If
there are more than one covariates, they will be listed along the rows
of each dependent variable. Since there is no covariate effect found in
this example (according to the result table), the covariate table is
blank. If you’d like to see a result with covariates, please read the
vignette of longdat_disc()
.
From the result above, we see that the treatment induces significant changes on the abundance of BacteriumA and BacteriumC, while causing no alteration in that of BacteriumB.
Finally, we can plot the result with the function
cuneiform_plot()
. The required input is a result table from
longdat_cont()
(or any table with the same format as a
result table does).
test_plot <- cuneiform_plot(result_table = test_cont[[1]], title_size = 15)
#> [1] "Finished plotting successfully!"
Here we can see the result clearly from the cuneiform plot. It shows
the features whose signals are not “NS”. The left panel displays the
effects in each time interval. Red represents positive effect size while
blue describes negative one (colors can be customized by users).
Signficant signals are indicated by solid shapes, whereas insignificant
signals are denoted by transparent ones. The right panel displays the
covariate status of each feature, and users can remove it by specifying
covariate_panel = FALSE
. For more details of the arguments,
please read the help page of this function by using
?cuneiform_plot
.
This tutorial ends here! If you have any further questions and can’t find the answers in the vignettes or help pages, please contact the author (Chia-Yu.Chen@mdc-berlin.de).