README

The goal of CohortSymmetry is to carry out the necessary calculations for Sequence Symmetry Analysis (SSA). It is highly recommended that this method is tested beforehand against well-known positive and negative controls. Such controls could be found using Pratt et al (2015).

Installation

# install.packages("devtools")
devtools::install_github("OHDSI/CohortSymmetry")

Example

Create a reference to data in the OMOP CDM format

The CohortSymmetry package is designed to work with data in the OMOP CDM (Common Data Model) format, so our first step is to create a reference to the data using the CDMConnector package.

library(CDMConnector)
library(dplyr)
library(DBI)
library(duckdb)
 
db <- DBI::dbConnect(duckdb::duckdb(), 
                     dbdir = CDMConnector::eunomia_dir())
cdm <- cdm_from_con(
  con = db,
  cdm_schema = "main",
  write_schema = "main"
)

Step 0: Instantiate two cohorts in the cdm reference

This will be entirely user’s choice on how to generate such cohorts. Minimally, this package requires two cohort tables in the cdm reference, namely the index_cohort and the marker_cohort.

If one wants to generate two drugs cohorts in cdm, DrugUtilisation is recommended. For merely illustration purposes, we will carry out PSSA on aspirin (index_cohort) against amoxicillin (marker_cohort)

library(dplyr)
library(DrugUtilisation)
cdm <- DrugUtilisation::generateIngredientCohortSet(
  cdm = cdm, 
  name = "aspirin",
  ingredient = "aspirin")

cdm <- DrugUtilisation::generateIngredientCohortSet(
  cdm = cdm,
  name = "amoxicillin",
  ingredient = "amoxicillin")

Step 1: generateSequenceCohortSet

In order to initiate the calculations, the two cohorts tables need to be intersected using generateSequenceCohortSet(). This process will output all the individuals who appeared on both tables according to a user-specified parameters. This includes timeGap, washoutWindow, indexMarkerGap and daysPriorObservation. Details on these parameters could be found on the vignette.

library(CohortSymmetry)
 
cdm <- generateSequenceCohortSet(
  cdm = cdm,
  indexTable = "aspirin",
  markerTable = "amoxicillin",
  name = "aspirin_amoxicillin"
)

cdm$aspirin_amoxicillin %>% 
  dplyr::glimpse()
#> Rows: ??
#> Columns: 6
#> Database: DuckDB v0.10.1 [xihangc@Windows 10 x64:R 4.3.1/C:\Users\xihangc\AppData\Local\Temp\RtmpqOJIm1\file521c7f3a49b3.duckdb]
#> $ cohort_definition_id <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
#> $ subject_id           <int> 65, 119, 185, 144, 235, 197, 310, 280, 316, 331, …
#> $ cohort_start_date    <date> 1968-07-29, 1967-05-28, 1947-04-07, 1978-10-30, …
#> $ cohort_end_date      <date> 1969-06-18, 1968-04-07, 1947-04-12, 1979-09-04, …
#> $ index_date           <date> 1969-06-18, 1967-05-28, 1947-04-07, 1978-10-30, …
#> $ marker_date          <date> 1968-07-29, 1968-04-07, 1947-04-12, 1979-09-04, …

Step 2: summariseSequenceRatios

To get the sequence ratios, we would need the output of the generateSequenceCohortSet() function to be fed into summariseSequenceRatios() The output of this process contains cSR(crude sequence ratio), aSR(adjusted sequence ratio) and confidence intervals.

res <- summariseSequenceRatios(cohort = cdm$aspirin_amoxicillin)
 
res %>% glimpse()
#> Rows: 10
#> Columns: 13
#> $ result_id        <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1
#> $ cdm_name         <chr> "Synthea synthetic health database", "Synthea synthet…
#> $ group_name       <chr> "index_cohort_name &&& marker_cohort_name", "index_co…
#> $ group_level      <chr> "1191_aspirin &&& 723_amoxicillin", "1191_aspirin &&&…
#> $ strata_name      <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ strata_level     <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ variable_name    <chr> "crude", "adjusted", "crude", "crude", "adjusted", "a…
#> $ variable_level   <chr> "sequence_ratio", "sequence_ratio", "sequence_ratio",…
#> $ estimate_name    <chr> "point_estimate", "point_estimate", "lower_CI", "uppe…
#> $ estimate_type    <chr> "numeric", "numeric", "numeric", "numeric", "numeric"…
#> $ estimate_value   <chr> "1.43589743589744", "1927.66462191247", "0.9573119756…
#> $ additional_name  <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ additional_level <chr> "overall", "overall", "overall", "overall", "overall"…

Step 3: visualise the results

The user could then visualise their results using a wide array of provided tools.

gt_results <- tableSequenceRatios(result = res)

gt_results

Note that flextable is also an option, users may specify this by using the type argument.

One could also visualise the plot, for example, the following is the plot of the adjusted sequence ratio.

plotSequenceRatios(result = res,
                  onlyaSR = T,
                  colours = "black")

plotTemporalSymmetry(cdm = cdm, sequenceTable = "aspirin_amoxicillin")

CohortSymmetry