Getting started with cancerR

Getting started

This vignette shows how to use the cancerR package to classify cancer subtypes from the information in pathology reports, which is typically coded using the International Classification of Diseases for Oncology (ICD-O) system. These codes are commonly available in cancer registries.

library(cancerR)

# Make example data

data <- data.frame(
  icd_o3_histology = c("8522", "9490", "9070"),
  # Different formats of site codes commonly found in cancer registries
  icd_o3_site = c("C50.1", "C701", "620"),
  icd_o3_behaviour = c("3", "3", "3")
)

head(data)
#>   icd_o3_histology icd_o3_site icd_o3_behaviour
#> 1             8522       C50.1                3
#> 2             9490        C701                3
#> 3             9070         620                3

Convert cancer site

The site_convert() function can be used to extract the correct site (a.k.a. topography) codes and convert them to a standardized numeric format. It is designed to handle both character and numeric input and will automatically detect if the codes are in decimal (“C34.1”) or integer (“C341”) format and convert them.


# Convert site codes
data$site_conv <- site_convert(data$icd_o3_site, validate = FALSE)

head(data)
#>   icd_o3_histology icd_o3_site icd_o3_behaviour site_conv
#> 1             8522       C50.1                3       501
#> 2             9490        C701                3       701
#> 3             9070         620                3       620
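
To make the conversion rule concrete, here is a minimal base R sketch of the normalisation described above. The helper normalise_site() is hypothetical and is not part of cancerR; it only assumes that conversion amounts to dropping the leading "C" and the decimal point, and site_convert() itself may work differently.

```r
# Hypothetical helper (not part of cancerR): mimics the normalisation
# described above so that the rule is easy to see.
normalise_site <- function(x) {
  x <- toupper(trimws(as.character(x)))  # accept character or numeric input
  x <- sub("^C", "", x)                  # drop the leading "C" if present
  x <- gsub(".", "", x, fixed = TRUE)    # drop the decimal point if present
  suppressWarnings(as.integer(x))        # anything non-numeric becomes NA
}

normalise_site(c("C50.1", "C701", "620"))
#> [1] 501 701 620
```

All three registry formats from the example data end up as the same kind of three-digit integer, which matches the site_conv column shown above.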

site_convert() also has built-in validation that checks whether the site codes fall within the valid range of “C00.0” to “C97.9”. Validation is enabled by setting the validate argument to TRUE.


# Valid site codes
site_convert("C34.1", validate = TRUE)
#> [1] 341

# Invalid site codes
site_convert("C99.9", validate = TRUE) # Should return NA and a warning message
#> Warning in site_convert("C99.9", validate = TRUE): There were 1 invalid ICD-O-3
#> site codes found and set to NA.
#> [1] NA
site_convert("C99.9", validate = FALSE) # Should return 999
#> [1] 999
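
The validation behaviour shown above can be approximated in base R as a simple range check. This is only a sketch: validate_site() is a hypothetical helper, and the bounds 0 and 979 are an assumption derived from the “C00.0” to “C97.9” range stated above, not taken from the cancerR source.

```r
# Hypothetical helper (not part of cancerR): range check on converted codes.
# Assumes "C00.0" to "C97.9" corresponds to the integers 0 to 979.
validate_site <- function(code, lower = 0L, upper = 979L) {
  bad <- !is.na(code) & (code < lower | code > upper)
  if (any(bad)) {
    warning(sum(bad), " site code(s) outside the valid range were set to NA.")
    code[bad] <- NA_integer_  # replace out-of-range codes with NA
  }
  code
}

# 341 is kept, 999 is replaced with NA (the warning is silenced here)
checked <- suppressWarnings(validate_site(c(341L, 999L)))
```

Counting sum(is.na(checked)) afterwards is one way to see how many records would be dropped before classification.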

Classify adolescent and young adult cancers

The aya_class() function classifies adolescent and young adult (AYA) cancers based on the histology, site, and behaviour codes of the cancer.

The classification scheme is chosen with the method argument; the examples below use the default (the Barr 2020 classification) and "SEER v2020".

Users can also set the depth of the classification tree with the depth argument. It specifies the maximum depth of the classification tree, with 1 being the highest level and the most general grouping.


# Classify AYA cancers using Barr 2020 classification

# Classify at level 1 (most general)
data$dx_lvl_1 <- aya_class(data$icd_o3_histology, data$icd_o3_site, data$icd_o3_behaviour, depth = 1)

# Add more granular classifications
data$dx_lvl_2 <- aya_class(
  histology = data$icd_o3_histology, 
  site = data$site_conv, 
  behaviour = data$icd_o3_behaviour, 
  depth = 2
)

# Add even more granular classifications (level 3) using SEER 2020 revision classification
data$dx_lvl_3 <- aya_class(
  histology = data$icd_o3_histology, 
  site = site_convert(data$icd_o3_site), # Convert site codes using site_convert()
  behaviour = data$icd_o3_behaviour,
  method = "SEER v2020",
  depth = 3
)

# View created columns
print(data[, c("dx_lvl_1", "dx_lvl_2", "dx_lvl_3")])
#>                                                  dx_lvl_1
#> 1                                           9. Carcinomas
#> 2 3. CNS and other intracranial and intraspinal neoplasms
#> 3                           7. Gonadal and related tumors
#>                             dx_lvl_2
#> 1            9.6 Carcinoma of breast
#> 2 3.3 Neuroblastomas/ganglioneuromas
#> 3                         7.1 Testis
#>                                              dx_lvl_3
#> 1                    9.6.1 Breast - infiltrating duct
#> 2 3.3.2 Neuroblastoma/ganglioneuroblastoma - invasive
#> 3                   7.1.1 Germ cell and trophoblastic
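
In the output above, each deeper label starts with the numeric code of its parent group, so a coarser level can be recovered from a finer label. The base R sketch below illustrates this with the level 3 labels copied from the output; it assumes that every label begins with a dot-separated numeric code followed by a space, which holds for these three rows but is not guaranteed for every category.

```r
# Level 3 labels copied from the output above
lvl3 <- c(
  "9.6.1 Breast - infiltrating duct",
  "3.3.2 Neuroblastoma/ganglioneuroblastoma - invasive",
  "7.1.1 Germ cell and trophoblastic"
)

# Keep only the numeric code in front of the first space
code <- sub(" .*$", "", lvl3)
parts <- strsplit(code, ".", fixed = TRUE)

# First component gives the level 1 group, first two give level 2
lvl1_code <- vapply(parts, `[`, character(1), 1)
lvl2_code <- vapply(parts, function(p) paste(p[1:2], collapse = "."), character(1))

lvl1_code
#> [1] "9" "3" "7"
lvl2_code
#> [1] "9.6" "3.3" "7.1"
```

These prefixes line up with the dx_lvl_1 and dx_lvl_2 columns, which can be a quick consistency check when columns are produced with different depth values.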

Classify childhood cancers

Similarly, the kid_class() function classifies childhood cancers.

The classification scheme is chosen with the method argument; the examples below use the default (ICCC-3) and "who-iccc3" (the WHO-SEER recode).


# Make example data

data_kid <- data.frame(
  histology = c("8522", "9490", "9070"),
  site = c("C50.1", "C701", "620"),
  behaviour = c("3", "3", "3")
)

# Classify childhood cancers using ICCC-3 classification
data_kid$dx_lvl_1 <- kid_class(data_kid$histology, data_kid$site, depth = 1) # ICCC-3
data_kid$dx_lvl_1.seer <- kid_class(data_kid$histology, data_kid$site, method = "who-iccc3", depth = 1) # WHO-SEER recode

# Add SEER grouping column
data_kid$seer_grp <- kid_class(data_kid$histology, data_kid$site, depth = 99)

# View results
head(data_kid)
#>   histology  site behaviour
#> 1      8522 C50.1         3
#> 2      9490  C701         3
#> 3      9070   620         3
#>                                                             dx_lvl_1
#> 1   XI. Other malignant epithelial neoplasms and malignant melanomas
#> 2         IV. Neuroblastoma and other peripheral nervous cell tumors
#> 3 X. Germ cell tumors, trophoblastic tumors, and neoplasms of gonads
#>                                                        dx_lvl_1.seer seer_grp
#> 1   XI. Other malignant epithelial neoplasms and malignant melanomas      102
#> 2         IV. Neuroblastoma and other peripheral nervous cell tumors       33
#> 3 X. Germ cell tumors, trophoblastic tumors, and neoplasms of gonads       85
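
The level 1 labels above start with a Roman numeral, so sorting them alphabetically does not give the group order. A small base R sketch, assuming every label begins with the numeral followed by a period as in the output above, converts the numeral with utils::as.roman() so that the groups can be ordered numerically.

```r
# Level 1 labels copied from the output above
iccc <- c(
  "XI. Other malignant epithelial neoplasms and malignant melanomas",
  "IV. Neuroblastoma and other peripheral nervous cell tumors",
  "X. Germ cell tumors, trophoblastic tumors, and neoplasms of gonads"
)

# Text before the first period is the Roman numeral of the group
roman <- sub("\\..*$", "", iccc)

# Convert to integers: XI -> 11, IV -> 4, X -> 10
group_no <- as.integer(utils::as.roman(roman))

# Labels in group order (IV, X, XI)
iccc_sorted <- iccc[order(group_no)]
```

The same group_no vector can be used to set factor levels before tabulating or plotting the groups.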