CINmetrics

Vishal H. Oza

2-Dec-2020

CINmetrics

The goal of the CINmetrics package is to provide different methods from the literature for calculating Chromosomal Instability (CIN) metrics. These methods can be applied to any cancer data set, including The Cancer Genome Atlas (TCGA).

library(CINmetrics)

The dataset provided with the CINmetrics package is masked copy number variation data for breast cancer, covering 10 unique samples selected randomly from TCGA.

dim(maskCNV_BRCA)
#> [1] 1650    7

Alternatively, you can download the entire dataset from TCGA using the TCGAbiolinks package:

## Not run:
#library(TCGAbiolinks)
#query.maskCNV.hg39.BRCA <- GDCquery(project = "TCGA-BRCA",
#              data.category = "Copy Number Variation",
#              data.type = "Masked Copy Number Segment", legacy=FALSE)
#GDCdownload(query = query.maskCNV.hg39.BRCA)
#maskCNV.BRCA <- GDCprepare(query = query.maskCNV.hg39.BRCA, summarizedExperiment = FALSE)
#maskCNV.BRCA <- data.frame(maskCNV.BRCA, stringsAsFactors = FALSE)
#tai.test <- tai(cnvData = maskCNV.BRCA)
## End(Not run)

Total Aberration Index

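tai calculates the Total Aberration Index (TAI; Baumbusch LO, et al.), “a measure of the abundance of genomic size of copy number changes in a tumour”. It is defined as a weighted sum of the absolute segment means ($|\bar{y}_{S_i}|$), with each segment weighted by its length ($d_i$) and the total normalized by the summed lengths.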

Biologically, it can also be interpreted as the absolute deviation from the normal copy number state averaged over all genomic locations.

$$\text{Total Aberration Index} = \frac{\sum_{i=1}^{R} d_i \cdot |\bar{y}_{S_i}|}{\sum_{i=1}^{R} d_i} \quad \text{where} \quad |\bar{y}_{S_i}| \ge |\log_2 1.7|$$

tai.test <- tai(cnvData = maskCNV_BRCA)
head(tai.test)
#>                      sample_id       tai
#> 1 TCGA-E2-A1B1-01A-21D-A12N-01 0.4574789
#> 2 TCGA-E2-A153-11A-31D-A12A-01 1.4916264
#> 3 TCGA-D8-A1XW-10A-01D-A14J-01 0.9886191
#> 4 TCGA-A2-A0YD-01A-11D-A107-01 0.4944296
#> 5 TCGA-BH-A0BR-01A-21D-A111-01 0.3531782
#> 6 TCGA-D8-A27T-01A-11D-A16C-01 0.3706400
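
To make the weighting concrete, here is a minimal base R sketch of the TAI formula on a hypothetical four-segment sample. The column names (`Start`, `End`, `Segment_Mean`), the toy values, and the choice to compute segment length as `End - Start` are assumptions for illustration; this is not the package's internal implementation.

```r
# Hypothetical toy segments for a single sample (not taken from maskCNV_BRCA)
seg <- data.frame(
  Start        = c(0, 1000, 3000, 6000),
  End          = c(1000, 3000, 6000, 10000),
  Segment_Mean = c(0.05, 0.90, -1.00, 0.10)
)

# Segment lengths d_i (assumed here to be End - Start)
d <- seg$End - seg$Start

# Keep only aberrant segments, using the threshold stated in the formula
threshold <- abs(log2(1.7))
aberrant  <- abs(seg$Segment_Mean) >= threshold

# Length-weighted mean of the absolute segment means over aberrant segments
tai_by_hand <- sum(d[aberrant] * abs(seg$Segment_Mean[aberrant])) / sum(d[aberrant])
tai_by_hand
```

Only the two strongly shifted segments pass the threshold, so the result is (2000 × 0.90 + 3000 × 1.00) / 5000 = 0.96.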

Modified Total Aberration Index

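taiModified calculates a modified Total Aberration Index that uses all segment values rather than only those in an aberrant copy number state. It also uses the signed segment means instead of their absolute values, so the score retains the direction (gain or loss) of the copy number change.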

$$\text{Modified Total Aberration Index} = \frac{\sum_{i=1}^{R} d_i \cdot \bar{y}_{S_i}}{\sum_{i=1}^{R} d_i}$$

modified.tai.test <- taiModified(cnvData = maskCNV_BRCA)
head(modified.tai.test)
#>                      sample_id modified_tai
#> 1 TCGA-E2-A1B1-01A-21D-A12N-01  0.014579640
#> 2 TCGA-E2-A153-11A-31D-A12A-01  0.012139011
#> 3 TCGA-D8-A1XW-10A-01D-A14J-01  0.015385256
#> 4 TCGA-A2-A0YD-01A-11D-A107-01  0.006692841
#> 5 TCGA-BH-A0BR-01A-21D-A111-01  0.004983911
#> 6 TCGA-D8-A27T-01A-11D-A16C-01  0.014940306
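
The effect of keeping the sign can be seen in a small base R sketch. The toy segments and column names below are assumptions, and `weighted.mean()` from the stats package stands in for the formula; it is not the package's own code.

```r
# Hypothetical toy segments for a single sample
seg <- data.frame(
  Start        = c(0, 1000, 3000, 6000),
  End          = c(1000, 3000, 6000, 10000),
  Segment_Mean = c(0.05, 0.90, -1.00, 0.10)
)

# Segment lengths d_i (assumed here to be End - Start)
d <- seg$End - seg$Start

# Signed, length-weighted mean over all segments (no threshold applied)
modified_tai_by_hand <- weighted.mean(seg$Segment_Mean, w = d)

# Same weighting on absolute values, shown for comparison only
unsigned_mean <- weighted.mean(abs(seg$Segment_Mean), w = d)

c(signed = modified_tai_by_hand, unsigned = unsigned_mean)
```

In this toy sample the gain and the loss partly cancel, giving -0.075 for the signed score versus 0.525 when the sign is discarded.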

Copy Number Aberration

cna calculates the total number of copy number aberrations (CNA; Davidson JM, et al.), defined as a segment with copy number outside the pre-defined range of 1.7-2.3 (that is, outside $(\log_2 1.7 - 1) \le \bar{y}_{S_i} \le (\log_2 2.3 - 1)$) that is not contiguous with an adjacent independent CNA of identical copy number. For our purposes, we have adapted the range to be $|\bar{y}_{S_i}| \ge |\log_2 1.7|$, which is only slightly larger than the original.

This metric is very similar to the number of break points, but it comes with the caveat that adjacent segments need to have a difference in segmentation mean values.

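$$\text{Total Copy Number Aberration} = \sum_{i=1}^{R} n_i \quad \text{where} \quad |\bar{y}_{S_i}| \ge |\log_2 1.7|,\; |\bar{y}_{S_{i-1}} - \bar{y}_{S_i}| \ge 0.2,\; d_i \ge 10$$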

cna.test <- cna(cnvData = maskCNV_BRCA)
head(cna.test)
#>                      sample_id cna
#> 1 TCGA-E2-A1B1-01A-21D-A12N-01  33
#> 2 TCGA-E2-A153-11A-31D-A12A-01  14
#> 3 TCGA-D8-A1XW-10A-01D-A14J-01   7
#> 4 TCGA-A2-A0YD-01A-11D-A107-01  14
#> 5 TCGA-BH-A0BR-01A-21D-A111-01 212
#> 6 TCGA-D8-A27T-01A-11D-A16C-01  31
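
The three conditions in the CNA formula can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: the toy segments and column names are hypothetical, segments are assumed to be sorted by position on one chromosome, each segment is compared with the segment immediately before it, and the first segment (which has no predecessor) is treated as passing the adjacency test.

```r
# Hypothetical toy segments on a single chromosome, sorted by position
seg <- data.frame(
  Start        = c(0, 1000, 3000, 6000, 6005),
  End          = c(1000, 3000, 6000, 6005, 10000),
  Segment_Mean = c(0.90, 0.95, -1.00, 1.20, 0.05)
)

# Segment lengths d_i (assumed here to be End - Start)
d <- seg$End - seg$Start

# Absolute difference from the previous segment's mean (NA for the first one)
prev_mean <- c(NA, head(seg$Segment_Mean, -1))
jump      <- abs(prev_mean - seg$Segment_Mean)

aberrant    <- abs(seg$Segment_Mean) >= abs(log2(1.7))  # copy number condition
distinct    <- is.na(jump) | jump >= 0.2                # differs from neighbour
long_enough <- d >= 10                                  # minimum length

cna_by_hand <- sum(aberrant & distinct & long_enough)
cna_by_hand
```

Here the second segment is dropped because its mean is nearly identical to the first, the fourth because it is shorter than 10 bases, and the fifth because it is not aberrant, leaving a count of 2.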

Counting Altered Base segments

countingBaseSegments calculates the number of altered bases, defined as the sum of the lengths of segments ($d_i$) with an absolute segment mean ($|\bar{y}_{S_i}|$) of at least 0.2.

Biologically, this value can be thought to quantify numerical chromosomal instability. This is also a simpler representation of how much of the genome has been altered, and it does not run into the issue of sequencing coverage affecting the fraction of the genome altered.

$$\text{Number of Altered Bases} = \sum_{i=1}^{R} d_i \quad \text{where} \quad |\bar{y}_{S_i}| \ge 0.2$$

base.seg.test <- countingBaseSegments(cnvData = maskCNV_BRCA)
head(base.seg.test)
#>                      sample_id base_segments
#> 1 TCGA-E2-A1B1-01A-21D-A12N-01      55853059
#> 2 TCGA-E2-A153-11A-31D-A12A-01        131157
#> 3 TCGA-D8-A1XW-10A-01D-A14J-01         80000
#> 4 TCGA-A2-A0YD-01A-11D-A107-01     271941966
#> 5 TCGA-BH-A0BR-01A-21D-A111-01    1314597331
#> 6 TCGA-D8-A27T-01A-11D-A16C-01     536984944
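
As a rough illustration of the per-sample sum, the following base R sketch filters hypothetical segments by the 0.2 threshold and totals their lengths with `aggregate()`. The sample labels, column names, and length convention (`End - Start`) are assumptions made for the example.

```r
# Hypothetical toy segments for two samples
seg <- data.frame(
  Sample       = c("S1", "S1", "S1", "S2", "S2"),
  Start        = c(0, 1000, 3000, 0, 500),
  End          = c(1000, 3000, 6000, 500, 4500),
  Segment_Mean = c(0.05, 0.90, -1.00, 0.10, -0.45)
)

# Segment lengths d_i (assumed here to be End - Start)
seg$Length <- seg$End - seg$Start

# Keep segments whose absolute mean reaches the threshold
altered <- seg[abs(seg$Segment_Mean) >= 0.2, ]

# Sum altered lengths per sample
base_segments_by_hand <- aggregate(Length ~ Sample, data = altered, FUN = sum)
base_segments_by_hand
```

Note that in this sketch a sample with no altered segments would simply be absent from the result rather than reported as zero.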

Counting Number of Break Points

countingBreakPoints calculates the number of break points, defined as the number of segments ($n_i$) with an absolute segment mean of at least 0.2. This count is then doubled to account for the 5’ and 3’ break points of each segment.

Biologically, this value can be thought to quantify structural chromosomal instability.

$$\text{Number of Break Points} = \sum_{i=1}^{R} (n_i \cdot 2) \quad \text{where} \quad |\bar{y}_{S_i}| \ge 0.2$$

break.points.test <- countingBreakPoints(cnvData = maskCNV_BRCA)
head(break.points.test)
#>                      sample_id break_points
#> 1 TCGA-E2-A1B1-01A-21D-A12N-01          104
#> 2 TCGA-E2-A153-11A-31D-A12A-01           40
#> 3 TCGA-D8-A1XW-10A-01D-A14J-01           22
#> 4 TCGA-A2-A0YD-01A-11D-A107-01           40
#> 5 TCGA-BH-A0BR-01A-21D-A111-01          626
#> 6 TCGA-D8-A27T-01A-11D-A16C-01          102
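
The doubling step is easy to check by hand. The sketch below counts altered segments per sample with `tapply()` and multiplies by two; the sample labels, column names, and values are hypothetical, and the code is only an approximation of what the function is described as doing.

```r
# Hypothetical toy segments for two samples
seg <- data.frame(
  Sample       = c("S1", "S1", "S1", "S2", "S2"),
  Segment_Mean = c(0.05, 0.90, -1.00, 0.10, -0.45)
)

# Flag segments whose absolute mean reaches the threshold
altered <- abs(seg$Segment_Mean) >= 0.2

# Count altered segments per sample, then double for the two ends of each segment
n_altered            <- tapply(altered, seg$Sample, sum)
break_points_by_hand <- 2 * n_altered
break_points_by_hand
```

Sample S1 has two altered segments and therefore four break points; S2 has one altered segment and two break points.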

Fraction of Genome Altered

fga calculates the fraction of the genome altered (FGA; Chin SF, et al.), measured by taking the sum of the number of bases altered and dividing it by the genome length covered ($G$). Genome length covered was calculated by summing the lengths of each probe on the Affymetrix 6.0 array. This calculation excludes sex chromosomes.

$$\text{Fraction Genome Altered} = \frac{\sum_{i=1}^{R} d_i}{G} \quad \text{where} \quad |\bar{y}_{S_i}| \ge 0.2$$

fraction.genome.test <- fga(cnvData = maskCNV_BRCA)
head(fraction.genome.test)
#>                      sample_id          fga
#> 1 TCGA-E2-A1B1-01A-21D-A12N-01 1.943930e-02
#> 2 TCGA-E2-A153-11A-31D-A12A-01 4.564835e-05
#> 3 TCGA-D8-A1XW-10A-01D-A14J-01 2.784349e-05
#> 4 TCGA-A2-A0YD-01A-11D-A107-01 9.464765e-02
#> 5 TCGA-BH-A0BR-01A-21D-A111-01 4.126128e-01
#> 6 TCGA-D8-A27T-01A-11D-A16C-01 1.868942e-01
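
A minimal base R sketch of this ratio follows. The value of `G` is a made-up placeholder rather than the real covered genome length, the column names and segments are hypothetical, and the sketch assumes that excluding sex chromosomes means dropping segments on X and Y before summing.

```r
# Hypothetical toy segments for a single sample
seg <- data.frame(
  Chromosome   = c("1", "1", "2", "X"),
  Start        = c(0, 1000, 0, 0),
  End          = c(1000, 3000, 3000, 4000),
  Segment_Mean = c(0.05, 0.90, -1.00, 0.80)
)

# Placeholder genome length covered; NOT the real Affymetrix 6.0 value
G <- 20000

# Drop sex chromosomes before computing lengths
autosomal <- seg[!seg$Chromosome %in% c("X", "Y"), ]
d         <- autosomal$End - autosomal$Start

# Altered bases divided by genome length covered
altered     <- abs(autosomal$Segment_Mean) >= 0.2
fga_by_hand <- sum(d[altered]) / G
fga_by_hand
```

The altered segment on chromosome X is ignored, so the toy result is 5000 / 20000 = 0.25.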

CINmetrics

CINmetrics calculates tai, cna, number of altered base segments, number of break points, and fraction of genome altered and returns them as a single data frame.

cinmetrics.test <- CINmetrics(cnvData = maskCNV_BRCA)
head(cinmetrics.test)
#>                      sample_id       tai cna base_segments break_points
#> 1 TCGA-A2-A0YD-01A-11D-A107-01 0.4944296  14     271941966           40
#> 2 TCGA-A8-A086-01A-11D-A011-01 0.6721224  70     805881366          214
#> 3 TCGA-AO-A0J5-10A-01D-A037-01 0.8889885  12         41816           34
#> 4 TCGA-AR-A0TV-01A-21D-A087-01 0.5861162 187    1099228749          624
#> 5 TCGA-B6-A0RP-01A-21D-A087-01 0.3184316  41    1291153635          142
#> 6 TCGA-BH-A0BR-01A-21D-A111-01 0.3531782 212    1314597331          626
#>            fga
#> 1 9.464765e-02
#> 2 2.804818e-01
#> 3 1.455379e-05
#> 4 3.825795e-01
#> 5 4.493777e-01
#> 6 4.126128e-01
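
One way to picture a combined table like this is as a join of the per-metric results on `sample_id`. The sketch below does this with base R's `merge()` and `Reduce()` on hypothetical two-sample data frames; it is an assumption about how such a table could be assembled, not a description of how CINmetrics builds it internally.

```r
# Hypothetical per-metric results; note the differing row order
tai_df <- data.frame(sample_id = c("S1", "S2"), tai = c(0.96, 0.45))
cna_df <- data.frame(sample_id = c("S2", "S1"), cna = c(1, 2))
bp_df  <- data.frame(sample_id = c("S1", "S2"), break_points = c(4, 2))

# Join all tables pairwise on sample_id
combined <- Reduce(
  function(x, y) merge(x, y, by = "sample_id"),
  list(tai_df, cna_df, bp_df)
)
combined
```

Joining on the identifier rather than binding columns side by side keeps each sample's metrics aligned even when the input tables are ordered differently.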