Data

CRISPR/Cas9 screens

The Achilles and Project SCORE CRISPR/Cas9 screen data were downloaded from the DepMap data portal.

To get the Cancer Gene Census data from Sanger, you will need to set the environment variables SANGER_EMAIL and SANGER_PASS to the email address and password used to register on their data portal. If using a conda environment, follow these instructions to have the variables set automatically when the environment is activated (and unset when it is deactivated).
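Before starting the download, it can help to confirm that both variables are visible to the running process. The following is a minimal Python sketch of such a check; the helper name `missing_sanger_credentials` is hypothetical and is not part of this project.

```python
import os
from collections.abc import Mapping

# Variable names taken from the paragraph above.
REQUIRED_VARS = ("SANGER_EMAIL", "SANGER_PASS")


def missing_sanger_credentials(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_sanger_credentials()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("Sanger credentials are set.")
```

Passing the environment in as a mapping keeps the check easy to test without touching the real environment.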

The data can be downloaded using the following commands (run from the root directory of the project).

conda activate speclet_smk
make download_data

Notes

Below are some notes on the data for future reference.

Copy number

Below is a description of the copy number data values from a post on the DepMap community forum:

Since we do not have matched normals, the output is a “copy ratio” or relative copy number. It is relative to the rest of the genome for that cell line. E.g. if the cell line is tetraploid we would not be able to see it from the relative copy number. These values are reported as log2(relative CN + 1) in the portal.

Therefore, to recover the original relative copy number from a reported value $x$, use the transformation $\text{CN} = 2^x - 1$. To be clear, the average value of the relative copy number is 1, which corresponds to a reported value of $\log_2(1 + 1) = 1$.
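As a quick illustration of this back-transformation, here is a small Python sketch. The function name `relative_copy_number` is an assumption made for this example and does not refer to any code in the project.

```python
def relative_copy_number(log2_value: float) -> float:
    """Invert the portal's log2(relative CN + 1) encoding."""
    return 2.0**log2_value - 1.0


# A reported value of 1 corresponds to the average relative copy number of 1.
print(relative_copy_number(1.0))  # 1.0
# A reported value of 0 corresponds to a relative copy number of 0.
print(relative_copy_number(0.0))  # 0.0
```

Because the values are relative to the rest of the genome for each cell line, the result is still a copy ratio and not an absolute copy number.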

mRNA expression

Gene expression is reported as TPM values for the protein-coding genes of the DepMap cell lines. Values are inferred from RNA-seq data using the RSEM tool and are reported after $\log_2$ transformation with a pseudo-count of 1: $\log_2(\text{TPM}+1)$.
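The pseudo-count keeps genes with zero expression well defined under the logarithm. A minimal Python sketch of the forward and inverse transformations follows; the function names `tpm_to_log2` and `log2_to_tpm` are hypothetical and chosen only for this example.

```python
import math


def tpm_to_log2(tpm: float) -> float:
    """Apply the log2(TPM + 1) transformation described above."""
    return math.log2(tpm + 1.0)


def log2_to_tpm(value: float) -> float:
    """Recover TPM from a log2(TPM + 1) value."""
    return 2.0**value - 1.0


# With the pseudo-count, a gene with zero TPM maps to 0 instead of -infinity.
print(tpm_to_log2(0.0))  # 0.0
print(log2_to_tpm(tpm_to_log2(3.0)))  # 3.0
```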