The Achilles and Project SCORE CRISPR/Cas9 screen data were downloaded from the DepMap data portal.
To get the Cancer Gene Census data from Sanger, you will need to set the environment variables `SANGER_EMAIL` and `SANGER_PASS` to the email address and password used to register on their data portal.
If using a conda environment, follow these instructions to have the variables set automatically when the environment is activated (and unset when it is deactivated).
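Before starting a long download, it can help to fail fast if the credentials are missing. The following is a minimal sketch. It assumes the download step reads the two variables from the process environment. The helper name `check_sanger_credentials` is hypothetical and is not part of the project. It only checks that the variables exist and are non-empty. It does not validate them against the Sanger portal.

```python
import os


def check_sanger_credentials(env=None):
    """Return the names of missing Sanger credential variables (empty list if all set).

    Hypothetical helper: checks presence and non-emptiness only; it does not
    log in to the Sanger portal to confirm the credentials are valid.
    """
    env = os.environ if env is None else env
    required = ("SANGER_EMAIL", "SANGER_PASS")
    # A variable set to an empty string is treated as missing.
    return [name for name in required if not env.get(name)]
```

For example, calling `check_sanger_credentials()` from a Python session inside the activated environment should return `[]` when both variables are set.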
The data can be downloaded by running the following commands from the root directory of the project.
```bash
conda activate speclet_smk
make download_data
```
Below are some notes on the data for future reference.
Below is a description of the copy number data values from a post on the DepMap community forum:
> Since we do not have matched normals, the output is a “copy ratio” or relative copy number. It is relative to the rest of the genome for that cell line. E.g. if the cell line is tetraploid we would not be able to see it from the relative copy number. These values are reported as log2(relative CN + 1) in the portal.
Therefore, to recover the original relative copy number values from a reported value $x$, use the inverse transformation $\mathrm{cn} = 2^{x} - 1$.
To be clear, the average value of the relative copy number is 1.
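As a sketch, assuming the copy number values have been loaded into a NumPy array or a pandas column, the inverse transform is a single vectorized expression. The function names below are hypothetical, for illustration only. A reported value of 1 maps back to a relative copy number of 1, the genome-wide average for that cell line.

```python
import numpy as np


def log2_cn_to_relative_cn(x):
    """Convert reported values x = log2(relative CN + 1) back to relative CN."""
    return np.exp2(np.asarray(x, dtype=float)) - 1.0


def relative_cn_to_log2_cn(cn):
    """Forward transform as reported in the portal: x = log2(relative CN + 1)."""
    return np.log2(np.asarray(cn, dtype=float) + 1.0)


# Toy values for illustration (not real DepMap data).
x = np.array([0.0, 1.0, np.log2(3.0)])
cn = log2_cn_to_relative_cn(x)  # approximately [0.0, 1.0, 2.0]
```

The forward function is included only to show that the two transforms round-trip.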
Gene expression data are TPM values of the protein-coding genes for DepMap cell lines. Values are inferred from RNA-seq data using the RSEM tool and are reported after $\log_2$ transformation with a pseudo-count of 1, i.e. $\log_2(\text{TPM}+1)$.
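To work with raw TPM values, the pseudo-count transform can be undone on a whole table at once. This is a minimal sketch. It assumes the expression matrix is loaded as a numeric pandas `DataFrame` with cell lines as rows and genes as columns, and that any ID columns have been moved into the index. The function name and the toy data are hypothetical.

```python
import numpy as np
import pandas as pd


def log2p1_to_tpm(expr: pd.DataFrame) -> pd.DataFrame:
    """Undo the log2(TPM + 1) transform, keeping the row and column labels.

    Assumes every column of `expr` holds numeric log2(TPM + 1) values.
    """
    # np.exp2 applied to a DataFrame returns a DataFrame with the same labels.
    return np.exp2(expr) - 1.0


# Toy data for illustration only (not real DepMap values).
toy = pd.DataFrame(
    {"GENE_A": [0.0, 1.0], "GENE_B": [10.0, 3.0]},
    index=["cell_line_1", "cell_line_2"],
)
tpm = log2p1_to_tpm(toy)
```

Because of the pseudo-count, a reported value of 0 corresponds to a TPM of exactly 0.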