Code for reproducing the figures and results in the preprint "Accurate quantification of single-nucleus and single-cell RNA-seq transcripts" by Kristján Eldjárn Hjörleifsson, Delaney Sullivan, Nikhila Swarna, Conrad Oakes, Guillaume Holley, Páll Melsted and Lior Pachter.
(Note: In this repo, D-list is often referred to as "offlist".)
The human reference genome (FASTA+GTF) used in all analyses is available directly at https://github.com/pachterlab/SHSOHMP_2024/releases under the filename human_CR_3.0.0.tar.gz.
Please follow the steps below in order to reproduce the results of the preprint. Set all the paths to be relative to the directory SHSOHMP_2024.
main_path="$(pwd)/SHSOHMP_2024"
# Each kallisto assignment below overwrites the previous one, so $kallisto ends up pointing to 0.50.1.
kallisto="$main_path/kallisto_0.48.0/kallisto"
kallisto="$main_path/kallisto_0.50.0/kallisto"
kallisto="$main_path/kallisto_0.50.1/kallisto"
bustools="$main_path/bustools/build/src/bustools"
cellranger7="$main_path/cellranger/cellranger-7.0.1/cellranger"
salmon="$main_path/salmon-latest_linux_x86_64/bin/salmon"
kallisto version 0.48.0
cd $main_path
wget https://github.com/pachterlab/kallisto/releases/download/v0.48.0/kallisto_linux-v0.48.0.tar.gz
tar -xzvf kallisto_linux-v0.48.0.tar.gz
mv kallisto kallisto_0.48.0
kallisto version 0.50.0
cd $main_path
wget https://github.com/pachterlab/kallisto/releases/download/v0.50.0/kallisto_linux-v0.50.0.tar.gz
tar -xzvf kallisto_linux-v0.50.0.tar.gz
mv kallisto kallisto_0.50.0
kallisto version 0.50.1
cd $main_path
wget https://github.com/pachterlab/kallisto/releases/download/v0.50.1/kallisto_linux-v0.50.1.tar.gz
tar -xzvf kallisto_linux-v0.50.1.tar.gz
mv kallisto kallisto_0.50.1
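The three kallisto installs differ only in the version string. A minimal sketch that factors out the shared steps; the function names `kallisto_release_url` and `install_kallisto` are hypothetical, and the URL pattern is copied from the commands above:

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of the repository) that factor out the
# shared steps of the three kallisto installs above.

# Print the release tarball URL for a kallisto version, using the same
# URL pattern as the commands above.
kallisto_release_url() {
  local v="$1"
  printf 'https://github.com/pachterlab/kallisto/releases/download/v%s/kallisto_linux-v%s.tar.gz\n' "$v" "$v"
}

# Download, extract, and rename one version in the current directory.
# Needs network access; run from $main_path.
install_kallisto() {
  local v="$1"
  wget "$(kallisto_release_url "$v")" &&
    tar -xzvf "kallisto_linux-v${v}.tar.gz" &&
    mv kallisto "kallisto_${v}"
}

# Usage:
# cd "$main_path"
# for v in 0.48.0 0.50.0 0.50.1; do install_kallisto "$v"; done
```

The `&&` chain stops at the first failing step, so a failed download does not leave a half-renamed directory behind.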
bustools version 0.43.2
cd $main_path
rm -rf bustools
git clone -b v0.43.2 https://github.com/BUStools/bustools
cd bustools && mkdir -p build && cd build
cmake .. && make
kb-python version 0.28.0
cd $main_path
yes|python -m pip uninstall kb-python
python -m pip install kb_python==0.28.0
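Because the results depend on pinned versions, it may be worth confirming what pip actually installed. A sketch assuming `pip show` prints a `Version:` line, as it does for regular installs; `check_pinned_version` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): read `pip show` output
# on stdin and compare its "Version:" field with the expected version.
version_from_pip_show() {
  awk -F': ' '$1 == "Version" { print $2; exit }'
}

check_pinned_version() {
  local expected="$1" found
  found="$(version_from_pip_show)"   # consumes the function's stdin
  if [[ "$found" == "$expected" ]]; then
    echo "version $found OK"
  else
    echo "expected $expected, found ${found:-nothing}" >&2
    return 1
  fi
}

# Usage:
# python -m pip show kb_python | check_pinned_version 0.28.0
# python -m pip show pyroe | check_pinned_version 0.9.3
```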
Note: Cell Ranger must be installed manually. The version used is:
- Cell Ranger v7.0.1 (Released August 18, 2022. Downloaded October 7, 2022)
salmon version 1.10.0; alevin-fry version 0.8.2; pyroe 0.9.3; simpleaf 0.15.1
cd $main_path
wget https://github.com/COMBINE-lab/salmon/releases/download/v1.10.0/salmon-1.10.0_linux_x86_64.tar.gz && tar -xzvf salmon-1.10.0_linux_x86_64.tar.gz
export RUSTUP_HOME=${main_path}/.rustup/
export CARGO_HOME=${main_path}/.cargo/
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
./.cargo/bin/cargo install --version 0.8.2 --force alevin-fry
./.cargo/bin/cargo install --version 0.15.1 --force simpleaf
yes|python -m pip uninstall pyroe
python -m pip install pyroe==0.9.3
simpleaf configuration:
export ALEVIN_FRY_HOME="$main_path/af_home"
simpleaf set-paths \
--salmon $(pwd)/salmon-latest_linux_x86_64/bin/salmon
simpleaf workflow get --name 10x-chromium-3p-v3 -o af10xv3
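With the tools installed, the paths defined at the top can be sanity-checked before any long runs. A minimal sketch; `check_executables` is a hypothetical helper, not part of the repository:

```shell
#!/usr/bin/env bash
# Hypothetical helper: report every path that is missing or is not an
# executable file. Returns non-zero if any path fails the check.
check_executables() {
  local missing=0 path
  for path in "$@"; do
    if [[ -x "$path" && ! -d "$path" ]]; then
      printf 'ok       %s\n' "$path"
    else
      printf 'MISSING  %s\n' "$path" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage, with the variables defined at the top of this README:
# check_executables "$kallisto" "$bustools" "$cellranger7" "$salmon"
```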
Open up an R session and then run:
if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("Rsubread")
Note: Rsubread version 2.12.3 was used.
Navigate to STARsoloManuscript and run the scripts there.
Note: Run the STARsoloManuscript scripts before proceeding; the steps below use the indices they build and the links to the program binaries they create. At a minimum, complete the sections "Create symlinks to executables", "Create indices", and "Mouse genome prep".
wget https://s3-us-west-2.amazonaws.com/10x.files/samples/cell-exp/6.1.0/20k_PBMC_3p_HT_nextgem_Chromium_X/20k_PBMC_3p_HT_nextgem_Chromium_X_fastqs.tar
tar -xvf 20k_PBMC_3p_HT_nextgem_Chromium_X_fastqs.tar
wget https://cf.10xgenomics.com/samples/cell-exp/7.0.0/5k_human_jejunum_CNIK_3pv3/5k_human_jejunum_CNIK_3pv3_fastqs.tar
tar -xvf 5k_human_jejunum_CNIK_3pv3_fastqs.tar
wget https://s3-us-west-2.amazonaws.com/10x.files/samples/cell-exp/4.0.0/SC3_v3_NextGem_SI_Neuron_10K/SC3_v3_NextGem_SI_Neuron_10K_fastqs.tar
tar -xvf SC3_v3_NextGem_SI_Neuron_10K_fastqs.tar
wget https://s3-us-west-2.amazonaws.com/10x.files/samples/cell-exp/7.0.0/5k_mouse_lung_CNIK_3pv3/5k_mouse_lung_CNIK_3pv3_fastqs.tar
tar -xvf 5k_mouse_lung_CNIK_3pv3_fastqs.tar
wget https://s3-us-west-2.amazonaws.com/10x.files/samples/spatial-exp/2.1.0/CytAssist_11mm_FFPE_Mouse_Embryo/CytAssist_11mm_FFPE_Mouse_Embryo_fastqs.tar
tar -xf CytAssist_11mm_FFPE_Mouse_Embryo_fastqs.tar && rm CytAssist_11mm_FFPE_Mouse_Embryo_fastqs.tar && mv fastqs/* ./ && rmdir fastqs
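The kb count commands below list each R1/R2 pair by hand, so a missing or partially extracted file only surfaces at run time. A sketch that checks pairing first, assuming standard 10x FASTQ names ending in `_R1_001.fastq.gz`/`_R2_001.fastq.gz`; index reads (I1/I2), if present, are ignored. `check_fastq_pairs` is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify that every R1 FASTQ in a directory has a
# matching R2 file with the same sample/lane prefix.
check_fastq_pairs() {
  local dir="$1" r1 r2 status=0 found=0
  for r1 in "$dir"/*_R1_001.fastq.gz; do
    [[ -e "$r1" ]] || continue          # glob matched nothing
    found=1
    r2="${r1%_R1_001.fastq.gz}_R2_001.fastq.gz"
    if [[ ! -e "$r2" ]]; then
      echo "missing mate for $r1" >&2
      status=1
    fi
  done
  if (( found == 0 )); then
    echo "no R1 files in $dir" >&2
    return 1
  fi
  return "$status"
}

# Usage:
# check_fastq_pairs 20k_PBMC_3p_HT_nextgem_Chromium_X_fastqs
```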
kb count --kallisto STARsoloManuscript/exe/kallisto_0.50.1 --bustools STARsoloManuscript/exe/bustools_0.43.2 -t 20 -x 10XV3 \
--workflow nac --sum=total -i STARsoloManuscript/genomes/index/kallisto_0.50.1/human_CR_3.0.0/nac_offlist_1/index.idx \
-g STARsoloManuscript/genomes/index/kallisto_0.50.1/human_CR_3.0.0/nac_offlist_1/g \
-c1 STARsoloManuscript/genomes/index/kallisto_0.50.1/human_CR_3.0.0/nac_offlist_1/c1 \
-c2 STARsoloManuscript/genomes/index/kallisto_0.50.1/human_CR_3.0.0/nac_offlist_1/c2 \
-o ./matrices_human_20k_PBMC/ --overwrite --verbose \
20k_PBMC_3p_HT_nextgem_Chromium_X_fastqs/20k_PBMC_3p_HT_nextgem_Chromium_X_S3_L001_R1_001.fastq.gz \
20k_PBMC_3p_HT_nextgem_Chromium_X_fastqs/20k_PBMC_3p_HT_nextgem_Chromium_X_S3_L001_R2_001.fastq.gz \
20k_PBMC_3p_HT_nextgem_Chromium_X_fastqs/20k_PBMC_3p_HT_nextgem_Chromium_X_S3_L002_R1_001.fastq.gz \
20k_PBMC_3p_HT_nextgem_Chromium_X_fastqs/20k_PBMC_3p_HT_nextgem_Chromium_X_S3_L002_R2_001.fastq.gz \
20k_PBMC_3p_HT_nextgem_Chromium_X_fastqs/20k_PBMC_3p_HT_nextgem_Chromium_X_S3_L003_R1_001.fastq.gz \
20k_PBMC_3p_HT_nextgem_Chromium_X_fastqs/20k_PBMC_3p_HT_nextgem_Chromium_X_S3_L003_R2_001.fastq.gz \
20k_PBMC_3p_HT_nextgem_Chromium_X_fastqs/20k_PBMC_3p_HT_nextgem_Chromium_X_S3_L004_R1_001.fastq.gz \
20k_PBMC_3p_HT_nextgem_Chromium_X_fastqs/20k_PBMC_3p_HT_nextgem_Chromium_X_S3_L004_R2_001.fastq.gz
kb count --kallisto STARsoloManuscript/exe/kallisto_0.50.1 --bustools STARsoloManuscript/exe/bustools_0.43.2 -t 20 -x 10XV3 \
--workflow nac --sum=total -i STARsoloManuscript/genomes/index/kallisto_0.50.1/human_CR_3.0.0/nac_offlist_1/index.idx \
-g STARsoloManuscript/genomes/index/kallisto_0.50.1/human_CR_3.0.0/nac_offlist_1/g \
-c1 STARsoloManuscript/genomes/index/kallisto_0.50.1/human_CR_3.0.0/nac_offlist_1/c1 \
-c2 STARsoloManuscript/genomes/index/kallisto_0.50.1/human_CR_3.0.0/nac_offlist_1/c2 \
-o ./matrices_human_5k_jejunum_nuclei/ --overwrite --verbose \
5k_human_jejunum_CNIK_3pv3_fastqs/5k_human_jejunum_CNIK_3pv3_S1_L001_R1_001.fastq.gz \
5k_human_jejunum_CNIK_3pv3_fastqs/5k_human_jejunum_CNIK_3pv3_S1_L001_R2_001.fastq.gz \
5k_human_jejunum_CNIK_3pv3_fastqs/5k_human_jejunum_CNIK_3pv3_S1_L002_R1_001.fastq.gz \
5k_human_jejunum_CNIK_3pv3_fastqs/5k_human_jejunum_CNIK_3pv3_S1_L002_R2_001.fastq.gz \
5k_human_jejunum_CNIK_3pv3_fastqs/5k_human_jejunum_CNIK_3pv3_S1_L003_R1_001.fastq.gz \
5k_human_jejunum_CNIK_3pv3_fastqs/5k_human_jejunum_CNIK_3pv3_S1_L003_R2_001.fastq.gz \
5k_human_jejunum_CNIK_3pv3_fastqs/5k_human_jejunum_CNIK_3pv3_S1_L004_R1_001.fastq.gz \
5k_human_jejunum_CNIK_3pv3_fastqs/5k_human_jejunum_CNIK_3pv3_S1_L004_R2_001.fastq.gz
kb count --kallisto STARsoloManuscript/exe/kallisto_0.50.1 --bustools STARsoloManuscript/exe/bustools_0.43.2 -t 20 -x 10XV3 \
--workflow nac --sum=total -i STARsoloManuscript/genomes/index/kallisto_0.50.1/mouse/nac_offlist_1/index.idx \
-g STARsoloManuscript/genomes/index/kallisto_0.50.1/mouse/nac_offlist_1/g \
-c1 STARsoloManuscript/genomes/index/kallisto_0.50.1/mouse/nac_offlist_1/c1 \
-c2 STARsoloManuscript/genomes/index/kallisto_0.50.1/mouse/nac_offlist_1/c2 \
-o ./matrices_mouse_10k_neuron/ --overwrite --verbose \
SC3_v3_NextGem_SI_Neuron_10K_fastqs/SC3_v3_NextGem_SI_Neuron_10K_S1_L002_R1_001.fastq.gz \
SC3_v3_NextGem_SI_Neuron_10K_fastqs/SC3_v3_NextGem_SI_Neuron_10K_S1_L002_R2_001.fastq.gz \
SC3_v3_NextGem_SI_Neuron_10K_fastqs/SC3_v3_NextGem_SI_Neuron_10K_S1_L003_R1_001.fastq.gz \
SC3_v3_NextGem_SI_Neuron_10K_fastqs/SC3_v3_NextGem_SI_Neuron_10K_S1_L003_R2_001.fastq.gz \
SC3_v3_NextGem_SI_Neuron_10K_fastqs/SC3_v3_NextGem_SI_Neuron_10K_S1_L004_R1_001.fastq.gz \
SC3_v3_NextGem_SI_Neuron_10K_fastqs/SC3_v3_NextGem_SI_Neuron_10K_S1_L004_R2_001.fastq.gz
kb count --kallisto STARsoloManuscript/exe/kallisto_0.50.1 --bustools STARsoloManuscript/exe/bustools_0.43.2 -t 20 -x 10XV3 \
--workflow nac --sum=total -i STARsoloManuscript/genomes/index/kallisto_0.50.1/mouse/nac_offlist_1/index.idx \
-g STARsoloManuscript/genomes/index/kallisto_0.50.1/mouse/nac_offlist_1/g \
-c1 STARsoloManuscript/genomes/index/kallisto_0.50.1/mouse/nac_offlist_1/c1 \
-c2 STARsoloManuscript/genomes/index/kallisto_0.50.1/mouse/nac_offlist_1/c2 \
-o ./matrices_mouse_5k_lung/ --overwrite --verbose \
5k_mouse_lung_CNIK_3pv3_fastqs/5k_mouse_lung_CNIK_3pv3_S4_L001_R1_001.fastq.gz \
5k_mouse_lung_CNIK_3pv3_fastqs/5k_mouse_lung_CNIK_3pv3_S4_L001_R2_001.fastq.gz \
5k_mouse_lung_CNIK_3pv3_fastqs/5k_mouse_lung_CNIK_3pv3_S4_L002_R1_001.fastq.gz \
5k_mouse_lung_CNIK_3pv3_fastqs/5k_mouse_lung_CNIK_3pv3_S4_L002_R2_001.fastq.gz \
5k_mouse_lung_CNIK_3pv3_fastqs/5k_mouse_lung_CNIK_3pv3_S4_L003_R1_001.fastq.gz \
5k_mouse_lung_CNIK_3pv3_fastqs/5k_mouse_lung_CNIK_3pv3_S4_L003_R2_001.fastq.gz
kb count --kallisto STARsoloManuscript/exe/kallisto_0.50.1 --bustools STARsoloManuscript/exe/bustools_0.43.2 -t 20 -x VISIUM \
--strand=unstranded --workflow nac --sum=total -i STARsoloManuscript/genomes/index/kallisto_0.50.1/mouse/nac_offlist_1/index.idx \
-g STARsoloManuscript/genomes/index/kallisto_0.50.1/mouse/nac_offlist_1/g \
-c1 STARsoloManuscript/genomes/index/kallisto_0.50.1/mouse/nac_offlist_1/c1 \
-c2 STARsoloManuscript/genomes/index/kallisto_0.50.1/mouse/nac_offlist_1/c2 \
-o ./matrices_mouse_ffpe/ --overwrite --verbose \
CytAssist_11mm_FFPE_Mouse_Embryo_fastqs/CytAssist_11mm_FFPE_Mouse_Embryo_S1_L004_R1_001.fastq.gz \
CytAssist_11mm_FFPE_Mouse_Embryo_fastqs/CytAssist_11mm_FFPE_Mouse_Embryo_S1_L004_R2_001.fastq.gz
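The FASTQ arguments above are R1/R2 pairs, lane by lane. As a sketch, the same list can be generated from a directory with a glob; `fastq_pairs` is a hypothetical helper. The commands above select lanes explicitly (for example, the neuron sample uses L002, L003 and L004), while a glob returns every lane present in the directory, so compare the generated list with the original command before substituting it:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print R1 then R2 for each lane found in a directory.
# Bash sorts glob results, so lanes come out as L001, L002, ...
fastq_pairs() {
  local dir="$1" r1
  for r1 in "$dir"/*_R1_001.fastq.gz; do
    [[ -e "$r1" ]] || continue          # glob matched nothing
    printf '%s\n' "$r1" "${r1%_R1_001.fastq.gz}_R2_001.fastq.gz"
  done
}

# Usage: collect into an array, then append "${fastqs[@]}" to a kb count call.
# mapfile -t fastqs < <(fastq_pairs 5k_mouse_lung_CNIK_3pv3_fastqs)
# printf '%s\n' "${fastqs[@]}"
```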
Filter barcodes using a UMI threshold of >= 500. The threshold is applied to the total count matrix; the other count matrices reuse the barcodes retained in the "total" matrix.
./filter.sh matrices_human_20k_PBMC 500
./filter.sh matrices_human_5k_jejunum_nuclei 500
./filter.sh matrices_mouse_10k_neuron 500
./filter.sh matrices_mouse_5k_lung 500
./filter.sh matrices_mouse_ffpe 500
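The filtering criterion (total UMIs per barcode >= 500) can be illustrated with a short awk pass over a Matrix Market file. This is a sketch, not the repository's `filter.sh`, whose logic may differ; it assumes barcodes are the rows of the matrix and that a barcode file lists one barcode per row in matching order. `barcodes_above_threshold` is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print barcodes whose summed counts reach a threshold.
# Arguments: <matrix.mtx> <barcodes.txt> <threshold>
barcodes_above_threshold() {
  local mtx="$1" barcodes="$2" threshold="$3"
  awk -v t="$threshold" '
    NR == FNR {                                  # first file: the .mtx
      if ($0 ~ /^%/) next                        # skip header/comment lines
      if (!seen_size) { seen_size = 1; next }    # skip "nrows ncols nnz"
      total[$1] += $3                            # sum counts per row index
      next
    }
    total[FNR] >= t { print }                    # second file: row FNR = barcode FNR
  ' "$mtx" "$barcodes"
}

# Usage (file names are placeholders):
# barcodes_above_threshold counts.mtx barcodes.txt 500 > kept_barcodes.txt
```

Barcodes with no entries in the matrix have an unset total, which awk compares as 0, so they are dropped for any positive threshold.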
Next, we reuse the script from the simulations (where we compared each output matrix with the simulated truth matrix) to compare our nascent, mature, ambiguous, and other matrices. Everything is in the mtx_comparisons.sh file.
./mtx_comparisons.sh matrices_human_20k_PBMC
./mtx_comparisons.sh matrices_human_5k_jejunum_nuclei
./mtx_comparisons.sh matrices_mouse_10k_neuron
./mtx_comparisons.sh matrices_mouse_5k_lung
./mtx_comparisons.sh matrices_mouse_ffpe
The final analysis is produced in the matrix_comparisons.ipynb Python notebook.
Note: kb-python already uses the prepackaged 10x v3 on-list.
Note: After the following commands are run, the analysis_dlist_performance.ipynb Python notebook contains the final plots.
mkdir -p performance_comparisons/out/
nac + offlist (human 20k PBMC):
cmd1="kb count --kallisto STARsoloManuscript/exe/kallisto_0.50.1 --bustools STARsoloManuscript/exe/bustools_0.43.2 -t "
cmd2=" -x 10XV3 \
--workflow nac -i STARsoloManuscript/genomes/index/kallisto_0.50.1/human_CR_3.0.0/nac_offlist_1/index.idx \
-g STARsoloManuscript/genomes/index/kallisto_0.50.1/human_CR_3.0.0/nac_offlist_1/g \
-c1 STARsoloManuscript/genomes/index/kallisto_0.50.1/human_CR_3.0.0/nac_offlist_1/c1 \
-c2 STARsoloManuscript/genomes/index/kallisto_0.50.1/human_CR_3.0.0/nac_offlist_1/c2 \
-o ./performance_comparisons/out/nac_offlist-20kb_PBMC/ --overwrite \
20k_PBMC_3p_HT_nextgem_Chromium_X_fastqs/20k_PBMC_3p_HT_nextgem_Chromium_X_S3_L001_R1_001.fastq.gz \
20k_PBMC_3p_HT_nextgem_Chromium_X_fastqs/20k_PBMC_3p_HT_nextgem_Chromium_X_S3_L001_R2_001.fastq.gz \
20k_PBMC_3p_HT_nextgem_Chromium_X_fastqs/20k_PBMC_3p_HT_nextgem_Chromium_X_S3_L002_R1_001.fastq.gz \
20k_PBMC_3p_HT_nextgem_Chromium_X_fastqs/20k_PBMC_3p_HT_nextgem_Chromium_X_S3_L002_R2_001.fastq.gz \
20k_PBMC_3p_HT_nextgem_Chromium_X_fastqs/20k_PBMC_3p_HT_nextgem_Chromium_X_S3_L003_R1_001.fastq.gz \
20k_PBMC_3p_HT_nextgem_Chromium_X_fastqs/20k_PBMC_3p_HT_nextgem_Chromium_X_S3_L003_R2_001.fastq.gz \
20k_PBMC_3p_HT_nextgem_Chromium_X_fastqs/20k_PBMC_3p_HT_nextgem_Chromium_X_S3_L004_R1_001.fastq.gz \
20k_PBMC_3p_HT_nextgem_Chromium_X_fastqs/20k_PBMC_3p_HT_nextgem_Chromium_X_S3_L004_R2_001.fastq.gz"
/usr/bin/time -v $cmd1 16 $cmd2 2> performance_comparisons/16_nac_offlist-20kb_PBMC_1.txt
nac (no offlist) (human 20k PBMC):
cmd1="kb count --kallisto STARsoloManuscript/exe/kallisto_0.50.1 --bustools STARsoloManuscript/exe/bustools_0.43.2 -t "
cmd2=" -x 10XV3 \
--workflow nac -i STARsoloManuscript/genomes/index/kallisto_0.50.1/human_CR_3.0.0/nac_1/index.idx \
-g STARsoloManuscript/genomes/index/kallisto_0.50.1/human_CR_3.0.0/nac_1/g \
-c1 STARsoloManuscript/genomes/index/kallisto_0.50.1/human_CR_3.0.0/nac_1/c1 \
-c2 STARsoloManuscript/genomes/index/kallisto_0.50.1/human_CR_3.0.0/nac_1/c2 \
-o ./performance_comparisons/out/nac-20kb_PBMC/ --overwrite \
20k_PBMC_3p_HT_nextgem_Chromium_X_fastqs/20k_PBMC_3p_HT_nextgem_Chromium_X_S3_L001_R1_001.fastq.gz \
20k_PBMC_3p_HT_nextgem_Chromium_X_fastqs/20k_PBMC_3p_HT_nextgem_Chromium_X_S3_L001_R2_001.fastq.gz \
20k_PBMC_3p_HT_nextgem_Chromium_X_fastqs/20k_PBMC_3p_HT_nextgem_Chromium_X_S3_L002_R1_001.fastq.gz \
20k_PBMC_3p_HT_nextgem_Chromium_X_fastqs/20k_PBMC_3p_HT_nextgem_Chromium_X_S3_L002_R2_001.fastq.gz \
20k_PBMC_3p_HT_nextgem_Chromium_X_fastqs/20k_PBMC_3p_HT_nextgem_Chromium_X_S3_L003_R1_001.fastq.gz \
20k_PBMC_3p_HT_nextgem_Chromium_X_fastqs/20k_PBMC_3p_HT_nextgem_Chromium_X_S3_L003_R2_001.fastq.gz \
20k_PBMC_3p_HT_nextgem_Chromium_X_fastqs/20k_PBMC_3p_HT_nextgem_Chromium_X_S3_L004_R1_001.fastq.gz \
20k_PBMC_3p_HT_nextgem_Chromium_X_fastqs/20k_PBMC_3p_HT_nextgem_Chromium_X_S3_L004_R2_001.fastq.gz"
/usr/bin/time -v $cmd1 16 $cmd2 2> performance_comparisons/16_nac-20kb_PBMC_1.txt
standard + offlist (human 20k PBMC):
cmd1="kb count --kallisto STARsoloManuscript/exe/kallisto_0.50.1 --bustools STARsoloManuscript/exe/bustools_0.43.2 -t "
cmd2=" -x 10XV3 \
--workflow standard -i STARsoloManuscript/genomes/index/kallisto_0.50.1/human_CR_3.0.0/standard_offlist_1/index.idx \
-g STARsoloManuscript/genomes/index/kallisto_0.50.1/human_CR_3.0.0/standard_offlist_1/g \
-o ./performance_comparisons/out/standard_offlist-20kb_PBMC/ --overwrite \
20k_PBMC_3p_HT_nextgem_Chromium_X_fastqs/20k_PBMC_3p_HT_nextgem_Chromium_X_S3_L001_R1_001.fastq.gz \
20k_PBMC_3p_HT_nextgem_Chromium_X_fastqs/20k_PBMC_3p_HT_nextgem_Chromium_X_S3_L001_R2_001.fastq.gz \
20k_PBMC_3p_HT_nextgem_Chromium_X_fastqs/20k_PBMC_3p_HT_nextgem_Chromium_X_S3_L002_R1_001.fastq.gz \
20k_PBMC_3p_HT_nextgem_Chromium_X_fastqs/20k_PBMC_3p_HT_nextgem_Chromium_X_S3_L002_R2_001.fastq.gz \
20k_PBMC_3p_HT_nextgem_Chromium_X_fastqs/20k_PBMC_3p_HT_nextgem_Chromium_X_S3_L003_R1_001.fastq.gz \
20k_PBMC_3p_HT_nextgem_Chromium_X_fastqs/20k_PBMC_3p_HT_nextgem_Chromium_X_S3_L003_R2_001.fastq.gz \
20k_PBMC_3p_HT_nextgem_Chromium_X_fastqs/20k_PBMC_3p_HT_nextgem_Chromium_X_S3_L004_R1_001.fastq.gz \
20k_PBMC_3p_HT_nextgem_Chromium_X_fastqs/20k_PBMC_3p_HT_nextgem_Chromium_X_S3_L004_R2_001.fastq.gz"
/usr/bin/time -v $cmd1 16 $cmd2 2> performance_comparisons/16_standard_offlist-20kb_PBMC_1.txt
standard (no offlist) (human 20k PBMC):
cmd1="kb count --kallisto STARsoloManuscript/exe/kallisto_0.50.1 --bustools STARsoloManuscript/exe/bustools_0.43.2 -t "
cmd2=" -x 10XV3 \
--workflow standard -i STARsoloManuscript/genomes/index/kallisto_0.50.1/human_CR_3.0.0/standard_1/index.idx \
-g STARsoloManuscript/genomes/index/kallisto_0.50.1/human_CR_3.0.0/standard_1/g \
-o ./performance_comparisons/out/standard-20kb_PBMC/ --overwrite \
20k_PBMC_3p_HT_nextgem_Chromium_X_fastqs/20k_PBMC_3p_HT_nextgem_Chromium_X_S3_L001_R1_001.fastq.gz \
20k_PBMC_3p_HT_nextgem_Chromium_X_fastqs/20k_PBMC_3p_HT_nextgem_Chromium_X_S3_L001_R2_001.fastq.gz \
20k_PBMC_3p_HT_nextgem_Chromium_X_fastqs/20k_PBMC_3p_HT_nextgem_Chromium_X_S3_L002_R1_001.fastq.gz \
20k_PBMC_3p_HT_nextgem_Chromium_X_fastqs/20k_PBMC_3p_HT_nextgem_Chromium_X_S3_L002_R2_001.fastq.gz \
20k_PBMC_3p_HT_nextgem_Chromium_X_fastqs/20k_PBMC_3p_HT_nextgem_Chromium_X_S3_L003_R1_001.fastq.gz \
20k_PBMC_3p_HT_nextgem_Chromium_X_fastqs/20k_PBMC_3p_HT_nextgem_Chromium_X_S3_L003_R2_001.fastq.gz \
20k_PBMC_3p_HT_nextgem_Chromium_X_fastqs/20k_PBMC_3p_HT_nextgem_Chromium_X_S3_L004_R1_001.fastq.gz \
20k_PBMC_3p_HT_nextgem_Chromium_X_fastqs/20k_PBMC_3p_HT_nextgem_Chromium_X_S3_L004_R2_001.fastq.gz"
/usr/bin/time -v $cmd1 16 $cmd2 2> performance_comparisons/16_standard-20kb_PBMC_1.txt
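Each timed run writes a GNU `time -v` report to the file after `2>`, mixed in with kb's own stderr output. A sketch that extracts wall-clock time and peak memory from such a log by matching GNU time's report labels; `summarize_time_log` is hypothetical, and the final plots come from analysis_dlist_performance.ipynb, not from this helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarize a GNU `time -v` log. Lines that do not
# match the two report labels (e.g. kb log messages) are ignored.
summarize_time_log() {
  local log="$1"
  awk -F': ' '
    /Elapsed \(wall clock\) time/          { wall = $NF }
    /Maximum resident set size \(kbytes\)/ { rss_kb = $NF }
    END { printf "wall=%s max_rss_gib=%.2f\n", wall, rss_kb / 1048576 }
  ' "$log"
}

# Usage:
# for f in performance_comparisons/16_*_1.txt; do
#   echo "$f $(summarize_time_log "$f")"
# done
```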
nac + offlist (human 5k jejunum):
cmd1="kb count --kallisto STARsoloManuscript/exe/kallisto_0.50.1 --bustools STARsoloManuscript/exe/bustools_0.43.2 -t "
cmd2=" -x 10XV3 \
--workflow nac -i STARsoloManuscript/genomes/index/kallisto_0.50.1/human_CR_3.0.0/nac_offlist_1/index.idx \
-g STARsoloManuscript/genomes/index/kallisto_0.50.1/human_CR_3.0.0/nac_offlist_1/g \
-c1 STARsoloManuscript/genomes/index/kallisto_0.50.1/human_CR_3.0.0/nac_offlist_1/c1 \
-c2 STARsoloManuscript/genomes/index/kallisto_0.50.1/human_CR_3.0.0/nac_offlist_1/c2 \
-o ./performance_comparisons/out/nac_offlist-5kb_jejunum/ --overwrite \
5k_human_jejunum_CNIK_3pv3_fastqs/5k_human_jejunum_CNIK_3pv3_S1_L001_R1_001.fastq.gz \
5k_human_jejunum_CNIK_3pv3_fastqs/5k_human_jejunum_CNIK_3pv3_S1_L001_R2_001.fastq.gz \
5k_human_jejunum_CNIK_3pv3_fastqs/5k_human_jejunum_CNIK_3pv3_S1_L002_R1_001.fastq.gz \
5k_human_jejunum_CNIK_3pv3_fastqs/5k_human_jejunum_CNIK_3pv3_S1_L002_R2_001.fastq.gz \
5k_human_jejunum_CNIK_3pv3_fastqs/5k_human_jejunum_CNIK_3pv3_S1_L003_R1_001.fastq.gz \
5k_human_jejunum_CNIK_3pv3_fastqs/5k_human_jejunum_CNIK_3pv3_S1_L003_R2_001.fastq.gz \
5k_human_jejunum_CNIK_3pv3_fastqs/5k_human_jejunum_CNIK_3pv3_S1_L004_R1_001.fastq.gz \
5k_human_jejunum_CNIK_3pv3_fastqs/5k_human_jejunum_CNIK_3pv3_S1_L004_R2_001.fastq.gz"
/usr/bin/time -v $cmd1 16 $cmd2 2> performance_comparisons/16_nac_offlist-5kb_jejunum_1.txt
nac (no offlist) (human 5k jejunum):
cmd1="kb count --kallisto STARsoloManuscript/exe/kallisto_0.50.1 --bustools STARsoloManuscript/exe/bustools_0.43.2 -t "
cmd2=" -x 10XV3 \
--workflow nac -i STARsoloManuscript/genomes/index/kallisto_0.50.1/human_CR_3.0.0/nac_1/index.idx \
-g STARsoloManuscript/genomes/index/kallisto_0.50.1/human_CR_3.0.0/nac_1/g \
-c1 STARsoloManuscript/genomes/index/kallisto_0.50.1/human_CR_3.0.0/nac_1/c1 \
-c2 STARsoloManuscript/genomes/index/kallisto_0.50.1/human_CR_3.0.0/nac_1/c2 \
-o ./performance_comparisons/out/nac-5kb_jejunum/ --overwrite \
5k_human_jejunum_CNIK_3pv3_fastqs/5k_human_jejunum_CNIK_3pv3_S1_L001_R1_001.fastq.gz \
5k_human_jejunum_CNIK_3pv3_fastqs/5k_human_jejunum_CNIK_3pv3_S1_L001_R2_001.fastq.gz \
5k_human_jejunum_CNIK_3pv3_fastqs/5k_human_jejunum_CNIK_3pv3_S1_L002_R1_001.fastq.gz \
5k_human_jejunum_CNIK_3pv3_fastqs/5k_human_jejunum_CNIK_3pv3_S1_L002_R2_001.fastq.gz \
5k_human_jejunum_CNIK_3pv3_fastqs/5k_human_jejunum_CNIK_3pv3_S1_L003_R1_001.fastq.gz \
5k_human_jejunum_CNIK_3pv3_fastqs/5k_human_jejunum_CNIK_3pv3_S1_L003_R2_001.fastq.gz \
5k_human_jejunum_CNIK_3pv3_fastqs/5k_human_jejunum_CNIK_3pv3_S1_L004_R1_001.fastq.gz \
5k_human_jejunum_CNIK_3pv3_fastqs/5k_human_jejunum_CNIK_3pv3_S1_L004_R2_001.fastq.gz"
/usr/bin/time -v $cmd1 16 $cmd2 2> performance_comparisons/16_nac-5kb_jejunum_1.txt
standard + offlist (human 5k jejunum):
cmd1="kb count --kallisto STARsoloManuscript/exe/kallisto_0.50.1 --bustools STARsoloManuscript/exe/bustools_0.43.2 -t "
cmd2=" -x 10XV3 \
--workflow standard -i STARsoloManuscript/genomes/index/kallisto_0.50.1/human_CR_3.0.0/standard_offlist_1/index.idx \
-g STARsoloManuscript/genomes/index/kallisto_0.50.1/human_CR_3.0.0/standard_offlist_1/g \
-o ./performance_comparisons/out/standard_offlist-5kb_jejunum/ --overwrite \
5k_human_jejunum_CNIK_3pv3_fastqs/5k_human_jejunum_CNIK_3pv3_S1_L001_R1_001.fastq.gz \
5k_human_jejunum_CNIK_3pv3_fastqs/5k_human_jejunum_CNIK_3pv3_S1_L001_R2_001.fastq.gz \
5k_human_jejunum_CNIK_3pv3_fastqs/5k_human_jejunum_CNIK_3pv3_S1_L002_R1_001.fastq.gz \
5k_human_jejunum_CNIK_3pv3_fastqs/5k_human_jejunum_CNIK_3pv3_S1_L002_R2_001.fastq.gz \
5k_human_jejunum_CNIK_3pv3_fastqs/5k_human_jejunum_CNIK_3pv3_S1_L003_R1_001.fastq.gz \
5k_human_jejunum_CNIK_3pv3_fastqs/5k_human_jejunum_CNIK_3pv3_S1_L003_R2_001.fastq.gz \
5k_human_jejunum_CNIK_3pv3_fastqs/5k_human_jejunum_CNIK_3pv3_S1_L004_R1_001.fastq.gz \
5k_human_jejunum_CNIK_3pv3_fastqs/5k_human_jejunum_CNIK_3pv3_S1_L004_R2_001.fastq.gz"
$cmd1 16 $cmd2
standard (no offlist) (human 5k jejunum):
cmd1="kb count --kallisto STARsoloManuscript/exe/kallisto_0.50.1 --bustools STARsoloManuscript/exe/bustools_0.43.2 -t "
cmd2=" -x 10XV3 \
--workflow standard -i STARsoloManuscript/genomes/index/kallisto_0.50.1/human_CR_3.0.0/standard_1/index.idx \
-g STARsoloManuscript/genomes/index/kallisto_0.50.1/human_CR_3.0.0/standard_1/g \
-o ./performance_comparisons/out/standard-5kb_jejunum/ --overwrite \
5k_human_jejunum_CNIK_3pv3_fastqs/5k_human_jejunum_CNIK_3pv3_S1_L001_R1_001.fastq.gz \
5k_human_jejunum_CNIK_3pv3_fastqs/5k_human_jejunum_CNIK_3pv3_S1_L001_R2_001.fastq.gz \
5k_human_jejunum_CNIK_3pv3_fastqs/5k_human_jejunum_CNIK_3pv3_S1_L002_R1_001.fastq.gz \
5k_human_jejunum_CNIK_3pv3_fastqs/5k_human_jejunum_CNIK_3pv3_S1_L002_R2_001.fastq.gz \
5k_human_jejunum_CNIK_3pv3_fastqs/5k_human_jejunum_CNIK_3pv3_S1_L003_R1_001.fastq.gz \
5k_human_jejunum_CNIK_3pv3_fastqs/5k_human_jejunum_CNIK_3pv3_S1_L003_R2_001.fastq.gz \
5k_human_jejunum_CNIK_3pv3_fastqs/5k_human_jejunum_CNIK_3pv3_S1_L004_R1_001.fastq.gz \
5k_human_jejunum_CNIK_3pv3_fastqs/5k_human_jejunum_CNIK_3pv3_S1_L004_R2_001.fastq.gz"
$cmd1 16 $cmd2
nac + offlist (mouse 10k neuron):
cmd1="kb count --kallisto STARsoloManuscript/exe/kallisto_0.50.1 --bustools STARsoloManuscript/exe/bustools_0.43.2 -t "
cmd2=" -x 10XV3 \
--workflow nac -i STARsoloManuscript/genomes/index/kallisto_0.50.1/mouse/nac_offlist_1/index.idx \
-g STARsoloManuscript/genomes/index/kallisto_0.50.1/mouse/nac_offlist_1/g \
-c1 STARsoloManuscript/genomes/index/kallisto_0.50.1/mouse/nac_offlist_1/c1 \
-c2 STARsoloManuscript/genomes/index/kallisto_0.50.1/mouse/nac_offlist_1/c2 \
-o ./performance_comparisons/out/nac_offlist-10kb_neuron/ --overwrite \
SC3_v3_NextGem_SI_Neuron_10K_fastqs/SC3_v3_NextGem_SI_Neuron_10K_S1_L002_R1_001.fastq.gz \
SC3_v3_NextGem_SI_Neuron_10K_fastqs/SC3_v3_NextGem_SI_Neuron_10K_S1_L002_R2_001.fastq.gz \
SC3_v3_NextGem_SI_Neuron_10K_fastqs/SC3_v3_NextGem_SI_Neuron_10K_S1_L003_R1_001.fastq.gz \
SC3_v3_NextGem_SI_Neuron_10K_fastqs/SC3_v3_NextGem_SI_Neuron_10K_S1_L003_R2_001.fastq.gz \
SC3_v3_NextGem_SI_Neuron_10K_fastqs/SC3_v3_NextGem_SI_Neuron_10K_S1_L004_R1_001.fastq.gz \
SC3_v3_NextGem_SI_Neuron_10K_fastqs/SC3_v3_NextGem_SI_Neuron_10K_S1_L004_R2_001.fastq.gz"
/usr/bin/time -v $cmd1 16 $cmd2 2> performance_comparisons/16_nac_offlist-10kb_neuron_1.txt
nac (no offlist) (mouse 10k neuron):
cmd1="kb count --kallisto STARsoloManuscript/exe/kallisto_0.50.1 --bustools STARsoloManuscript/exe/bustools_0.43.2 -t "
cmd2=" -x 10XV3 \
--workflow nac -i STARsoloManuscript/genomes/index/kallisto_0.50.1/mouse/nac_1/index.idx \
-g STARsoloManuscript/genomes/index/kallisto_0.50.1/mouse/nac_1/g \
-c1 STARsoloManuscript/genomes/index/kallisto_0.50.1/mouse/nac_1/c1 \
-c2 STARsoloManuscript/genomes/index/kallisto_0.50.1/mouse/nac_1/c2 \
-o ./performance_comparisons/out/nac-10kb_neuron/ --overwrite \
SC3_v3_NextGem_SI_Neuron_10K_fastqs/SC3_v3_NextGem_SI_Neuron_10K_S1_L002_R1_001.fastq.gz \
SC3_v3_NextGem_SI_Neuron_10K_fastqs/SC3_v3_NextGem_SI_Neuron_10K_S1_L002_R2_001.fastq.gz \
SC3_v3_NextGem_SI_Neuron_10K_fastqs/SC3_v3_NextGem_SI_Neuron_10K_S1_L003_R1_001.fastq.gz \
SC3_v3_NextGem_SI_Neuron_10K_fastqs/SC3_v3_NextGem_SI_Neuron_10K_S1_L003_R2_001.fastq.gz \
SC3_v3_NextGem_SI_Neuron_10K_fastqs/SC3_v3_NextGem_SI_Neuron_10K_S1_L004_R1_001.fastq.gz \
SC3_v3_NextGem_SI_Neuron_10K_fastqs/SC3_v3_NextGem_SI_Neuron_10K_S1_L004_R2_001.fastq.gz"
/usr/bin/time -v $cmd1 16 $cmd2 2> performance_comparisons/16_nac-10kb_neuron_1.txt
standard + offlist (mouse 10k neuron):
cmd1="kb count --kallisto STARsoloManuscript/exe/kallisto_0.50.1 --bustools STARsoloManuscript/exe/bustools_0.43.2 -t "
cmd2=" -x 10XV3 \
--workflow standard -i STARsoloManuscript/genomes/index/kallisto_0.50.1/mouse/standard_offlist_1/index.idx \
-g STARsoloManuscript/genomes/index/kallisto_0.50.1/mouse/standard_offlist_1/g \
-o ./performance_comparisons/out/standard_offlist-10kb_neuron/ --overwrite \
SC3_v3_NextGem_SI_Neuron_10K_fastqs/SC3_v3_NextGem_SI_Neuron_10K_S1_L002_R1_001.fastq.gz \
SC3_v3_NextGem_SI_Neuron_10K_fastqs/SC3_v3_NextGem_SI_Neuron_10K_S1_L002_R2_001.fastq.gz \
SC3_v3_NextGem_SI_Neuron_10K_fastqs/SC3_v3_NextGem_SI_Neuron_10K_S1_L003_R1_001.fastq.gz \
SC3_v3_NextGem_SI_Neuron_10K_fastqs/SC3_v3_NextGem_SI_Neuron_10K_S1_L003_R2_001.fastq.gz \
SC3_v3_NextGem_SI_Neuron_10K_fastqs/SC3_v3_NextGem_SI_Neuron_10K_S1_L004_R1_001.fastq.gz \
SC3_v3_NextGem_SI_Neuron_10K_fastqs/SC3_v3_NextGem_SI_Neuron_10K_S1_L004_R2_001.fastq.gz"
/usr/bin/time -v $cmd1 16 $cmd2 2> performance_comparisons/16_standard_offlist-10kb_neuron_1.txt
standard (no offlist) (mouse 10k neuron):
cmd1="kb count --kallisto STARsoloManuscript/exe/kallisto_0.50.1 --bustools STARsoloManuscript/exe/bustools_0.43.2 -t "
cmd2=" -x 10XV3 \
--workflow standard -i STARsoloManuscript/genomes/index/kallisto_0.50.1/mouse/standard_1/index.idx \
-g STARsoloManuscript/genomes/index/kallisto_0.50.1/mouse/standard_1/g \
-o ./performance_comparisons/out/standard-10kb_neuron/ --overwrite \
SC3_v3_NextGem_SI_Neuron_10K_fastqs/SC3_v3_NextGem_SI_Neuron_10K_S1_L002_R1_001.fastq.gz \
SC3_v3_NextGem_SI_Neuron_10K_fastqs/SC3_v3_NextGem_SI_Neuron_10K_S1_L002_R2_001.fastq.gz \
SC3_v3_NextGem_SI_Neuron_10K_fastqs/SC3_v3_NextGem_SI_Neuron_10K_S1_L003_R1_001.fastq.gz \
SC3_v3_NextGem_SI_Neuron_10K_fastqs/SC3_v3_NextGem_SI_Neuron_10K_S1_L003_R2_001.fastq.gz \
SC3_v3_NextGem_SI_Neuron_10K_fastqs/SC3_v3_NextGem_SI_Neuron_10K_S1_L004_R1_001.fastq.gz \
SC3_v3_NextGem_SI_Neuron_10K_fastqs/SC3_v3_NextGem_SI_Neuron_10K_S1_L004_R2_001.fastq.gz"
/usr/bin/time -v $cmd1 16 $cmd2 2> performance_comparisons/16_standard-10kb_neuron_1.txt
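The four variants per dataset differ only in the index directory, the workflow, the output name, and whether `-c1`/`-c2` are passed. A sketch that builds one command as a bash array rather than the `cmd1`/`cmd2` string pair, which avoids relying on word splitting of unquoted variables. `kb_perf_cmd` is hypothetical; it prints the command instead of running it, and it omits the `--strand=unstranded` flag used for the FFPE runs:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a kb count command for one index variant.
# Arguments: <threads> <species dir> <variant> <output name> <fastq...>
# Variants match the index directories above: nac_offlist_1, nac_1,
# standard_offlist_1, standard_1.
kb_perf_cmd() {
  local threads="$1" species="$2" variant="$3" outname="$4"
  shift 4                                  # remaining args: FASTQ files
  local idx="STARsoloManuscript/genomes/index/kallisto_0.50.1/${species}/${variant}"
  local workflow="${variant%%_*}"          # "nac" or "standard"
  local -a cmd=(kb count
    --kallisto STARsoloManuscript/exe/kallisto_0.50.1
    --bustools STARsoloManuscript/exe/bustools_0.43.2
    -t "$threads" -x 10XV3
    --workflow "$workflow" -i "$idx/index.idx" -g "$idx/g")
  if [[ "$workflow" == nac ]]; then
    cmd+=(-c1 "$idx/c1" -c2 "$idx/c2")     # only the nac runs pass c1/c2
  fi
  cmd+=(-o "./performance_comparisons/out/${outname}/" --overwrite "$@")
  printf '%q ' "${cmd[@]}"                 # shell-quoted, one line
  printf '\n'
}

# Usage (dry run):
# kb_perf_cmd 16 mouse nac_1 nac-10kb_neuron \
#   SC3_v3_NextGem_SI_Neuron_10K_fastqs/SC3_v3_NextGem_SI_Neuron_10K_S1_L002_R1_001.fastq.gz \
#   SC3_v3_NextGem_SI_Neuron_10K_fastqs/SC3_v3_NextGem_SI_Neuron_10K_S1_L002_R2_001.fastq.gz
```

To execute instead of print, the two `printf` lines could be replaced by `/usr/bin/time -v "${cmd[@]}" 2> "$log"` with a log path of your choosing.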
nac + offlist (mouse 5k lung):
cmd1="kb count --kallisto STARsoloManuscript/exe/kallisto_0.50.1 --bustools STARsoloManuscript/exe/bustools_0.43.2 -t "
cmd2=" -x 10XV3 \
--workflow nac -i STARsoloManuscript/genomes/index/kallisto_0.50.1/mouse/nac_offlist_1/index.idx \
-g STARsoloManuscript/genomes/index/kallisto_0.50.1/mouse/nac_offlist_1/g \
-c1 STARsoloManuscript/genomes/index/kallisto_0.50.1/mouse/nac_offlist_1/c1 \
-c2 STARsoloManuscript/genomes/index/kallisto_0.50.1/mouse/nac_offlist_1/c2 \
-o ./performance_comparisons/out/nac_offlist-5kb_lung/ --overwrite \
5k_mouse_lung_CNIK_3pv3_fastqs/5k_mouse_lung_CNIK_3pv3_S4_L001_R1_001.fastq.gz \
5k_mouse_lung_CNIK_3pv3_fastqs/5k_mouse_lung_CNIK_3pv3_S4_L001_R2_001.fastq.gz \
5k_mouse_lung_CNIK_3pv3_fastqs/5k_mouse_lung_CNIK_3pv3_S4_L002_R1_001.fastq.gz \
5k_mouse_lung_CNIK_3pv3_fastqs/5k_mouse_lung_CNIK_3pv3_S4_L002_R2_001.fastq.gz \
5k_mouse_lung_CNIK_3pv3_fastqs/5k_mouse_lung_CNIK_3pv3_S4_L003_R1_001.fastq.gz \
5k_mouse_lung_CNIK_3pv3_fastqs/5k_mouse_lung_CNIK_3pv3_S4_L003_R2_001.fastq.gz"
/usr/bin/time -v $cmd1 16 $cmd2 2> performance_comparisons/16_nac_offlist-5kb_lung_1.txt
nac (no offlist) (mouse 5k lung):
cmd1="kb count --kallisto STARsoloManuscript/exe/kallisto_0.50.1 --bustools STARsoloManuscript/exe/bustools_0.43.2 -t "
cmd2=" -x 10XV3 \
--workflow nac -i STARsoloManuscript/genomes/index/kallisto_0.50.1/mouse/nac_1/index.idx \
-g STARsoloManuscript/genomes/index/kallisto_0.50.1/mouse/nac_1/g \
-c1 STARsoloManuscript/genomes/index/kallisto_0.50.1/mouse/nac_1/c1 \
-c2 STARsoloManuscript/genomes/index/kallisto_0.50.1/mouse/nac_1/c2 \
-o ./performance_comparisons/out/nac-5kb_lung/ --overwrite \
5k_mouse_lung_CNIK_3pv3_fastqs/5k_mouse_lung_CNIK_3pv3_S4_L001_R1_001.fastq.gz \
5k_mouse_lung_CNIK_3pv3_fastqs/5k_mouse_lung_CNIK_3pv3_S4_L001_R2_001.fastq.gz \
5k_mouse_lung_CNIK_3pv3_fastqs/5k_mouse_lung_CNIK_3pv3_S4_L002_R1_001.fastq.gz \
5k_mouse_lung_CNIK_3pv3_fastqs/5k_mouse_lung_CNIK_3pv3_S4_L002_R2_001.fastq.gz \
5k_mouse_lung_CNIK_3pv3_fastqs/5k_mouse_lung_CNIK_3pv3_S4_L003_R1_001.fastq.gz \
5k_mouse_lung_CNIK_3pv3_fastqs/5k_mouse_lung_CNIK_3pv3_S4_L003_R2_001.fastq.gz"
/usr/bin/time -v $cmd1 16 $cmd2 2> performance_comparisons/16_nac-5kb_lung_1.txt
standard + offlist (mouse 5k lung):
cmd1="kb count --kallisto STARsoloManuscript/exe/kallisto_0.50.1 --bustools STARsoloManuscript/exe/bustools_0.43.2 -t "
cmd2=" -x 10XV3 \
--workflow standard -i STARsoloManuscript/genomes/index/kallisto_0.50.1/mouse/standard_offlist_1/index.idx \
-g STARsoloManuscript/genomes/index/kallisto_0.50.1/mouse/standard_offlist_1/g \
-o ./performance_comparisons/out/standard_offlist-5kb_lung/ --overwrite \
5k_mouse_lung_CNIK_3pv3_fastqs/5k_mouse_lung_CNIK_3pv3_S4_L001_R1_001.fastq.gz \
5k_mouse_lung_CNIK_3pv3_fastqs/5k_mouse_lung_CNIK_3pv3_S4_L001_R2_001.fastq.gz \
5k_mouse_lung_CNIK_3pv3_fastqs/5k_mouse_lung_CNIK_3pv3_S4_L002_R1_001.fastq.gz \
5k_mouse_lung_CNIK_3pv3_fastqs/5k_mouse_lung_CNIK_3pv3_S4_L002_R2_001.fastq.gz \
5k_mouse_lung_CNIK_3pv3_fastqs/5k_mouse_lung_CNIK_3pv3_S4_L003_R1_001.fastq.gz \
5k_mouse_lung_CNIK_3pv3_fastqs/5k_mouse_lung_CNIK_3pv3_S4_L003_R2_001.fastq.gz"
$cmd1 16 $cmd2
standard (no offlist) (mouse 5k lung):
cmd1="kb count --kallisto STARsoloManuscript/exe/kallisto_0.50.1 --bustools STARsoloManuscript/exe/bustools_0.43.2 -t "
cmd2=" -x 10XV3 \
--workflow standard -i STARsoloManuscript/genomes/index/kallisto_0.50.1/mouse/standard_1/index.idx \
-g STARsoloManuscript/genomes/index/kallisto_0.50.1/mouse/standard_1/g \
-o ./performance_comparisons/out/standard-5kb_lung/ --overwrite \
5k_mouse_lung_CNIK_3pv3_fastqs/5k_mouse_lung_CNIK_3pv3_S4_L001_R1_001.fastq.gz \
5k_mouse_lung_CNIK_3pv3_fastqs/5k_mouse_lung_CNIK_3pv3_S4_L001_R2_001.fastq.gz \
5k_mouse_lung_CNIK_3pv3_fastqs/5k_mouse_lung_CNIK_3pv3_S4_L002_R1_001.fastq.gz \
5k_mouse_lung_CNIK_3pv3_fastqs/5k_mouse_lung_CNIK_3pv3_S4_L002_R2_001.fastq.gz \
5k_mouse_lung_CNIK_3pv3_fastqs/5k_mouse_lung_CNIK_3pv3_S4_L003_R1_001.fastq.gz \
5k_mouse_lung_CNIK_3pv3_fastqs/5k_mouse_lung_CNIK_3pv3_S4_L003_R2_001.fastq.gz"
$cmd1 16 $cmd2
nac + offlist (mouse FFPE embryo):
cmd1="kb count --kallisto STARsoloManuscript/exe/kallisto_0.50.1 --bustools STARsoloManuscript/exe/bustools_0.43.2 -t "
cmd2=" -x 10XV3 --strand=unstranded \
--workflow nac -i STARsoloManuscript/genomes/index/kallisto_0.50.1/mouse/nac_offlist_1/index.idx \
-g STARsoloManuscript/genomes/index/kallisto_0.50.1/mouse/nac_offlist_1/g \
-c1 STARsoloManuscript/genomes/index/kallisto_0.50.1/mouse/nac_offlist_1/c1 \
-c2 STARsoloManuscript/genomes/index/kallisto_0.50.1/mouse/nac_offlist_1/c2 \
-o ./performance_comparisons/out/nac_offlist-mouse_ffpe/ --overwrite \
CytAssist_11mm_FFPE_Mouse_Embryo_fastqs/CytAssist_11mm_FFPE_Mouse_Embryo_S1_L004_R1_001.fastq.gz \
CytAssist_11mm_FFPE_Mouse_Embryo_fastqs/CytAssist_11mm_FFPE_Mouse_Embryo_S1_L004_R2_001.fastq.gz"
$cmd1 16 $cmd2
nac (no offlist) (mouse FFPE embryo):
cmd1="kb count --kallisto STARsoloManuscript/exe/kallisto_0.50.1 --bustools STARsoloManuscript/exe/bustools_0.43.2 -t "
cmd2=" -x 10XV3 --strand=unstranded \
--workflow nac -i STARsoloManuscript/genomes/index/kallisto_0.50.1/mouse/nac_1/index.idx \
-g STARsoloManuscript/genomes/index/kallisto_0.50.1/mouse/nac_1/g \
-c1 STARsoloManuscript/genomes/index/kallisto_0.50.1/mouse/nac_1/c1 \
-c2 STARsoloManuscript/genomes/index/kallisto_0.50.1/mouse/nac_1/c2 \
-o ./performance_comparisons/out/nac-mouse_ffpe/ --overwrite \
CytAssist_11mm_FFPE_Mouse_Embryo_fastqs/CytAssist_11mm_FFPE_Mouse_Embryo_S1_L004_R1_001.fastq.gz \
CytAssist_11mm_FFPE_Mouse_Embryo_fastqs/CytAssist_11mm_FFPE_Mouse_Embryo_S1_L004_R2_001.fastq.gz"
$cmd1 16 $cmd2
standard + offlist (mouse FFPE embryo):
cmd1="kb count --kallisto STARsoloManuscript/exe/kallisto_0.50.1 --bustools STARsoloManuscript/exe/bustools_0.43.2 -t "
cmd2=" -x 10XV3 --strand=unstranded \
--workflow standard -i STARsoloManuscript/genomes/index/kallisto_0.50.1/mouse/standard_offlist_1/index.idx \
-g STARsoloManuscript/genomes/index/kallisto_0.50.1/mouse/standard_offlist_1/g \
-o ./performance_comparisons/out/standard_offlist-mouse_ffpe/ --overwrite \
CytAssist_11mm_FFPE_Mouse_Embryo_fastqs/CytAssist_11mm_FFPE_Mouse_Embryo_S1_L004_R1_001.fastq.gz \
CytAssist_11mm_FFPE_Mouse_Embryo_fastqs/CytAssist_11mm_FFPE_Mouse_Embryo_S1_L004_R2_001.fastq.gz"
$cmd1 16 $cmd2
standard (no offlist):
cmd1="kb count --kallisto STARsoloManuscript/exe/kallisto_0.50.1 --bustools STARsoloManuscript/exe/bustools_0.43.2 -t "
cmd2=" -x 10XV3 --strand=unstranded \
  --workflow standard -i STARsoloManuscript/genomes/index/kallisto_0.50.1/mouse/standard_1/index.idx \
  -g STARsoloManuscript/genomes/index/kallisto_0.50.1/mouse/standard_1/g \
  -o ./performance_comparisons/out/standard-mouse_ffpe/ --overwrite \
  CytAssist_11mm_FFPE_Mouse_Embryo_fastqs/CytAssist_11mm_FFPE_Mouse_Embryo_S1_L004_R1_001.fastq.gz \
  CytAssist_11mm_FFPE_Mouse_Embryo_fastqs/CytAssist_11mm_FFPE_Mouse_Embryo_S1_L004_R2_001.fastq.gz"
$cmd1 16 $cmd2
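The four mouse FFPE runs above differ only in the index variant, the workflow, and whether `-c1`/`-c2` are passed. The `cmd1`/`cmd2` split works because the unquoted `$cmd1 16 $cmd2` expansion is word-split, which would break if a path ever contained a space. A minimal dry-run sketch, assuming the same directory layout as the commands above, builds each command as a bash array and prints it instead of running it. The function name `ffpe_cmd` is hypothetical and not part of the repository.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run helper: prints (does not run) the kb count command
# for one index variant of the mouse FFPE benchmark.
ffpe_cmd() {
  local variant="$1" threads="$2"
  local idx="STARsoloManuscript/genomes/index/kallisto_0.50.1/mouse/${variant}_1"
  local fq="CytAssist_11mm_FFPE_Mouse_Embryo_fastqs/CytAssist_11mm_FFPE_Mouse_Embryo_S1_L004"
  local -a args=(kb count
    --kallisto STARsoloManuscript/exe/kallisto_0.50.1
    --bustools STARsoloManuscript/exe/bustools_0.43.2
    -t "$threads" -x 10XV3 --strand=unstranded)
  case "$variant" in
    # nac variants additionally need the c1/c2 files
    nac*) args+=(--workflow nac -i "$idx/index.idx" -g "$idx/g" -c1 "$idx/c1" -c2 "$idx/c2") ;;
    standard*) args+=(--workflow standard -i "$idx/index.idx" -g "$idx/g") ;;
    *) echo "unknown variant: $variant" >&2; return 1 ;;
  esac
  args+=(-o "./performance_comparisons/out/${variant}-mouse_ffpe/" --overwrite
    "${fq}_R1_001.fastq.gz" "${fq}_R2_001.fastq.gz")
  printf '%s\n' "${args[*]}"   # join with spaces for display only
}

for v in nac_offlist nac standard_offlist standard; do
  ffpe_cmd "$v" 16
done
```

To execute instead of print, the function would run `"${args[@]}"`; keeping each path as one array element is what makes that safe.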
Get clusters 1 and 2:
wget --continue https://cf.10xgenomics.com/samples/cell-exp/6.1.0/20k_PBMC_3p_HT_nextgem_Chromium_X/20k_PBMC_3p_HT_nextgem_Chromium_X_analysis.tar.gz
tar -xzvf 20k_PBMC_3p_HT_nextgem_Chromium_X_analysis.tar.gz
cat analysis/clustering/graphclust/clusters.csv | cut -d"-" -f1 | tail -n+2 > barcodes_10x_human_all.txt
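Cell Ranger's `clusters.csv` has a `Barcode,Cluster` header followed by rows of the form `<barcode>-1,<cluster>`. The pipeline above drops the header and strips the `-1` suffix, keeping every clustered barcode. The sketch below reproduces that on a toy file with made-up barcodes. It also adds a hypothetical helper, `cluster_barcodes`, that keeps only the listed clusters (e.g. 1 and 2). Whether later steps restrict to those clusters is not stated here, so treat that filter purely as an illustration.

```shell
#!/usr/bin/env bash
# Toy clusters.csv in the Cell Ranger layout (barcodes are made up)
printf 'Barcode,Cluster\nAAACCCAAGAAACACT-1,1\nAAACCCAAGAAACCAT-1,2\nAAACCCAAGAAACCCA-1,3\n' > toy_clusters.csv

# Same pipeline as above: strip the "-1" suffix, then drop the header line
cut -d"-" -f1 toy_clusters.csv | tail -n+2 > toy_barcodes_all.txt

# Hypothetical helper: keep only barcodes whose cluster is in the given list
cluster_barcodes() {
  local csv="$1"; shift
  awk -F, -v keep="$*" '
    BEGIN { n = split(keep, k, " "); for (i = 1; i <= n; i++) want[k[i]] = 1 }
    NR > 1 && ($2 in want) { sub(/-[0-9]+$/, "", $1); print $1 }
  ' "$csv"
}

cluster_barcodes toy_clusters.csv 1 2 > toy_barcodes_c1_c2.txt
```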
kb count --kallisto STARsoloManuscript/exe/kallisto_0.50.1 --bustools STARsoloManuscript/exe/bustools_0.43.2 -t 48 -x 10XV3 \
  --workflow nac --sum=total -i STARsoloManuscript/genomes/index/kallisto_0.50.1/human_CR_3.0.0/nac_offlist_1/index.idx \
  -g STARsoloManuscript/genomes/index/kallisto_0.50.1/human_CR_3.0.0/nac_offlist_1/g \
  -c1 STARsoloManuscript/genomes/index/kallisto_0.50.1/human_CR_3.0.0/nac_offlist_1/c1 \
  -c2 STARsoloManuscript/genomes/index/kallisto_0.50.1/human_CR_3.0.0/nac_offlist_1/c2 \
  -o ./reprocess_human_20k_PBMC/ --overwrite --verbose \
  -w barcodes_10x_human_all.txt \
  20k_PBMC_3p_HT_nextgem_Chromium_X_fastqs/20k_PBMC_3p_HT_nextgem_Chromium_X_S3_L001_R1_001.fastq.gz \
  20k_PBMC_3p_HT_nextgem_Chromium_X_fastqs/20k_PBMC_3p_HT_nextgem_Chromium_X_S3_L001_R2_001.fastq.gz \
  20k_PBMC_3p_HT_nextgem_Chromium_X_fastqs/20k_PBMC_3p_HT_nextgem_Chromium_X_S3_L002_R1_001.fastq.gz \
  20k_PBMC_3p_HT_nextgem_Chromium_X_fastqs/20k_PBMC_3p_HT_nextgem_Chromium_X_S3_L002_R2_001.fastq.gz \
  20k_PBMC_3p_HT_nextgem_Chromium_X_fastqs/20k_PBMC_3p_HT_nextgem_Chromium_X_S3_L003_R1_001.fastq.gz \
  20k_PBMC_3p_HT_nextgem_Chromium_X_fastqs/20k_PBMC_3p_HT_nextgem_Chromium_X_S3_L003_R2_001.fastq.gz \
  20k_PBMC_3p_HT_nextgem_Chromium_X_fastqs/20k_PBMC_3p_HT_nextgem_Chromium_X_S3_L004_R1_001.fastq.gz \
  20k_PBMC_3p_HT_nextgem_Chromium_X_fastqs/20k_PBMC_3p_HT_nextgem_Chromium_X_S3_L004_R2_001.fastq.gz
kb count --kallisto STARsoloManuscript/exe/kallisto_0.50.1 --bustools STARsoloManuscript/exe/bustools_0.43.2 -t 48 -x 10XV3 \
  --workflow nac --sum=total -i STARsoloManuscript/genomes/index/kallisto_0.50.1/human_CR_3.0.0/nac_1/index.idx \
  -g STARsoloManuscript/genomes/index/kallisto_0.50.1/human_CR_3.0.0/nac_1/g \
  -c1 STARsoloManuscript/genomes/index/kallisto_0.50.1/human_CR_3.0.0/nac_1/c1 \
  -c2 STARsoloManuscript/genomes/index/kallisto_0.50.1/human_CR_3.0.0/nac_1/c2 \
  -o ./reprocess_human_20k_PBMC_no_offlist/ --overwrite --verbose \
  -w barcodes_10x_human_all.txt \
  20k_PBMC_3p_HT_nextgem_Chromium_X_fastqs/20k_PBMC_3p_HT_nextgem_Chromium_X_S3_L001_R1_001.fastq.gz \
  20k_PBMC_3p_HT_nextgem_Chromium_X_fastqs/20k_PBMC_3p_HT_nextgem_Chromium_X_S3_L001_R2_001.fastq.gz \
  20k_PBMC_3p_HT_nextgem_Chromium_X_fastqs/20k_PBMC_3p_HT_nextgem_Chromium_X_S3_L002_R1_001.fastq.gz \
  20k_PBMC_3p_HT_nextgem_Chromium_X_fastqs/20k_PBMC_3p_HT_nextgem_Chromium_X_S3_L002_R2_001.fastq.gz \
  20k_PBMC_3p_HT_nextgem_Chromium_X_fastqs/20k_PBMC_3p_HT_nextgem_Chromium_X_S3_L003_R1_001.fastq.gz \
  20k_PBMC_3p_HT_nextgem_Chromium_X_fastqs/20k_PBMC_3p_HT_nextgem_Chromium_X_S3_L003_R2_001.fastq.gz \
  20k_PBMC_3p_HT_nextgem_Chromium_X_fastqs/20k_PBMC_3p_HT_nextgem_Chromium_X_S3_L004_R1_001.fastq.gz \
  20k_PBMC_3p_HT_nextgem_Chromium_X_fastqs/20k_PBMC_3p_HT_nextgem_Chromium_X_S3_L004_R2_001.fastq.gz
kb count --kallisto STARsoloManuscript/exe/kallisto_0.50.1 --bustools STARsoloManuscript/exe/bustools_0.43.2 -t 48 -x 10XV3 \
  -i STARsoloManuscript/genomes/index/kallisto_0.50.1/human_CR_3.0.0/standard_offlist_1/index.idx \
  -g STARsoloManuscript/genomes/index/kallisto_0.50.1/human_CR_3.0.0/standard_offlist_1/g \
  -o ./reprocess_human_20k_PBMC_standard/ --overwrite --verbose \
  -w barcodes_10x_human_all.txt \
  20k_PBMC_3p_HT_nextgem_Chromium_X_fastqs/20k_PBMC_3p_HT_nextgem_Chromium_X_S3_L001_R1_001.fastq.gz \
  20k_PBMC_3p_HT_nextgem_Chromium_X_fastqs/20k_PBMC_3p_HT_nextgem_Chromium_X_S3_L001_R2_001.fastq.gz \
  20k_PBMC_3p_HT_nextgem_Chromium_X_fastqs/20k_PBMC_3p_HT_nextgem_Chromium_X_S3_L002_R1_001.fastq.gz \
  20k_PBMC_3p_HT_nextgem_Chromium_X_fastqs/20k_PBMC_3p_HT_nextgem_Chromium_X_S3_L002_R2_001.fastq.gz \
  20k_PBMC_3p_HT_nextgem_Chromium_X_fastqs/20k_PBMC_3p_HT_nextgem_Chromium_X_S3_L003_R1_001.fastq.gz \
  20k_PBMC_3p_HT_nextgem_Chromium_X_fastqs/20k_PBMC_3p_HT_nextgem_Chromium_X_S3_L003_R2_001.fastq.gz \
  20k_PBMC_3p_HT_nextgem_Chromium_X_fastqs/20k_PBMC_3p_HT_nextgem_Chromium_X_S3_L004_R1_001.fastq.gz \
  20k_PBMC_3p_HT_nextgem_Chromium_X_fastqs/20k_PBMC_3p_HT_nextgem_Chromium_X_S3_L004_R2_001.fastq.gz
kb count --kallisto STARsoloManuscript/exe/kallisto_0.50.1 --bustools STARsoloManuscript/exe/bustools_0.43.2 -t 48 -x 10XV3 \
  -i STARsoloManuscript/genomes/index/kallisto_0.50.1/human_CR_3.0.0/standard_1/index.idx \
  -g STARsoloManuscript/genomes/index/kallisto_0.50.1/human_CR_3.0.0/standard_1/g \
  -o ./reprocess_human_20k_PBMC_standard_no_offlist/ --overwrite --verbose \
  -w barcodes_10x_human_all.txt \
  20k_PBMC_3p_HT_nextgem_Chromium_X_fastqs/20k_PBMC_3p_HT_nextgem_Chromium_X_S3_L001_R1_001.fastq.gz \
  20k_PBMC_3p_HT_nextgem_Chromium_X_fastqs/20k_PBMC_3p_HT_nextgem_Chromium_X_S3_L001_R2_001.fastq.gz \
  20k_PBMC_3p_HT_nextgem_Chromium_X_fastqs/20k_PBMC_3p_HT_nextgem_Chromium_X_S3_L002_R1_001.fastq.gz \
  20k_PBMC_3p_HT_nextgem_Chromium_X_fastqs/20k_PBMC_3p_HT_nextgem_Chromium_X_S3_L002_R2_001.fastq.gz \
  20k_PBMC_3p_HT_nextgem_Chromium_X_fastqs/20k_PBMC_3p_HT_nextgem_Chromium_X_S3_L003_R1_001.fastq.gz \
  20k_PBMC_3p_HT_nextgem_Chromium_X_fastqs/20k_PBMC_3p_HT_nextgem_Chromium_X_S3_L003_R2_001.fastq.gz \
  20k_PBMC_3p_HT_nextgem_Chromium_X_fastqs/20k_PBMC_3p_HT_nextgem_Chromium_X_S3_L004_R1_001.fastq.gz \
  20k_PBMC_3p_HT_nextgem_Chromium_X_fastqs/20k_PBMC_3p_HT_nextgem_Chromium_X_S3_L004_R2_001.fastq.gz
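All four PBMC runs pass the same eight FASTQ files as R1/R2 pairs, lane by lane, and the pairing is positional. A small sketch, using a hypothetical helper `pbmc_fastqs`, generates that list in the required order instead of typing it out four times:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print R1/R2 paths lane by lane (L001 R1, L001 R2, L002 R1, ...)
pbmc_fastqs() {
  local dir="20k_PBMC_3p_HT_nextgem_Chromium_X_fastqs"
  local stem="20k_PBMC_3p_HT_nextgem_Chromium_X_S3"
  local lane read
  for lane in L001 L002 L003 L004; do
    for read in R1 R2; do
      printf '%s\n' "$dir/${stem}_${lane}_${read}_001.fastq.gz"
    done
  done
}

# Collect into an array so each path stays a single argument (bash 4+ for mapfile)
mapfile -t fastqs < <(pbmc_fastqs)
printf '%s\n' "${#fastqs[@]} files, first: ${fastqs[0]}"
```

The array can then replace the explicit file list at the end of any of the four commands, e.g. `... -w barcodes_10x_human_all.txt "${fastqs[@]}"`.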
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR139/065/SRR13948565/SRR13948565_1.fastq.gz
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR139/065/SRR13948565/SRR13948565_2.fastq.gz
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR139/066/SRR13948566/SRR13948566_1.fastq.gz
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR139/066/SRR13948566/SRR13948566_2.fastq.gz
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR139/071/SRR13948571/SRR13948571_1.fastq.gz
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR139/071/SRR13948571/SRR13948571_2.fastq.gz
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR139/068/SRR13948568/SRR13948568_1.fastq.gz
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR139/068/SRR13948568/SRR13948568_2.fastq.gz
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR139/069/SRR13948569/SRR13948569_1.fastq.gz
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR139/069/SRR13948569/SRR13948569_2.fastq.gz
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR139/070/SRR13948570/SRR13948570_1.fastq.gz
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR139/070/SRR13948570/SRR13948570_2.fastq.gz
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR139/067/SRR13948567/SRR13948567_1.fastq.gz
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR139/067/SRR13948567/SRR13948567_2.fastq.gz
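The ENA FTP paths above follow a fixed layout: `vol1/fastq/<first 6 characters>/<subdirectory>/<run>/<run>_<mate>.fastq.gz`. For runs with 8 digits after the prefix, such as SRR13948565, the subdirectory is `0` followed by the last two digits (`065`). The sketch below uses a hypothetical helper, `ena_fastq_url`, to rebuild the 14 URLs. It encodes the 6- to 9-digit cases of ENA's directory scheme as I understand it, so check generated URLs against ENA before relying on them for other accessions.

```shell
#!/usr/bin/env bash
# Hypothetical helper implementing ENA's directory scheme for run accessions:
#   6 digits -> no subdirectory, 7 -> 00<last 1>, 8 -> 0<last 2>, 9 -> <last 3>
ena_fastq_url() {
  local run="$1" mate="$2"
  local digits="${run:3}"           # numeric part after SRR/ERR/DRR
  local sub
  case "${#digits}" in
    6) sub="" ;;
    7) sub="00${digits: -1}/" ;;
    8) sub="0${digits: -2}/" ;;
    9) sub="${digits: -3}/" ;;
    *) echo "unsupported accession: $run" >&2; return 1 ;;
  esac
  printf 'ftp://ftp.sra.ebi.ac.uk/vol1/fastq/%s/%s%s/%s_%s.fastq.gz\n' \
    "${run:0:6}" "$sub" "$run" "$run" "$mate"
}

for n in 65 66 67 68 69 70 71; do
  for mate in 1 2; do
    ena_fastq_url "SRR139485$n" "$mate"
  done
done > ena_urls.txt
# The list can then be fetched with: wget --continue -i ena_urls.txt
```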
wget https://ftp.ncbi.nlm.nih.gov/geo/samples/GSM5169nnn/GSM5169184/suppl/GSM5169184%5FC2C12%5Fshort%5F1k%5Fcell%5Fmetadata.csv.gz
wget https://ftp.ncbi.nlm.nih.gov/geo/samples/GSM5169nnn/GSM5169185/suppl/GSM5169185%5FC2C12%5Fshort%5F9kA%5Fcell%5Fmetadata.csv.gz
wget https://ftp.ncbi.nlm.nih.gov/geo/samples/GSM5169nnn/GSM5169186/suppl/GSM5169186%5FC2C12%5Fshort%5F9kB%5Fcell%5Fmetadata.csv.gz
wget https://ftp.ncbi.nlm.nih.gov/geo/samples/GSM5169nnn/GSM5169187/suppl/GSM5169187%5FC2C12%5Fshort%5F9kC%5Fcell%5Fmetadata.csv.gz
wget https://ftp.ncbi.nlm.nih.gov/geo/samples/GSM5169nnn/GSM5169188/suppl/GSM5169188%5FC2C12%5Fshort%5F9kD%5Fcell%5Fmetadata.csv.gz
wget https://ftp.ncbi.nlm.nih.gov/geo/samples/GSM5169nnn/GSM5169189/suppl/GSM5169189%5FC2C12%5Fshort%5F9kE%5Fcell%5Fmetadata.csv.gz
wget https://ftp.ncbi.nlm.nih.gov/geo/samples/GSM5169nnn/GSM5169190/suppl/GSM5169190%5FC2C12%5Fshort%5F9kF%5Fcell%5Fmetadata.csv.gz
wget https://raw.githubusercontent.com/pachterlab/splitcode-tutorial/main/uploads/splitseq/r2_r3.txt
wget https://raw.githubusercontent.com/fairliereese/LR-splitpipe/859279ed3fec859248fb4fdaee17280e6103b9f9/barcodes/bc_data_v2.csv
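The seven GEO metadata URLs share one pattern: `geo/samples/<GSM with its last three digits replaced by nnn>/<GSM>/suppl/<file>`, with `_` percent-encoded as `%5F`. The sketch below uses a hypothetical helper, `geo_suppl_url`, to regenerate them from the sample number and the short sample label.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a GEO sample supplementary-file URL
geo_suppl_url() {
  local gsm="$1" file="$2"
  local bucket="${gsm:0:${#gsm}-3}nnn"   # GSM5169184 -> GSM5169nnn
  printf 'https://ftp.ncbi.nlm.nih.gov/geo/samples/%s/%s/suppl/%s\n' \
    "$bucket" "$gsm" "${file//_/%5F}"    # percent-encode underscores
}

# GSM5169184..GSM5169190 map to these short labels, in order
labels=(1k 9kA 9kB 9kC 9kD 9kE 9kF)
for i in "${!labels[@]}"; do
  gsm="GSM$((5169184 + i))"
  geo_suppl_url "$gsm" "${gsm}_C2C12_short_${labels[$i]}_cell_metadata.csv.gz"
done > geo_urls.txt
```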
cat bc_data_v2.csv | grep "A1\|A2\|A3\|A4\|A5\|A6\|A7\|A8\|A9\|A10\|A11\|A12" | grep R$ | cut -d, -f2 > r1_R_Awells.txt
cat bc_data_v2.csv | grep "A1\|A2\|A3\|A4\|A5\|A6\|A7\|A8\|A9\|A10\|A11\|A12" | grep T$ | cut -d, -f2 > r1_T_Awells.txt
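These two lines keep the round-1 barcode sequences (column 2) of the A-row wells, split by whether the line ends in `R` (random hexamer) or `T` (oligo-dT). Because `grep "A1\|..."` is a substring match, `A1` also matches `A10` to `A12` (harmless here, since those are wanted anyway). The sketch below runs the same pipeline on a toy table and compares it with a stricter, hypothetical `awk` variant that matches the well name exactly. The column layout `bci,sequence,uid,well,stype` is an assumption about `bc_data_v2.csv`, so check the real header before using the `awk` form.

```shell
#!/usr/bin/env bash
# Toy barcode table; the column layout (bci,sequence,uid,well,stype) is an assumption
cat > toy_bc_data.csv <<'EOF'
bci,sequence,uid,well,stype
1,AACGTGAT,pbs_1000,A1,T
2,AAACATCG,pbs_1001,A2,T
13,CATCCACA,pbs_1012,B1,T
49,CATTCCTA,pbs_1048,A1,R
50,CTTCATCA,pbs_1049,A2,R
61,GACTAGTA,pbs_1060,B1,R
EOF

wells="A1\|A2\|A3\|A4\|A5\|A6\|A7\|A8\|A9\|A10\|A11\|A12"
# Same pipeline as above (GNU grep for the \| alternation)
grep "$wells" toy_bc_data.csv | grep 'R$' | cut -d, -f2 > toy_R.txt
grep "$wells" toy_bc_data.csv | grep 'T$' | cut -d, -f2 > toy_T.txt

# Hypothetical stricter variant: exact well match on column 4, type on column 5
awk -F, '$4 ~ /^A([1-9]|1[0-2])$/ && $5 == "T" { print $2 }' toy_bc_data.csv > toy_T_awk.txt
```

Both forms preserve the file's row order, which matters later when the R and T lists are paired line by line.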
rm -f splitseq_batch.txt
./prep_splitseq.sh SRR13948565 GSM5169184_C2C12_short_1k
./prep_splitseq.sh SRR13948566 GSM5169185_C2C12_short_9kA
./prep_splitseq.sh SRR13948567 GSM5169186_C2C12_short_9kB
./prep_splitseq.sh SRR13948568 GSM5169187_C2C12_short_9kC
./prep_splitseq.sh SRR13948569 GSM5169188_C2C12_short_9kD
./prep_splitseq.sh SRR13948570 GSM5169189_C2C12_short_9kE
./prep_splitseq.sh SRR13948571 GSM5169190_C2C12_short_9kF
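Each `prep_splitseq.sh` call pairs one SRA run with its GEO sample prefix, and the two numbering schemes advance together (SRR13948565 with GSM5169184, through SRR13948571 with GSM5169190). A sketch that keeps the pairs in two parallel arrays and prints the calls for review. The function name `prep_calls` is hypothetical, and nothing is executed unless the output is piped to a shell.

```shell
#!/usr/bin/env bash
# Run accessions and matching GEO sample prefixes, in the same order as above
runs=(SRR13948565 SRR13948566 SRR13948567 SRR13948568 SRR13948569 SRR13948570 SRR13948571)
samples=(GSM5169184_C2C12_short_1k GSM5169185_C2C12_short_9kA GSM5169186_C2C12_short_9kB
         GSM5169187_C2C12_short_9kC GSM5169188_C2C12_short_9kD GSM5169189_C2C12_short_9kE
         GSM5169190_C2C12_short_9kF)

# Fail early if the two lists drift out of sync
if [ "${#runs[@]}" -ne "${#samples[@]}" ]; then
  echo "run/sample lists differ in length" >&2
  exit 1
fi

# Hypothetical helper: print one prep_splitseq.sh call per run/sample pair
prep_calls() {
  local i
  for i in "${!runs[@]}"; do
    printf './prep_splitseq.sh %s %s\n' "${runs[$i]}" "${samples[$i]}"
  done
}

prep_calls   # review; to execute: rm -f splitseq_batch.txt && prep_calls | sh
```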
Requires bowtie2 (v2.5.3), seqkit (v2.8.0), and samtools (v1.19.2).
bowtie2-build "mm10_ncRNA.fa" "exclusion_index"
cat splitseq_batch.txt | cut -d' ' -f2 > splitseq_batch.r1.txt
cat splitseq_batch.txt | cut -d' ' -f3 > splitseq_batch.r2.txt
xargs -I {} sh -c 'bowtie2 -q -p 20 \
  --no-unal \
  --quiet \
  --local \
  -x "exclusion_index" \
  -U "{}" | samtools view -S | cut -f1 > "{}.filter.txt"' < splitseq_batch.r1.txt
cat *.filter.txt > final.filter.txt
xargs -I {} sh -c 'seqkit \
  grep -j 20 -v -n \
  -f "final.filter.txt" "{}" \
  -o "{}.filtered.fastq.gz"' < splitseq_batch.r1.txt
xargs -I {} sh -c 'seqkit \
  grep -j 20 -v -n \
  -f "final.filter.txt" "{}" \
  -o "{}.filtered.fastq.gz"' < splitseq_batch.r2.txt
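The two steps above collect the names of R1 reads that align to the ncRNA exclusion index, then remove those reads from both R1 and R2 so the mates stay in sync. The sketch below shows the same idea in plain `awk` on toy, uncompressed FASTQ, as a stand-in for `seqkit grep -v` (the helper `drop_ids` is hypothetical). It matches on the read ID, the header text up to the first whitespace, which is what the SAM QNAME column holds. seqkit's `-n` matches the full header instead, so the two can differ if the FASTQ headers contain spaces.

```shell
#!/usr/bin/env bash
# Toy paired FASTQ (4 lines per record) and a list of read IDs to discard
printf '@r1 extra\nACGT\n+\nIIII\n@r2\nGGCC\n+\nIIII\n@r3\nTTAA\n+\nIIII\n' > toy_R1.fastq
printf '@r1 extra\nTTTT\n+\nIIII\n@r2\nAAAA\n+\nIIII\n@r3\nCCCC\n+\nIIII\n' > toy_R2.fastq
printf 'r2\n' > toy_filter.txt

# Hypothetical helper: drop every record whose ID (header up to the first
# whitespace, without "@") appears in the filter list
drop_ids() {
  awk 'FILENAME == ARGV[1] { bad[$1] = 1; next }
       FNR % 4 == 1 { id = substr($1, 2); keep = !(id in bad) }
       keep' "$1" "$2"
}

# The same ID list is applied to both mates
drop_ids toy_filter.txt toy_R1.fastq > toy_R1.filtered.fastq
drop_ids toy_filter.txt toy_R2.fastq > toy_R2.filtered.fastq
```

On the real gzipped data the seqkit commands above remain the practical choice; the `awk` form only makes the filtering logic explicit.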
cat splitseq_batch.txt | sed 's/\.fastq\.gz/.fastq.gz.filtered.fastq.gz/g' > splitseq_batch_final.txt
rm -rf splitseq_out
kb count --strand=forward -w None --overwrite --keep-tmp --verbose \
  --workflow=nac -i STARsoloManuscript/genomes/index/kallisto_0.50.1/mouse/nac_offlist_1/index.idx \
  -g STARsoloManuscript/genomes/index/kallisto_0.50.1/mouse/nac_offlist_1/g \
  -c1 STARsoloManuscript/genomes/index/kallisto_0.50.1/mouse/nac_offlist_1/c1 \
  -c2 STARsoloManuscript/genomes/index/kallisto_0.50.1/mouse/nac_offlist_1/c2 \
  -x 1,10,18,1,48,56,1,78,86:1,0,10:0,0,0 \
  --sum=total -o splitseq_out --batch-barcodes splitseq_batch_final.txt
STARsoloManuscript/exe/bustools_0.43.2 count -o splitseq_out/counts_unfiltered/cells_x_tcc \
  -g STARsoloManuscript/genomes/index/kallisto_0.50.1/mouse/nac_offlist_1/g \
  -e splitseq_out/matrix.ec -t splitseq_out/transcripts.txt \
  --multimapping --umi-gene splitseq_out/tmp/output.s.bus
STARsoloManuscript/exe/kallisto_0.50.1 quant-tcc -b 10 -o splitseq_out/quant_unfiltered/ -t 24 \
  --matrix-to-files --plaintext \
  -i STARsoloManuscript/genomes/index/kallisto_0.50.1/mouse/nac_offlist_1/index.idx \
  -g STARsoloManuscript/genomes/index/kallisto_0.50.1/mouse/nac_offlist_1/g \
  -e splitseq_out/counts_unfiltered/cells_x_tcc.ec.txt \
  splitseq_out/counts_unfiltered/cells_x_tcc.mtx
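The custom technology string passed with `-x` follows kallisto's `barcode:UMI:cDNA` syntax. Each part is a list of `file,start,stop` triplets, with 0-based file indices and positions, an exclusive `stop`, and `stop` = 0 meaning "to the end of the read". Read this way, `1,10,18,1,48,56,1,78,86:1,0,10:0,0,0` takes three 8-bp barcodes from the second FASTQ (R2), a 10-bp UMI from the start of R2, and all of R1 as cDNA. The parser sketch below prints that breakdown. `describe_tech` is a hypothetical name, and the sketch does not handle the `-1` "no UMI" form.

```shell
#!/usr/bin/env bash
# Hypothetical helper: describe each segment of a kallisto "bc:umi:cDNA" technology string
describe_tech() {
  local tech="$1" i j file start stop
  local -a parts fields
  local -a names=(barcode umi cdna)
  IFS=: read -r -a parts <<< "$tech"
  for i in 0 1 2; do
    IFS=, read -r -a fields <<< "${parts[$i]}"
    # Each segment is a file,start,stop triplet
    for ((j = 0; j < ${#fields[@]}; j += 3)); do
      file="${fields[j]}" start="${fields[j+1]}" stop="${fields[j+2]}"
      if [ "$stop" -eq 0 ]; then
        echo "${names[$i]}: file $file, from $start to end of read"
      else
        echo "${names[$i]}: file $file, [$start,$stop) = $((stop - start)) bp"
      fi
    done
  done
}

describe_tech "1,10,18,1,48,56,1,78,86:1,0,10:0,0,0"
```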
Now, look in the splitseq_analysis.ipynb notebook for further analysis.
rm -rf splitseq_out_supplement
cat splitseq_batch_final.txt | cut -c3- > splitseq_batch_final.modified.txt
awk 'NR==FNR{a[NR]=$0; next} {print a[FNR] " *" $0}' r1_R_Awells.txt r1_T_Awells.txt > replace.txt
kb count --strand=forward -w None --overwrite --keep-tmp --verbose \
  --workflow=nac -i STARsoloManuscript/genomes/index/kallisto_0.50.1/mouse/nac_offlist_1/index.idx \
  -g STARsoloManuscript/genomes/index/kallisto_0.50.1/mouse/nac_offlist_1/g \
  -c1 STARsoloManuscript/genomes/index/kallisto_0.50.1/mouse/nac_offlist_1/c1 \
  -c2 STARsoloManuscript/genomes/index/kallisto_0.50.1/mouse/nac_offlist_1/c2 \
  -x 1,10,18,1,48,56,1,78,86:1,0,10:0,0,0 \
  --sum=total -o splitseq_out_supplement -r replace.txt --batch-barcodes splitseq_batch_final.modified.txt
STARsoloManuscript/exe/bustools_0.43.2 count -o splitseq_out_supplement/counts_unfiltered_modified/cells_x_tcc \
  -g STARsoloManuscript/genomes/index/kallisto_0.50.1/mouse/nac_offlist_1/g \
  -e splitseq_out_supplement/matrix.ec -t splitseq_out_supplement/transcripts.txt \
  --multimapping --umi-gene splitseq_out_supplement/output_modified.unfiltered.bus
STARsoloManuscript/exe/kallisto_0.50.1 quant-tcc -o splitseq_out_supplement/quant_unfiltered/ -t 24 \
  --matrix-to-files --plaintext \
  -i STARsoloManuscript/genomes/index/kallisto_0.50.1/mouse/nac_offlist_1/index.idx \
  -g STARsoloManuscript/genomes/index/kallisto_0.50.1/mouse/nac_offlist_1/g \
  -e splitseq_out_supplement/counts_unfiltered_modified/cells_x_tcc.ec.txt \
  splitseq_out_supplement/counts_unfiltered_modified/cells_x_tcc.mtx
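The `awk` one-liner that builds `replace.txt` pairs line *i* of `r1_R_Awells.txt` with line *i* of `r1_T_Awells.txt`, writing `<R barcode> *<T barcode>` on each line. It therefore assumes both files list the A-row wells in the same order. If one file is shorter, the pairing misaligns silently (an empty left-hand side is written). The sketch below runs the one-liner on toy sequences behind a hypothetical length check.

```shell
#!/usr/bin/env bash
# Toy R (random hexamer) and T (oligo-dT) barcode lists, one per well, same well order
printf 'CATTCCTA\nCTTCATCA\n' > toy_r1_R.txt
printf 'AACGTGAT\nAAACATCG\n' > toy_r1_T.txt

# Hypothetical guard: refuse to pair lists of different lengths
if [ "$(wc -l < toy_r1_R.txt)" -ne "$(wc -l < toy_r1_T.txt)" ]; then
  echo "R and T barcode lists differ in length" >&2
  exit 1
fi

# Same one-liner as above: line i of the first file, " *", line i of the second
awk 'NR==FNR{a[NR]=$0; next} {print a[FNR] " *" $0}' toy_r1_R.txt toy_r1_T.txt > toy_replace.txt
```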
See supp_quant.ipynb for plotting.
zcat splitseq_R_SRR13948565_R1.fastq.gz.filtered.fastq.gz \
  splitseq_R_SRR13948566_R1.fastq.gz.filtered.fastq.gz \
  splitseq_R_SRR13948567_R1.fastq.gz.filtered.fastq.gz \
  splitseq_R_SRR13948568_R1.fastq.gz.filtered.fastq.gz \
  splitseq_R_SRR13948569_R1.fastq.gz.filtered.fastq.gz \
  splitseq_R_SRR13948570_R1.fastq.gz.filtered.fastq.gz \
  splitseq_R_SRR13948571_R1.fastq.gz.filtered.fastq.gz | gzip > splitseq_R_merged.fastq.gz
zcat splitseq_T_SRR13948565_R1.fastq.gz.filtered.fastq.gz \
  splitseq_T_SRR13948566_R1.fastq.gz.filtered.fastq.gz \
  splitseq_T_SRR13948567_R1.fastq.gz.filtered.fastq.gz \
  splitseq_T_SRR13948568_R1.fastq.gz.filtered.fastq.gz \
  splitseq_T_SRR13948569_R1.fastq.gz.filtered.fastq.gz \
  splitseq_T_SRR13948570_R1.fastq.gz.filtered.fastq.gz \
  splitseq_T_SRR13948571_R1.fastq.gz.filtered.fastq.gz | gzip > splitseq_T_merged.fastq.gz
mkdir -p splitseq_c2c12_R
STARsoloManuscript/exe/STAR_2.7.9a \
  --genomeDir "STARsoloManuscript/genomes/index/STAR_2.7.9a/mouse/fullSA" \
  --runThreadN 16 \
  --readFilesCommand zcat \
  --outSAMtype BAM SortedByCoordinate \
  --outFilterType BySJout \
  --outFileNamePrefix "splitseq_c2c12_R/" \
  --readFilesIn splitseq_R_merged.fastq.gz
mkdir -p splitseq_c2c12_T
STARsoloManuscript/exe/STAR_2.7.9a \
  --genomeDir "STARsoloManuscript/genomes/index/STAR_2.7.9a/mouse/fullSA" \
  --runThreadN 16 \
  --readFilesCommand zcat \
  --outSAMtype BAM SortedByCoordinate \
  --outFilterType BySJout \
  --outFileNamePrefix "splitseq_c2c12_T/" \
  --readFilesIn splitseq_T_merged.fastq.gz
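The merge step above decompresses and then recompresses every file. gzip files can be concatenated, and `gzip -d`/`zcat` read all members of a multi-member file. So a plain `cat` of the `.gz` files should give STAR's `--readFilesCommand zcat` the same reads without the recompression cost. The self-check below demonstrates that property on two toy files with hypothetical names. It uses `gzip -dc`, which behaves like `zcat` on Linux.

```shell
#!/usr/bin/env bash
# Two toy gzipped FASTQ chunks
printf '@a\nACGT\n+\nIIII\n' | gzip > toy_part1.fastq.gz
printf '@b\nTTGG\n+\nIIII\n' | gzip > toy_part2.fastq.gz

# Approach used above: decompress, then recompress into one file
gzip -dc toy_part1.fastq.gz toy_part2.fastq.gz | gzip > toy_merged_recompressed.fastq.gz

# Alternative: concatenate the compressed members directly (no recompression)
cat toy_part1.fastq.gz toy_part2.fastq.gz > toy_merged_cat.fastq.gz

# Both decompress to the same reads
if cmp -s <(gzip -dc toy_merged_recompressed.fastq.gz) <(gzip -dc toy_merged_cat.fastq.gz); then
  echo "identical after decompression"
fi
```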
The resulting BAM files can then be indexed with samtools and viewed in IGV.
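With `--outSAMtype BAM SortedByCoordinate`, STAR writes `Aligned.sortedByCoord.out.bam` under each `--outFileNamePrefix`. The two BAMs to index are therefore `splitseq_c2c12_R/Aligned.sortedByCoord.out.bam` and `splitseq_c2c12_T/Aligned.sortedByCoord.out.bam`. The sketch below indexes whichever of them exist. `samtools index` writes a `.bai` next to each BAM, which IGV looks for when the BAM is loaded. The function name `index_star_bams` and the `SAMTOOLS` override variable are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: index the coordinate-sorted BAM inside each STAR output prefix
index_star_bams() {
  local samtools="${SAMTOOLS:-samtools}"   # override to point at a specific samtools build
  local prefix bam
  for prefix in "$@"; do
    bam="${prefix%/}/Aligned.sortedByCoord.out.bam"
    if [ ! -f "$bam" ]; then
      echo "skipping, not found: $bam" >&2
      continue
    fi
    "$samtools" index "$bam" && echo "$bam.bai"
  done
}

index_star_bams splitseq_c2c12_R splitseq_c2c12_T
```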
Go to the kallisto_paper_analysis directory and follow the instructions there.
Go to the clustering_analysis directory for the pipeline that generates comparisons between count matrices (PCA, UMAP, marker genes, alluvial plots, etc.).