MorganResearchLab/scarecrow

scarecrow

A toolkit for preprocessing single-cell sequencing data.

Documentation

Todo

  • Run ruff to lint and format the source files
  • Add error handling for missing or incorrect parameters and for unexpected file content
  • Investigate peaks that fall between barcodes
  • The plot generated by harvest does not yet handle more than one barcode peak per whitelist (the CSV output is unaffected)
  • Benchmark different assays (SPLiT-seq, Parse, 10x) and methods (split-pipe, scarecrow, UMI-tools)
    • barcode recovery
    • alignment (STAR and kallisto)
  • Test alignment with kallisto and STAR
    • the sequence header formatting may need to change depending on what is retained in the BAM file

Testing on laptop

# Paired FASTQ inputs
R1=100K_1.fastq
R2=100K_2.fastq
# Barcode specs, one per barcode, each with three colon-separated fields
BARCODES=(BC1:R1_v3:/Users/s14dw4/Documents/scarecrow_test/barcodes/bc_data_n123_R1_v3_5.barcodes
          BC2:v1:/Users/s14dw4/Documents/scarecrow_test/barcodes/bc_data_v1.barcodes
          BC3:R3_v3:/Users/s14dw4/Documents/scarecrow_test/barcodes/bc_data_R3_v3.barcodes)
# Run seed once per barcode spec; ${BARCODE%:*:*} keeps only the first field (BC1, BC2, BC3)
for BARCODE in "${BARCODES[@]}"
do
    scarecrow seed --fastqs "${R1}" "${R2}" --strands pos neg \
      -o "./results/barcodes_${BARCODE%:*:*}.csv" --barcodes "${BARCODE}"
done

# Run harvest on the CSV files written by seed
FILES=(./results/barcodes_BC*csv)
scarecrow harvest "${FILES[@]}" --barcode_count 3 --min_distance 11 \
    --conserved ./results/barcodes_BC1_conserved.tsv --out barcode_positions.csv

# Run reap (timed) with the barcode positions written by harvest
time scarecrow reap --fastqs "${R1}" "${R2}" -p ./barcode_positions.csv --barcode_reverse_order \
    -j 2 -m 2 -q 30 --barcodes "${BARCODES[@]}" --extract 1:1-64 --umi 2:1-10 --out ./cDNA.fq --threads 4

# Run tally on the reap output
scarecrow tally -f ./cDNA.fq -m 2
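
The seed loop above names each output file with the expansion ${BARCODE%:*:*}, which strips everything from the first colon onward. The sketch below shows how the same kind of spec can be split into its three fields using only shell parameter expansion. The spec value and the variable names bc_label, bc_name and bc_file are hypothetical and chosen for illustration; what the second and third fields mean to scarecrow is an assumption based on the examples above, not something this README states.

```shell
# Hypothetical spec in the same three-field, colon-separated form used above
spec="BC1:R1_v3:/path/to/bc_data.barcodes"

# First field: remove the longest suffix that starts with a colon
bc_label="${spec%%:*}"

# Remaining two fields: remove the shortest prefix that ends with a colon
rest="${spec#*:}"

# Second field (assumed to be a whitelist name)
bc_name="${rest%%:*}"

# Third field (assumed to be the path to the barcode file)
bc_file="${rest#*:}"

echo "${bc_label} ${bc_name} ${bc_file}"
```

For a spec with exactly two colons, ${spec%:*:*} and ${spec%%:*} give the same result. The second form also works if the path itself contains a colon.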
