PaintSHOP pipeline output specification.
Pipeline outputs are organized into three directories:
Folder | Description |
---|---|
01_reference_files/ | files that may be of use in other pipelines or analyses |
02_intermediate_files/ | large intermediate files, useful when debugging, but otherwise disposable |
03_output_files/ | all DNA/RNA probe sets as both .tsv files and zip archives |
All unwanted files can be safely deleted once the pipeline is run.
NOTE: To minimize disk usage, it may be desirable to keep only the three zip archives found in 03_output_files/04_zip_archives/, as the intermediate files are very large (e.g. the pipeline will generate over 200 GB of intermediate files for hg38).
Item | Location |
---|---|
DNA FISH probes in .tsv format | 03_output_files/01_dna_probes/ |
Isoform-resolved RNA FISH probes in .tsv format | 03_output_files/02_rna_probes_all/ |
Isoform-flattened RNA FISH probes in .tsv format | 03_output_files/03_rna_probes_iso/ |
All DNA/RNA FISH probe sets as compressed .zip archives | 03_output_files/04_zip_archives/ |
The columns in DNA probe files are:
# | Column | Description |
---|---|---|
0 | chromosome | the seqid that the probe targets |
1 | start | genomic start coordinate |
2 | stop | genomic stop coordinate |
3 | sequence | DNA sequence of the probe |
4 | Tm | Melting temperature of the probe DNA sequence |
5 | on-target score (0-100) | hybridization prediction at the target site of the probe |
6 | off-target score (0-10,000) | sum of the hybridization predictions for all off-target sites (up to 100) for the probe |
7 | repeat (0 or 1) | 0 if the sequence contains no repeat-masked bases, and 1 if it does |
10 | max k-mer | max count out of all k-mers in the probe sequence |
11 | strand | probe strand, + or - |
The columns in isoform-resolved RNA probe sets are:
# | Column | Description |
---|---|---|
0 | chromosome | the seqid that the probe targets |
1 | start | genomic start coordinate |
2 | stop | genomic stop coordinate |
3 | sequence | DNA sequence of the probe |
4 | Tm | Melting temperature of the probe DNA sequence |
5 | on-target score (0-100) | hybridization prediction at the target site of the probe |
6 | off-target score (0-10,000) | sum of the hybridization predictions for all off-target sites (up to 100) for the probe |
7 | repeat (0 or 1) | 0 if the sequence contains no repeat-masked bases, and 1 if it does |
8 | max k-mer | max count out of all k-mers in the probe sequence |
9 | strand | probe strand, + or - |
10 | refseq | The transcript ID that the probe targets, stripped of any version suffixes e.g. NM_001180043 |
11 | transcript_id | The unmodified transcript ID of the transcript that the probe targets e.g. NM_001180043.1 |
12 | gene_id | The gene ID of the gene whose transcript the probe targets e.g. PAU8 |
The columns in isoform-flattened RNA probe sets are:
# | Column | Description |
---|---|---|
0 | chromosome | the seqid that the probe targets |
1 | start | genomic start coordinate |
2 | stop | genomic stop coordinate |
3 | sequence | DNA sequence of the probe |
4 | Tm | Melting temperature of the probe DNA sequence |
5 | on-target score (0-100) | hybridization prediction at the target site of the probe |
6 | off-target score (0-10,000) | sum of the hybridization predictions for all off-target sites (up to 100) for the probe |
7 | repeat (0 or 1) | 0 if the sequence contains no repeat-masked bases, and 1 if it does |
8 | max k-mer | max count out of all k-mers in the probe sequence |
9 | strand | probe strand, + or - |
10 | gene_id | The gene ID of the gene whose transcript the probe targets e.g. PAU8 |
11 | transcripts | The number of isoforms that this probe targets |
An HTML report with diagnostics and detailed pipeline information can by generated with the following command:
$ snakemake --snakefile path/to/Snakefile --configfile path/to/config.yml --report pipeline_output/report.html
An example report is available here. For a visualization of the pipeline DAG structure, see: pipeline.pdf or pipeline.svg