Fix config (#119)
* remove split fastq from config and all rules

* clean up config and fix spelling of indices

* remove test config files

* update default heatmap options

* return to config but with small changes

* refactor config.py

* fix typo in config.py

* use indices consistently for genome indices

* update config process in docs
CChahrour authored Jan 26, 2024
1 parent abee85a commit 1a5f082
Showing 17 changed files with 157 additions and 753 deletions.
98 changes: 34 additions & 64 deletions docs/pipeline.md
@@ -9,82 +9,45 @@ The pipeline is configured using a YAML file: e.g. `config_atac.yml`, `config_ch
The following command will generate the working directory and configuration file for the ATAC-seq pipeline:

```bash
seqnado-config atac
seqnado-config chip
```

You should get something like this:

```bash
$ seqnado-config chip
[1/23] user_name (Your name): asmith
[2/23] Select date
1 - 2024-01-13
Choose from [1] (1):
[3/23] project_name (Project name): TEST
[4/23] Select project_id
1 - test
Choose from [1] (1): 1
[5/23] genome (hg38):
[6/23] chromosome_sizes (/ceph/project/milne_group/shared/seqnado_reference/hg38/UCSC/sequence/hg38.chrom.sizes):
[7/23] indicies (/ceph/project/milne_group/shared/seqnado_reference/hg38/UCSC/bt2_index/hg38):
[8/23] gtf (/ceph/project/milne_group/shared/seqnado_reference/hg38/UCSC/genes/hg38.ncbiRefSeq.gtf):
[9/23] Select read_type
1 - paired
2 - single
Choose from [1/2] (1): 1
[10/23] Select split_fastq
1 - True
2 - False
Choose from [1/2] (1): 2
[11/23] split_fastq_parts (int):
[12/23] Select remove_pcr_duplicates_method
1 - picard
2 - deeptools
Choose from [1/2] (1): 1
[13/23] Select remove_blacklist
1 - yes
2 - no
Choose from [1/2] (1): 1
[14/23] blacklist (/ceph/project/milne_group/shared/seqnado_reference/hg38/hg38-blacklist.v2.bed.gz):
[15/23] Select make_bigwigs
1 - yes
2 - no
Choose from [1/2] (1): 1
[16/23] Select pileup_method
1 - deeptools
2 - homer
Choose from [1/2] (1): 1
[17/23] Select make_heatmaps
1 - yes
2 - no
Choose from [1/2] (1): 1
[18/23] Select call_peaks
1 - yes
2 - no
Choose from [1/2] (1): 1
[19/23] Select peak_calling_method
1 - macs
2 - lanceotron
3 - homer
Choose from [1/2/3] (1): 2
[20/23] Select make_ucsc_hub
1 - yes
2 - no
Choose from [1/2] (1): 1
[21/23] UCSC_hub_directory (path/to/ publically accessible location on the server): /project/milne_group/datashare/asmith/chipseq/TEST_HUB
[22/23] email (Email address (UCSC required)): [email protected]
[23/23] Select color_by
1 - samplename
2 - method
Choose from [1/2] (1): 1
What is your project name? [cchahrou_project]: TEST
What is your genome name? [other]: hg38
Path to Bowtie2 genome indices: [None]: /ceph/project/milne_group/shared/seqnado_reference/hg38/UCSC/bt2_index/hg38
Path to chromosome sizes file: [None]: /ceph/project/milne_group/shared/seqnado_reference/hg38/UCSC/sequence/hg38.chrom.sizes
Path to GTF file: [None]: /ceph/project/milne_group/shared/seqnado_reference/hg38/UCSC/genes/hg38.ncbiRefSeq.gtf
Path to blacklist bed file: [None]: /ceph/project/milne_group/shared/seqnado_reference/hg38/hg38-blacklist.v2.bed.gz
Do you want to remove blacklist regions? (yes/no) [yes]: yes
Remove PCR duplicates? (yes/no) [yes]: yes
Remove PCR duplicates method: [picard]: picard
Do you have spikein? (yes/no) [no]: yes
Normalisation method: [orlando/with_input]: orlando
Reference genome: [hg38]: hg38
Spikein genome: [dm6]: dm6
Path to fastqscreen config: [/ceph/project/milne_group/shared/seqnado_reference/fastqscreen_reference/fastq_screen.conf]: /ceph/project/milne_group/shared/seqnado_reference/fastqscreen_reference/fastq_screen.conf
Do you want to make bigwigs? (yes/no) [no]: yes
Pileup method: [deeptools/homer]: deeptools
Do you want to make heatmaps? (yes/no) [no]: yes
Do you want to call peaks? (yes/no) [no]: yes
Peak caller: [lanceotron/macs/homer]: lanceotron
Do you want to make a UCSC hub? (yes/no) [no]: yes
UCSC hub directory: [/path/to/ucsc_hub/]: /project/milne_group/datashare/etc
What is your email address? [[email protected]]: email for UCSC
Color by (for UCSC hub): [samplename]: samplename
Directory '2024-01-26_chip_TEST' has been created with the 'config_chip.yml' file.
```

This will generate the following files:

```bash
$ tree 2024-01-13_test/
$ tree 2024-01-13_chip_test/

2024-01-13_test/
2024-01-13_chip_test/
├── config_chip.yml
└── readme_test.md

@@ -230,6 +193,13 @@ $ ls -l

```bash
tmux new -s NAME_OF_SESSION

# or

screen -S NAME_OF_SESSION

# to detach from the screen session (it keeps running in the background)
# press ctrl+a, then d
```


112 changes: 33 additions & 79 deletions seqnado/config.py
@@ -18,7 +18,6 @@ def get_user_input(prompt, default=None, is_boolean=False, choices=None):
return user_input



def setup_configuration(assay, genome, template_data):
username = os.getenv('USER', 'unknown_user')
today = datetime.datetime.now().strftime('%Y-%m-%d')
@@ -40,86 +39,55 @@ def setup_configuration(assay, genome, template_data):

if genome == "other":
genome = get_user_input("What is your genome name?", default="other")
if assay in ["chip", "atac"]:
genome_dict = {
genome: {
"index": get_user_input("Path to Bowtie2 genome index:"),
"chromosome_sizes": get_user_input("Path to chromosome sizes file:"),
"gtf": get_user_input("Path to GTF file:"),
"blacklist": get_user_input("Path to blacklist bed file:")
}
genome_dict = {
genome: {
"indices": get_user_input("Path to Bowtie2 genome indices:") if assay in ["chip", "atac"] else get_user_input("Path to STAR v2.7.10b genome indices:"),
"chromosome_sizes": get_user_input("Path to chromosome sizes file:"),
"gtf": get_user_input("Path to GTF file:"),
"blacklist": get_user_input("Path to blacklist bed file:")
}
elif assay == "rna":
genome_dict = {
genome: {
"index": get_user_input("Path to STAR v2.7.10b genome index:"),
"chromosome_sizes": get_user_input("Path to chromosome sizes file:"),
"gtf": get_user_input("Path to GTF file:"),
"blacklist": get_user_input("Path to blacklist bed file:")
}
}
else:
if genome in genome_values:
genome_dict[genome] = {
"indices": genome_values[genome].get('bt2_indices' if assay in ["chip", "atac"] else 'star_indices', ''),
"chromosome_sizes": genome_values[genome].get('chromosome_sizes', ''),
"gtf": genome_values[genome].get('gtf', ''),
"blacklist": genome_values[genome].get('blacklist', '')
}

elif genome in genome_values:
if assay in ["chip", "atac"]:
genome_dict = {
genome: {
"index": genome_values[genome]['bt2_index'],
"chromosome_sizes": genome_values[genome]['chromosome_sizes'],
"gtf": genome_values[genome]['gtf'],
"blacklist": genome_values[genome]['blacklist']
}
}
elif assay == "rna":
genome_dict = {
genome: {
"index": genome_values[genome]['star_index'],
"chromosome_sizes": genome_values[genome]['chromosome_sizes'],
"gtf": genome_values[genome]['gtf'],
"blacklist": genome_values[genome]['blacklist']
}
}

genome_config = {
'genome': genome,
'indices': genome_dict[genome]['indices'],
'chromosome_sizes': genome_dict[genome]['chromosome_sizes'],
'gtf': genome_dict[genome]['gtf'],
}
template_data.update(genome_config)

template_data['genome'] = genome
template_data['indicies'] = genome_dict[genome]['index']
template_data['chromosome_sizes'] = genome_dict[genome]['chromosome_sizes']
template_data['gtf'] = genome_dict[genome]['gtf']
template_data['read_type'] = get_user_input("What is your read type?", default="paired", choices=["paired", "single"])

template_data['remove_blacklist'] = get_user_input("Do you want to remove blacklist regions? (yes/no)", default="yes", is_boolean=True)
if template_data['remove_blacklist']:
template_data['blacklist'] = genome_dict[genome]['blacklist']

if assay in ["chip", "atac"]:
template_data['remove_pcr_duplicates'] = get_user_input("Remove PCR duplicates? (yes/no)", default="yes", is_boolean=True)
elif assay == "rna":
template_data['remove_pcr_duplicates'] = get_user_input("Remove PCR duplicates? (yes/no)", default="no", is_boolean=True)

template_data['remove_pcr_duplicates'] = get_user_input("Remove PCR duplicates? (yes/no)", default= "yes" if assay in ["chip", "atac"] else "no", is_boolean=True)
if template_data['remove_pcr_duplicates']:
template_data['remove_pcr_duplicates_method'] = get_user_input("Remove PCR duplicates method:", default="picard", choices=["picard"])

else:
template_data['remove_pcr_duplicates_method'] = "False"

if assay == "atac":
template_data['shift_atac_reads'] = get_user_input("Shift ATAC-seq reads? (yes/no)", default="yes", is_boolean=True)
elif assay in ["chip", "rna"]:
template_data['shift_atac_reads'] = "False"
template_data['shift_atac_reads'] = get_user_input("Shift ATAC-seq reads? (yes/no)", default="yes", is_boolean=True) if assay == "atac" else "False"

if assay == "chip":
template_data['spikein'] = get_user_input("Do you have spikein? (yes/no)", default="no", is_boolean=True)
template_data['spikein'] = get_user_input("Do you have spikein? (yes/no)", default="no", is_boolean=True)
if template_data['spikein']:
template_data['normalisation_method'] = get_user_input("Normalisation method:", default="orlando", choices=["orlando", "with_input"])
template_data['reference_genome'] = get_user_input("Reference genome:", default="hg38")
template_data['spikein_genome'] = get_user_input("Spikein genome:", default="dm6")
template_data['fastq_screen_config'] = get_user_input("Path to fastqscreen config:", default="/ceph/project/milne_group/shared/seqnado_reference/fastqscreen_reference/fastq_screen.conf")
elif assay in ["atac", "rna"]:
template_data['normalisation_method'] = "False"

template_data['split_fastq'] = get_user_input("Do you want to split FASTQ files? (yes/no)", default="no", is_boolean=True)
if template_data['split_fastq']:
template_data.update['split_fastq_parts'] = get_user_input("How many parts do you want to split the FASTQ files into?", default="4")



template_data['make_bigwigs'] = get_user_input("Do you want to make bigwigs? (yes/no)", default="no", is_boolean=True)
if template_data['make_bigwigs']:
template_data['pileup_method'] = get_user_input("Pileup method:", default="deeptools", choices=["deeptools", "homer"])
@@ -129,29 +97,16 @@ def setup_configuration(assay, genome, template_data):
template_data['call_peaks'] = get_user_input("Do you want to call peaks? (yes/no)", default="no", is_boolean=True)
if template_data['call_peaks']:
template_data['peak_calling_method'] = get_user_input("Peak caller:", default="lanceotron", choices=["lanceotron", "macs", "homer"])

elif assay == "rna":
template_data['call_peaks'] = "False"

if assay == "rna":
template_data['run_deseq2'] = get_user_input("Run DESeq2? (yes/no)", default="no", is_boolean=True)
elif assay in ["chip", "atac"]:
template_data['run_deseq2'] = "False"
template_data['run_deseq2'] = get_user_input("Run DESeq2? (yes/no)", default="no", is_boolean=True) if assay == "rna" else "False"

template_data['make_ucsc_hub'] = get_user_input("Do you want to make a UCSC hub? (yes/no)", default="no", is_boolean=True)
if template_data['make_ucsc_hub']:
template_data['UCSC_hub_directory'] = get_user_input("UCSC hub directory:", default="/path/to/ucsc_hub/")
template_data['email'] = get_user_input("What is your email address?", default=f"{username}@example.com")
template_data['color_by'] = get_user_input("Color by (for UCSC hub):", default="samplename")
else :
template_data['UCSC_hub_directory'] = "."
template_data['email'] = f"{username}@example.com"
template_data['color_by'] = "samplename"

if assay in ["chip", "atac"]:
template_data['options'] = TOOL_OPTIONS
elif assay == "rna":
template_data['options'] = TOOL_OPTIONS_RNA

template_data['UCSC_hub_directory'] = get_user_input("UCSC hub directory:", default="/path/to/ucsc_hub/") if template_data['make_ucsc_hub'] else "."
template_data['email'] = get_user_input("What is your email address?", default=f"{username}@example.com") if template_data['make_ucsc_hub'] else f"{username}@example.com"
template_data['color_by'] = get_user_input("Color by (for UCSC hub):", default="samplename") if template_data['make_ucsc_hub'] else "samplename"

template_data['options'] = TOOL_OPTIONS_RNA if assay == "rna" else TOOL_OPTIONS


# Tool Specific Options
@@ -250,4 +205,3 @@ def create_config(assay, genome):
file.write(template_deseq2.render(template_data))

print(f"Directory '{dir_name}' has been created with the 'config_{assay}.yml' file.")
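
The refactored preset branch of `setup_configuration` chooses the index key from the assay and reads each field with `.get(..., '')`. A minimal sketch of that lookup on its own, assuming a hypothetical helper name `build_genome_entry` and a toy `toy_presets` dict with made-up paths (neither appears in the commit):

```python
def build_genome_entry(genome_values, genome, assay):
    """Return the per-genome settings for one assay (illustrative helper, not in the commit)."""
    # ChIP-seq and ATAC-seq use the Bowtie2 key; any other assay falls through to STAR.
    index_key = "bt2_indices" if assay in ("chip", "atac") else "star_indices"
    preset = genome_values.get(genome, {})
    return {
        "indices": preset.get(index_key, ""),
        "chromosome_sizes": preset.get("chromosome_sizes", ""),
        "gtf": preset.get("gtf", ""),
        "blacklist": preset.get("blacklist", ""),
    }


# Toy presets with made-up paths, mirroring the key names in preset_genomes.json.
toy_presets = {
    "hg38": {
        "bt2_indices": "/ref/hg38/bt2_index/hg38",
        "star_indices": "/ref/hg38/STAR_2.7.10b",
        "chromosome_sizes": "/ref/hg38/hg38.chrom.sizes",
        "gtf": "/ref/hg38/hg38.gtf",
    }
}

chip_entry = build_genome_entry(toy_presets, "hg38", "chip")
rna_entry = build_genome_entry(toy_presets, "hg38", "rna")
```

One consequence of the `.get(..., '')` defaults: a missing or misspelled preset key yields an empty string instead of the `KeyError` that the previous `genome_values[genome]['bt2_index']` lookups would have raised, so such a mistake would likely surface later in the pipeline rather than at configuration time.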

4 changes: 2 additions & 2 deletions seqnado/utils.py
@@ -109,9 +109,9 @@ def has_bowtie2_index(prefix: str) -> bool:
path_dir = path_prefix.parent
path_prefix_stem = path_prefix.stem

bowtie2_indicies = list(path_dir.glob(f"{path_prefix_stem}*.bt2"))
bowtie2_indices = list(path_dir.glob(f"{path_prefix_stem}*.bt2"))

if len(bowtie2_indicies) > 0:
if len(bowtie2_indices) > 0:
return True
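
For context, the renamed check can be exercised in isolation. The sketch below is a self-contained reconstruction, assuming `path_prefix` is a `pathlib.Path` built from `prefix` and that the function should return `False` when nothing matches (the rest of the original function is not shown in this hunk):

```python
import tempfile
from pathlib import Path


def has_bowtie2_index(prefix: str) -> bool:
    """Return True if at least one `<prefix>*.bt2` file exists next to the prefix."""
    path_prefix = Path(prefix)  # assumption: the original builds a Path here
    path_dir = path_prefix.parent
    path_prefix_stem = path_prefix.stem

    bowtie2_indices = list(path_dir.glob(f"{path_prefix_stem}*.bt2"))

    # Returning the comparison makes the "no index found" case an explicit False.
    return len(bowtie2_indices) > 0


with tempfile.TemporaryDirectory() as tmp:
    prefix = str(Path(tmp) / "hg38")
    missing = has_bowtie2_index(prefix)  # nothing on disk yet
    (Path(tmp) / "hg38.1.bt2").touch()   # create one fake index file
    found = has_bowtie2_index(prefix)
```

Note that the `*.bt2` pattern would not match Bowtie2 large indexes, which use the `.bt2l` suffix; whether the pipeline needs to support those is not stated in the commit.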


7 changes: 1 addition & 6 deletions seqnado/workflow/config/config.yaml.jinja
@@ -10,12 +10,10 @@ design: "design.csv"

genome:
name: "{{genome}}"
indicies: "{{indicies}}"
indices: "{{indices}}"
chromosome_sizes: "{{chromosome_sizes}}"
gtf: "{{gtf}}"

read_type: "{{read_type}}"

remove_blacklist: "{{remove_blacklist}}"
blacklist: "{{blacklist}}"

@@ -30,9 +28,6 @@ spikein_options:
spikein_genome: "{{spikein_genome}}"
fastq_screen_config: "{{fastq_screen_config}}"

split_fastq: "{{split_fastq}}"
split_fastq_parts: "{{split_fastq_parts}}"

make_bigwigs: "{{make_bigwigs}}"
pileup_method: "{{pileup_method}}"
make_heatmaps: "{{make_heatmaps}}"
32 changes: 16 additions & 16 deletions seqnado/workflow/config/preset_genomes.json
@@ -1,56 +1,56 @@
{
"dm6": {
"bt2_index": "/ceph/project/milne_group/shared/seqnado_reference/dm6/UCSC/bt2_index/dm6",
"star_index": "/ceph/project/milne_group/shared/seqnado_reference/dm6/UCSC/STAR_2.7.10b",
"bt2_indices": "/ceph/project/milne_group/shared/seqnado_reference/dm6/UCSC/bt2_index/dm6",
"star_indices": "/ceph/project/milne_group/shared/seqnado_reference/dm6/UCSC/STAR_2.7.10b",
"chromosome_sizes": "/ceph/project/milne_group/shared/seqnado_reference/dm6/UCSC/sequence/dm6.chrom.sizes",
"gtf": "/ceph/project/milne_group/shared/seqnado_reference/dm6/UCSC/genes/dm6.ncbiRefSeq.gtf",
"blacklist": "/ceph/project/milne_group/shared/seqnado_reference/dm6/dm6-blacklist.v2.bed.gz"
},
"hg19": {
"bt2_index": "/ceph/project/milne_group/shared/seqnado_reference/hg19/UCSC/bt2_index/hg19",
"star_index": "/ceph/project/milne_group/shared/seqnado_reference/hg19/UCSC/STAR_2.7.10b",
"bt2_indices": "/ceph/project/milne_group/shared/seqnado_reference/hg19/UCSC/bt2_index/hg19",
"star_indices": "/ceph/project/milne_group/shared/seqnado_reference/hg19/UCSC/STAR_2.7.10b",
"chromosome_sizes": "/ceph/project/milne_group/shared/seqnado_reference/hg19/UCSC/sequence/hg19.chrom.sizes",
"gtf": "/ceph/project/milne_group/shared/seqnado_reference/hg19/UCSC/genes/hg19.ncbiRefSeq.gtf",
"blacklist": "/ceph/project/milne_group/shared/seqnado_reference/hg19/hg19-blacklist.v2.bed.gz "
},
"hg38": {
"bt2_index": "/ceph/project/milne_group/shared/seqnado_reference/hg38/UCSC/bt2_index/hg38",
"star_index": "/ceph/project/milne_group/shared/seqnado_reference/hg38/UCSC/STAR_2.7.10b",
"bt2_indices": "/ceph/project/milne_group/shared/seqnado_reference/hg38/UCSC/bt2_index/hg38",
"star_indices": "/ceph/project/milne_group/shared/seqnado_reference/hg38/UCSC/STAR_2.7.10b",
"chromosome_sizes": "/ceph/project/milne_group/shared/seqnado_reference/hg38/UCSC/sequence/hg38.chrom.sizes",
"gtf": "/ceph/project/milne_group/shared/seqnado_reference/hg38/UCSC/genes/hg38.ncbiRefSeq.gtf",
"blacklist": "/ceph/project/milne_group/shared/seqnado_reference/hg38/hg38-blacklist.v2.bed.gz"
},
"hg38_dm6": {
"bt2_index": "/ceph/project/milne_group/shared/seqnado_reference/hg38_dm6/UCSC/bt2_index/hg38_dm6",
"star_index": "NA",
"bt2_indices": "/ceph/project/milne_group/shared/seqnado_reference/hg38_dm6/UCSC/bt2_index/hg38_dm6",
"star_indices": "NA",
"chromosome_sizes": "/ceph/project/milne_group/shared/seqnado_reference/hg38_dm6/UCSC/sequence/hg38_dm6.chrom.sizes",
"gtf": "/ceph/project/milne_group/shared/seqnado_reference/hg38_dm6/UCSC/genes/hg38_dm6.ncbiRefSeq.gtf",
"blacklist": "/ceph/project/milne_group/shared/seqnado_reference/hg38_dm6/hg38_dm6-blacklist.v2.bed.gz"
},
"hg38_mm39": {
"bt2_index": "/ceph/project/milne_group/shared/seqnado_reference/hg38_mm39/bt2_index/hg38_mm39",
"star_index": "NA",
"bt2_indices": "/ceph/project/milne_group/shared/seqnado_reference/hg38_mm39/bt2_index/hg38_mm39",
"star_indices": "NA",
"chromosome_sizes": "/ceph/project/milne_group/shared/seqnado_reference/hg38_mm39/sequence/hg38_mm39.fa.fai",
"gtf": "/ceph/project/milne_group/shared/seqnado_reference/hg38_mm39/genes/hg38_mm39.gtf",
"blacklist": "/ceph/project/milne_group/shared/seqnado_reference/hg38_mm39/hg38_mm39-blacklist.bed.gz"
},
"hg38_spikein": {
"bt2_index": "NA",
"star_index": "/ceph/project/milne_group/shared/seqnado_reference/hg38_spikein/UCSC/STAR_2.7.10b",
"bt2_indices": "NA",
"star_indices": "/ceph/project/milne_group/shared/seqnado_reference/hg38_spikein/UCSC/STAR_2.7.10b",
"chromosome_sizes": "/ceph/project/milne_group/shared/seqnado_reference/hg38_spikein/hg38_spikein.chrom.sizes",
"gtf": "/ceph/project/milne_group/shared/seqnado_reference/hg38_spikein/UCSC/genes/hg38_spikein_transcripts.gtf",
"blacklist": "/ceph/project/milne_group/shared/seqnado_reference/hg38/hg38-blacklist.v2.bed.gz"
},
"mm10": {
"bt2_index": "/ceph/project/milne_group/shared/seqnado_reference/mm10/UCSC/bt2_index/mm10",
"star_index": "/ceph/project/milne_group/shared/seqnado_reference/mm10/UCSC/STAR_2.7.10b",
"bt2_indices": "/ceph/project/milne_group/shared/seqnado_reference/mm10/UCSC/bt2_index/mm10",
"star_indices": "/ceph/project/milne_group/shared/seqnado_reference/mm10/UCSC/STAR_2.7.10b",
"chromosome_sizes": "/ceph/project/milne_group/shared/seqnado_reference/mm10/UCSC/sequence/mm10.chrom.sizes",
"gtf": "/ceph/project/milne_group/shared/seqnado_reference/mm10/UCSC/genes/mm10.ncbiRefSeq.gtf",
"blacklist": "/ceph/project/milne_group/shared/seqnado_reference/mm10/mm10-blacklist.v2.bed.gz"
},
"mm39": {
"bt2_index": "/ceph/project/milne_group/shared/seqnado_reference/mm39/UCSC/bt2_index/mm39",
"star_index": "/ceph/project/milne_group/shared/seqnado_reference/mm39/UCSC/STAR_2.7.10b",
"bt2_indices": "/ceph/project/milne_group/shared/seqnado_reference/mm39/UCSC/bt2_index/mm39",
"star_indices": "/ceph/project/milne_group/shared/seqnado_reference/mm39/UCSC/STAR_2.7.10b",
"chromosome_sizes": "/ceph/project/milne_group/shared/seqnado_reference/mm39/UCSC/sequence/mm39.chrom.sizes",
"gtf": "/ceph/project/milne_group/shared/seqnado_reference/mm39/UCSC/genes/mm39.ncbiRefSeq.gtf",
"blacklist": "/ceph/project/milne_group/shared/seqnado_reference/mm39/mm10-blacklist.v2.Liftover.mm39.bed.gz"
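
Since this commit renames keys across every preset, a small consistency check could catch leftovers. The sketch below is one possible approach, not part of the commit: `check_presets` and `REQUIRED_KEYS` are hypothetical names, and the input is a toy document with made-up paths. It also flags stray whitespace, which the `hg19` blacklist value above appears to contain (a space before the closing quote).

```python
import json

# Key names as they appear in preset_genomes.json after this commit.
REQUIRED_KEYS = ("bt2_indices", "star_indices", "chromosome_sizes", "gtf", "blacklist")


def check_presets(presets):
    """Return a list of human-readable problems found in a presets mapping."""
    problems = []
    for genome, entry in presets.items():
        for key in REQUIRED_KEYS:
            if key not in entry:
                problems.append(f"{genome}: missing key '{key}'")
            elif entry[key] != entry[key].strip():
                problems.append(f"{genome}: '{key}' has leading or trailing whitespace")
    return problems


# Toy input with made-up paths; the hg19 blacklist ends in a stray space.
toy_json = """
{
  "dm6": {"bt2_indices": "/ref/dm6/bt2", "star_indices": "/ref/dm6/star",
          "chromosome_sizes": "/ref/dm6.sizes", "gtf": "/ref/dm6.gtf",
          "blacklist": "/ref/dm6.bed.gz"},
  "hg19": {"bt2_indices": "/ref/hg19/bt2", "star_indices": "NA",
           "chromosome_sizes": "/ref/hg19.sizes", "gtf": "/ref/hg19.gtf",
           "blacklist": "/ref/hg19.bed.gz "}
}
"""
problems = check_presets(json.loads(toy_json))
```

Running the same function over the real file with `json.load` would list any preset still using the old `bt2_index` / `star_index` names as missing the new keys.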