Develop #120

Merged 17 commits on Jan 26, 2024
98 changes: 34 additions & 64 deletions docs/pipeline.md
@@ -9,82 +9,45 @@ The pipeline is configured using a YAML file: e.g. `config_atac.yml`, `config_ch
The following command will generate the working directory and configuration file for the ATAC-seq pipeline:

```bash
seqnado-config atac
seqnado-config chip
```

You should get something like this:

```bash
$ seqnado-config chip
[1/23] user_name (Your name): asmith
[2/23] Select date
1 - 2024-01-13
Choose from [1] (1):
[3/23] project_name (Project name): TEST
[4/23] Select project_id
1 - test
Choose from [1] (1): 1
[5/23] genome (hg38):
[6/23] chromosome_sizes (/ceph/project/milne_group/shared/seqnado_reference/hg38/UCSC/sequence/hg38.chrom.sizes):
[7/23] indicies (/ceph/project/milne_group/shared/seqnado_reference/hg38/UCSC/bt2_index/hg38):
[8/23] gtf (/ceph/project/milne_group/shared/seqnado_reference/hg38/UCSC/genes/hg38.ncbiRefSeq.gtf):
[9/23] Select read_type
1 - paired
2 - single
Choose from [1/2] (1): 1
[10/23] Select split_fastq
1 - True
2 - False
Choose from [1/2] (1): 2
[11/23] split_fastq_parts (int):
[12/23] Select remove_pcr_duplicates_method
1 - picard
2 - deeptools
Choose from [1/2] (1): 1
[13/23] Select remove_blacklist
1 - yes
2 - no
Choose from [1/2] (1): 1
[14/23] blacklist (/ceph/project/milne_group/shared/seqnado_reference/hg38/hg38-blacklist.v2.bed.gz):
[15/23] Select make_bigwigs
1 - yes
2 - no
Choose from [1/2] (1): 1
[16/23] Select pileup_method
1 - deeptools
2 - homer
Choose from [1/2] (1): 1
[17/23] Select make_heatmaps
1 - yes
2 - no
Choose from [1/2] (1): 1
[18/23] Select call_peaks
1 - yes
2 - no
Choose from [1/2] (1): 1
[19/23] Select peak_calling_method
1 - macs
2 - lanceotron
3 - homer
Choose from [1/2/3] (1): 2
[20/23] Select make_ucsc_hub
1 - yes
2 - no
Choose from [1/2] (1): 1
[21/23] UCSC_hub_directory (path/to/ publically accessible location on the server): /project/milne_group/datashare/asmith/chipseq/TEST_HUB
[22/23] email (Email address (UCSC required)): [email protected]
[23/23] Select color_by
1 - samplename
2 - method
Choose from [1/2] (1): 1
What is your project name? [cchahrou_project]: TEST
What is your genome name? [other]: hg38
Path to Bowtie2 genome indices: [None]: /ceph/project/milne_group/shared/seqnado_reference/hg38/UCSC/bt2_index/hg38
Path to chromosome sizes file: [None]: /ceph/project/milne_group/shared/seqnado_reference/hg38/UCSC/sequence/hg38.chrom.sizes
Path to GTF file: [None]: /ceph/project/milne_group/shared/seqnado_reference/hg38/UCSC/genes/hg38.ncbiRefSeq.gtf
Path to blacklist bed file: [None]: /ceph/project/milne_group/shared/seqnado_reference/hg38/hg38-blacklist.v2.bed.gz
Do you want to remove blacklist regions? (yes/no) [yes]: yes
Remove PCR duplicates? (yes/no) [yes]: yes
Remove PCR duplicates method: [picard]: picard
Do you have spikein? (yes/no) [no]: yes
Normalisation method: [orlando/with_input]: orlando
Reference genome: [hg38]: hg38
Spikein genome: [dm6]: dm6
Path to fastqscreen config: [/ceph/project/milne_group/shared/seqnado_reference/fastqscreen_reference/fastq_screen.conf]: /ceph/project/milne_group/shared/seqnado_reference/fastqscreen_reference/fastq_screen.conf
Do you want to make bigwigs? (yes/no) [no]: yes
Pileup method: [deeptools/homer]: deeptools
Do you want to make heatmaps? (yes/no) [no]: yes
Do you want to call peaks? (yes/no) [no]: yes
Peak caller: [lanceotron/macs/homer]: lanceotron
Do you want to make a UCSC hub? (yes/no) [no]: yes
UCSC hub directory: [/path/to/ucsc_hub/]: /project/milne_group/datashare/etc
What is your email address? [[email protected]]: email for UCSC
Color by (for UCSC hub): [samplename]: samplename
Directory '2024-01-26_chip_TEST' has been created with the 'config_chip.yml' file.
```

This will generate the following files:

```bash
$ tree 2024-01-13_test/
$ tree 2024-01-13_chip_test/

2024-01-13_test/
2024-01-13_chip_test/
├── config_chip.yml
└── readme_test.md

@@ -230,6 +193,13 @@ $ ls -l

```bash
tmux new -s NAME_OF_SESSION

# or

screen -S NAME_OF_SESSION

# to detach from the screen session (it keeps running in the background), press:
# ctrl+a d
```


2 changes: 1 addition & 1 deletion pyproject.toml
@@ -27,7 +27,7 @@ dependencies = [
"seaborn",
"setuptools_scm",
"snakemake-wrapper-utils",
"snakemake<=9.0.0",
"snakemake<8.0.0",
"tracknado",
"wget",
]
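
Reading this hunk as replacing `snakemake<=9.0.0` with `snakemake<8.0.0`, the practical effect is that 8.x releases are no longer accepted. A minimal sketch of that difference, assuming plain `major.minor.patch` version strings and a hand-rolled comparison (a simplification of PEP 440, not how pip resolves versions; the helper names are hypothetical):

```python
from typing import Tuple


def parse(version: str) -> Tuple[int, ...]:
    """Turn '7.32.4' into (7, 32, 4); assumes purely numeric components."""
    return tuple(int(part) for part in version.split("."))


def allowed_by_old_pin(version: str) -> bool:
    # Old specifier in the hunk: snakemake<=9.0.0
    return parse(version) <= (9, 0, 0)


def allowed_by_new_pin(version: str) -> bool:
    # New specifier in the hunk: snakemake<8.0.0
    return parse(version) < (8, 0, 0)


for candidate in ("7.32.4", "8.1.0"):
    print(candidate, allowed_by_old_pin(candidate), allowed_by_new_pin(candidate))
```

Under these assumptions a 7.x release passes both pins, while an 8.x release passes only the old one.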
4 changes: 3 additions & 1 deletion seqnado/cli.py
@@ -62,7 +62,9 @@ def cli_design(method, files, output="design.csv"):

design = DesignIP.from_fastq_files([FastqFileIP(path=fq) for fq in files])

design.to_dataframe().to_csv(output)
design.to_dataframe().reset_index().rename(columns={"index": "sample"}).to_csv(
output, index=False
)


@click.command(context_settings=dict(ignore_unknown_options=True))
108 changes: 33 additions & 75 deletions seqnado/config.py
@@ -18,7 +18,6 @@ def get_user_input(prompt, default=None, is_boolean=False, choices=None):
return user_input



def setup_configuration(assay, genome, template_data):
username = os.getenv('USER', 'unknown_user')
today = datetime.datetime.now().strftime('%Y-%m-%d')
@@ -40,86 +39,55 @@ def setup_configuration(assay, genome, template_data):

if genome == "other":
genome = get_user_input("What is your genome name?", default="other")
if assay in ["chip", "atac"]:
genome_dict = {
genome: {
"index": get_user_input("Path to Bowtie2 genome index:"),
"chromosome_sizes": get_user_input("Path to chromosome sizes file:"),
"gtf": get_user_input("Path to GTF file:"),
"blacklist": get_user_input("Path to blacklist bed file:")
}
genome_dict = {
genome: {
"indices": get_user_input("Path to Bowtie2 genome indices:") if assay in ["chip", "atac"] else get_user_input("Path to STAR v2.7.10b genome indices:"),
"chromosome_sizes": get_user_input("Path to chromosome sizes file:"),
"gtf": get_user_input("Path to GTF file:"),
"blacklist": get_user_input("Path to blacklist bed file:")
}
elif assay == "rna":
genome_dict = {
genome: {
"index": get_user_input("Path to STAR v2.7.10b genome index:"),
"chromosome_sizes": get_user_input("Path to chromosome sizes file:"),
"gtf": get_user_input("Path to GTF file:"),
"blacklist": get_user_input("Path to blacklist bed file:")
}
}
else:
if genome in genome_values:
genome_dict[genome] = {
"indices": genome_values[genome].get('bt2_indices' if assay in ["chip", "atac"] else 'star_indices', ''),
"chromosome_sizes": genome_values[genome].get('chromosome_sizes', ''),
"gtf": genome_values[genome].get('gtf', ''),
"blacklist": genome_values[genome].get('blacklist', '')
}

elif genome in genome_values:
if assay in ["chip", "atac"]:
genome_dict = {
genome: {
"index": genome_values[genome]['bt2_index'],
"chromosome_sizes": genome_values[genome]['chromosome_sizes'],
"gtf": genome_values[genome]['gtf'],
"blacklist": genome_values[genome]['blacklist']
}
}
elif assay == "rna":
genome_dict = {
genome: {
"index": genome_values[genome]['star_index'],
"chromosome_sizes": genome_values[genome]['chromosome_sizes'],
"gtf": genome_values[genome]['gtf'],
"blacklist": genome_values[genome]['blacklist']
}
}

genome_config = {
'genome': genome,
'indices': genome_dict[genome]['indices'],
'chromosome_sizes': genome_dict[genome]['chromosome_sizes'],
'gtf': genome_dict[genome]['gtf'],
}
template_data.update(genome_config)

template_data['genome'] = genome
template_data['indicies'] = genome_dict[genome]['index']
template_data['chromosome_sizes'] = genome_dict[genome]['chromosome_sizes']
template_data['gtf'] = genome_dict[genome]['gtf']
template_data['read_type'] = get_user_input("What is your read type?", default="paired", choices=["paired", "single"])

template_data['remove_blacklist'] = get_user_input("Do you want to remove blacklist regions? (yes/no)", default="yes", is_boolean=True)
if template_data['remove_blacklist']:
template_data['blacklist'] = genome_dict[genome]['blacklist']

if assay in ["chip", "atac"]:
template_data['remove_pcr_duplicates'] = get_user_input("Remove PCR duplicates? (yes/no)", default="yes", is_boolean=True)
elif assay == "rna":
template_data['remove_pcr_duplicates'] = get_user_input("Remove PCR duplicates? (yes/no)", default="no", is_boolean=True)

template_data['remove_pcr_duplicates'] = get_user_input("Remove PCR duplicates? (yes/no)", default= "yes" if assay in ["chip", "atac"] else "no", is_boolean=True)
if template_data['remove_pcr_duplicates']:
template_data['remove_pcr_duplicates_method'] = get_user_input("Remove PCR duplicates method:", default="picard", choices=["picard"])

else:
template_data['remove_pcr_duplicates_method'] = "False"

if assay == "atac":
template_data['shift_atac_reads'] = get_user_input("Shift ATAC-seq reads? (yes/no)", default="yes", is_boolean=True)
elif assay in ["chip", "rna"]:
template_data['shift_atac_reads'] = "False"
template_data['shift_atac_reads'] = get_user_input("Shift ATAC-seq reads? (yes/no)", default="yes", is_boolean=True) if assay == "atac" else "False"

if assay == "chip":
template_data['spikein'] = get_user_input("Do you have spikein? (yes/no)", default="no", is_boolean=True)
template_data['spikein'] = get_user_input("Do you have spikein? (yes/no)", default="no", is_boolean=True)
if template_data['spikein']:
template_data['normalisation_method'] = get_user_input("Normalisation method:", default="orlando", choices=["orlando", "with_input"])
template_data['reference_genome'] = get_user_input("Reference genome:", default="hg38")
template_data['spikein_genome'] = get_user_input("Spikein genome:", default="dm6")
template_data['fastq_screen_config'] = get_user_input("Path to fastqscreen config:", default="/ceph/project/milne_group/shared/seqnado_reference/fastqscreen_reference/fastq_screen.conf")
elif assay in ["atac", "rna"]:
template_data['normalisation_method'] = "False"

template_data['split_fastq'] = get_user_input("Do you want to split FASTQ files? (yes/no)", default="no", is_boolean=True)
if template_data['split_fastq']:
template_data['split_fastq_parts'] = get_user_input("How many parts do you want to split the FASTQ files into?", default="4")



template_data['make_bigwigs'] = get_user_input("Do you want to make bigwigs? (yes/no)", default="no", is_boolean=True)
if template_data['make_bigwigs']:
template_data['pileup_method'] = get_user_input("Pileup method:", default="deeptools", choices=["deeptools", "homer"])
@@ -129,25 +97,16 @@ def setup_configuration(assay, genome, template_data):
template_data['call_peaks'] = get_user_input("Do you want to call peaks? (yes/no)", default="no", is_boolean=True)
if template_data['call_peaks']:
template_data['peak_calling_method'] = get_user_input("Peak caller:", default="lanceotron", choices=["lanceotron", "macs", "homer"])

elif assay == "rna":
template_data['call_peaks'] = "False"

if assay == "rna":
template_data['run_deseq2'] = get_user_input("Run DESeq2? (yes/no)", default="no", is_boolean=True)
elif assay in ["chip", "atac"]:
template_data['run_deseq2'] = "False"
template_data['run_deseq2'] = get_user_input("Run DESeq2? (yes/no)", default="no", is_boolean=True) if assay == "rna" else "False"

template_data['make_ucsc_hub'] = get_user_input("Do you want to make a UCSC hub? (yes/no)", default="no", is_boolean=True)
if template_data['make_ucsc_hub']:
template_data['UCSC_hub_directory'] = get_user_input("UCSC hub directory:", default="/path/to/ucsc_hub/")
template_data['email'] = get_user_input("What is your email address?", default=f"{username}@example.com")
template_data['color_by'] = get_user_input("Color by (for UCSC hub):", default="samplename")

if assay in ["chip", "atac"]:
template_data['options'] = TOOL_OPTIONS
elif assay == "rna":
template_data['options'] = TOOL_OPTIONS_RNA

template_data['UCSC_hub_directory'] = get_user_input("UCSC hub directory:", default="/path/to/ucsc_hub/") if template_data['make_ucsc_hub'] else "."
template_data['email'] = get_user_input("What is your email address?", default=f"{username}@example.com") if template_data['make_ucsc_hub'] else f"{username}@example.com"
template_data['color_by'] = get_user_input("Color by (for UCSC hub):", default="samplename") if template_data['make_ucsc_hub'] else "samplename"

template_data['options'] = TOOL_OPTIONS_RNA if assay == "rna" else TOOL_OPTIONS


# Tool Specific Options
@@ -246,4 +205,3 @@ def create_config(assay, genome):
file.write(template_deseq2.render(template_data))

print(f"Directory '{dir_name}' has been created with the 'config_{assay}.yml' file.")
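
The `config.py` changes collapse the per-assay `if`/`elif` branches into conditional expressions and `.get()` lookups. The sketch below illustrates that pattern in isolation. The body of `get_user_input` is not shown in this diff, so the version here is a hypothetical stand-in that only matches the visible signature; the `reader` parameter and the `genome_entry` and `shift_atac_reads` helpers are assumptions added so the example can run without a terminal.

```python
from typing import Callable, Dict, Optional, Sequence


def get_user_input(
    prompt: str,
    default: Optional[str] = None,
    is_boolean: bool = False,
    choices: Optional[Sequence[str]] = None,
    reader: Callable[[str], str] = input,  # hypothetical hook, not in the PR
):
    """Hypothetical stand-in for the seqnado helper of the same name."""
    answer = reader(f"{prompt} [{default}]: ").strip() or default
    if is_boolean:
        return str(answer).lower() in ("yes", "y", "true")
    if choices and answer not in choices:
        raise ValueError(f"{answer!r} is not one of {list(choices)}")
    return answer


def genome_entry(assay: str, values: Dict[str, str]) -> Dict[str, str]:
    """Pick the aligner index key by assay; missing keys fall back to ''."""
    index_key = "star_indices" if assay == "rna" else "bt2_indices"
    return {
        "indices": values.get(index_key, ""),
        "chromosome_sizes": values.get("chromosome_sizes", ""),
        "gtf": values.get("gtf", ""),
        "blacklist": values.get("blacklist", ""),
    }


def shift_atac_reads(assay: str, reader: Callable[[str], str] = input):
    """Prompt only for ATAC-seq; other assays get the literal string 'False'."""
    return (
        get_user_input(
            "Shift ATAC-seq reads? (yes/no)",
            default="yes",
            is_boolean=True,
            reader=reader,
        )
        if assay == "atac"
        else "False"
    )


# Example values are placeholders, not real reference paths.
reference = {"bt2_indices": "/ref/bt2/hg38", "star_indices": "/ref/star/hg38"}
print(genome_entry("rna", reference)["indices"])
print(shift_atac_reads("chip"))
```

One thing to keep in mind with this pattern: the fallback is the string `"False"`, which is truthy in Python, so code consuming these values should compare against the string explicitly rather than use a bare `if`.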
