Feat increase setup verbosity #180

Merged
merged 20 commits into from
Apr 9, 2024
20 commits
c7c3121
Refactor name_control assignment in DesignIP class
alsmith151 Apr 8, 2024
0634033
Update prefix for peak files in PeakCallingFiles class
alsmith151 Apr 9, 2024
806ef31
Add FAQ entry for merging multiple samples into a single sample
alsmith151 Apr 9, 2024
bde04e9
updated cli:
alsmith151 Apr 9, 2024
20f22b3
Fix symlink creation for empty source paths
alsmith151 Apr 9, 2024
3f6a6d8
Fix symlink creation for empty source paths
alsmith151 Apr 9, 2024
7a7fe1c
Update design.py with metadata validation
alsmith151 Apr 9, 2024
b0f90c4
Update design.py with metadata validation
alsmith151 Apr 9, 2024
e51f0cc
fix: correct metadata validation
alsmith151 Apr 9, 2024
9a8b2dc
Merge branch 'master' into feat-enable-merging-pileups-and-consensus-…
alsmith151 Apr 9, 2024
0602154
Add merge column to design file for chip assay
alsmith151 Apr 9, 2024
72f2bab
Merge branch 'feat-enable-merging-pileups-and-consensus-peak-calls' i…
alsmith151 Apr 9, 2024
16fafe7
Update scale_method to "grouped" in Output class
alsmith151 Apr 9, 2024
99346d8
Refactor alignment_post_processing.smk to use get_sample_names instea…
alsmith151 Apr 9, 2024
d140fdc
Update peak_calling_method to "lanceotron" in NonRNAOutput class
alsmith151 Apr 9, 2024
339ddd9
Update scale_method to "merged" in Output class
alsmith151 Apr 9, 2024
b93f478
Added lanceotron params
alsmith151 Apr 9, 2024
7cb70d8
Add container definition for lanceotron_no_input_consensus rule
alsmith151 Apr 9, 2024
395bc7a
Refactor logging verbosity in cli_pipeline function
alsmith151 Apr 9, 2024
20fe4a2
Merge branch 'fix-missing-control-files' into feat-increase-setup-ver…
alsmith151 Apr 9, 2024
19 changes: 18 additions & 1 deletion docs/faq.md
@@ -20,4 +20,21 @@ remote has no library client (see https://apptainer.org/docs/user/latest/endpoin
Fix:

apptainer remote add --no-login SylabsCloud cloud.sylabs.io
apptainer remote use SylabsCloud
apptainer remote use SylabsCloud


## Optional configuration

### Can I merge multiple samples into a single sample?

Yes. Merging samples produces merged bigWig files and consensus peaks. To enable it, add a column named "merge" to the design file; samples that share a value in this column are merged into one group, e.g.:


| sample | r1 | r2 | deseq2 | merge |
|--------|----|----|--------|-------|
| rna1 | /tmp/pytest-of-asmith/pytest-7/data2/2024-01-13_rna_test/rna1_2.fastq.gz | /tmp/pytest-of-asmith/pytest-7/data2/2024-01-13_rna_test/rna1_1.fastq.gz | control | control |
| rna2 | /tmp/pytest-of-asmith/pytest-7/data2/2024-01-13_rna_test/rna2_2.fastq.gz | /tmp/pytest-of-asmith/pytest-7/data2/2024-01-13_rna_test/rna2_1.fastq.gz | control | control |
| rna3 | /tmp/pytest-of-asmith/pytest-7/data2/2024-01-13_rna_test/rna3_2.fastq.gz | /tmp/pytest-of-asmith/pytest-7/data2/2024-01-13_rna_test/rna3_1.fastq.gz | control | control |
| rna4 | /tmp/pytest-of-asmith/pytest-7/data2/2024-01-13_rna_test/rna4_2.fastq.gz | /tmp/pytest-of-asmith/pytest-7/data2/2024-01-13_rna_test/rna4_1.fastq.gz | treated | treated |
| rna5 | /tmp/pytest-of-asmith/pytest-7/data2/2024-01-13_rna_test/rna5_2.fastq.gz | /tmp/pytest-of-asmith/pytest-7/data2/2024-01-13_rna_test/rna5_1.fastq.gz | treated | treated |
| rna6 | /tmp/pytest-of-asmith/pytest-7/data2/2024-01-13_rna_test/rna6_2.fastq.gz | /tmp/pytest-of-asmith/pytest-7/data2/2024-01-13_rna_test/rna6_1.fastq.gz | treated | treated |
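
As a rough illustration of how the "merge" column groups samples, the sketch below reads a small design table and collects sample names by their merge value. It uses only the standard library; the shortened inline CSV and the `group_by_merge` helper are hypothetical and are not part of SeqNado.

```python
import csv
import io
from collections import defaultdict

# Hypothetical, shortened design table; real files hold full FASTQ paths.
DESIGN_CSV = """sample,r1,r2,deseq2,merge
rna1,rna1_1.fastq.gz,rna1_2.fastq.gz,control,control
rna2,rna2_1.fastq.gz,rna2_2.fastq.gz,control,control
rna4,rna4_1.fastq.gz,rna4_2.fastq.gz,treated,treated
"""


def group_by_merge(design_text: str) -> dict[str, list[str]]:
    """Map each value of the merge column to the samples that carry it."""
    groups: dict[str, list[str]] = defaultdict(list)
    for row in csv.DictReader(io.StringIO(design_text)):
        groups[row["merge"]].append(row["sample"])
    return dict(groups)


merged_groups = group_by_merge(DESIGN_CSV)
print(merged_groups)
```

Each key of `merged_groups` would correspond to one merged output, and the listed samples are the ones pooled into it.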
53 changes: 49 additions & 4 deletions seqnado/cli.py
@@ -53,12 +53,30 @@
"""
import pathlib
from seqnado.design import Design, DesignIP, FastqFile, FastqFileIP

if not files:
files = list(pathlib.Path(".").glob("*.fastq.gz"))
potential_file_locations = [

".",
"fastqs",
"fastq",
"data",
"data/fastqs",
]

for location in potential_file_locations:
files = list(pathlib.Path(location).glob("*.fastq.gz"))
if files:
break


if not files:
raise ValueError("No fastq files provided or found in current directory.")
logger.error("No fastq files provided or found in current directory")
logger.error(f"""

Fastq files can be provided as arguments or found in the following directories:
{potential_file_locations}
""")
raise ValueError("No fastq files provided or found in current directory" )




if not method == "chip":
design = Design.from_fastq_files([FastqFile(path=fq) for fq in files])
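
The directory search in the hunk above can be sketched as a standalone function. This is a minimal illustration under stated assumptions: `find_fastq_files` and its `base` parameter are hypothetical names, and the explicit `is_dir` check is an addition for clarity rather than something the PR contains.

```python
import pathlib
import tempfile

# Directory names taken from the diff above; searched in this order.
POTENTIAL_FILE_LOCATIONS = [".", "fastqs", "fastq", "data", "data/fastqs"]


def find_fastq_files(base: pathlib.Path) -> list[pathlib.Path]:
    """Return FASTQ files from the first candidate directory that has any."""
    for location in POTENTIAL_FILE_LOCATIONS:
        directory = base / location
        if not directory.is_dir():
            continue
        files = sorted(directory.glob("*.fastq.gz"))
        if files:
            return files
    return []


# Demonstration in a throwaway directory with files only under data/fastqs.
with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)
    (root / "data" / "fastqs").mkdir(parents=True)
    (root / "data" / "fastqs" / "s1_1.fastq.gz").touch()
    (root / "data" / "fastqs" / "s1_2.fastq.gz").touch()
    found = [p.name for p in find_fastq_files(root)]
    found_in_empty = find_fastq_files(root / "data" / "fastqs" / "missing")

print(found)
```

Returning an empty list leaves the caller free to decide whether to log the searched locations and raise, as the diff does.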
@@ -92,14 +110,26 @@
""",
type=click.Choice(choices=["lc", "ls", "ss"]),
)
@click.option(

"--clean-symlinks",
is_flag=True,
help="Remove symlinks created by previous runs. Useful for re-running pipeline after misconfiguration.",
)
@click.option(

"-v",
"--verbose",
is_flag=True,
help="Increase logging verbosity",
)
@click.argument("pipeline_options", nargs=-1, type=click.UNPROCESSED)
def cli_pipeline(
method,
pipeline_options,
help=False,
preset="local",
version=False,
apptainer_args="",
verbose=False,
clean_symlinks=False,
):
"""Runs the data processing pipeline"""

@@ -113,9 +143,24 @@
_version = version("seqnado")
print(f"SeqNado version {_version}")
sys.exit(0)

if verbose:
logger.remove()
logger.add(sys.stderr, level="DEBUG")

else:
logger.remove()
logger.add(sys.stderr, level="INFO")


pipeline_options, cores = extract_cores_from_options(pipeline_options)

# Removes old symlinks if requested
if clean_symlinks:
logger.info("Cleaning symlinks")
links = pathlib.Path("seqnado_output/fastqs").glob("*")
for link in links:
if link.is_symlink():
link.unlink()


cmd = [
"snakemake",
"-c",
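
The `--clean-symlinks` behaviour in the `seqnado/cli.py` changes above amounts to unlinking every symlink in `seqnado_output/fastqs` while leaving regular files alone. A minimal sketch, assuming a POSIX-like filesystem where symlinks can be created freely; `clean_symlinks` is a hypothetical helper name, not a function from the PR.

```python
import pathlib
import tempfile


def clean_symlinks(directory: pathlib.Path) -> int:
    """Unlink every symlink directly inside `directory`; return the count."""
    removed = 0
    for entry in directory.glob("*"):
        if entry.is_symlink():
            entry.unlink()
            removed += 1
    return removed


with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)
    source = root / "sample_1.fastq.gz"
    source.touch()
    fastq_dir = root / "seqnado_output" / "fastqs"
    fastq_dir.mkdir(parents=True)
    (fastq_dir / "sample_1.fastq.gz").symlink_to(source)
    (fastq_dir / "notes.txt").touch()

    removed = clean_symlinks(fastq_dir)
    remaining = sorted(p.name for p in fastq_dir.iterdir())
    source_kept = source.exists()

print(removed, remaining, source_kept)
```

Unlinking a symlink removes only the link, so the original FASTQ file stays in place.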
59 changes: 43 additions & 16 deletions seqnado/design.py
@@ -1,17 +1,15 @@
import os

import pathlib
import re
from typing import Any, Dict, List, Optional, Union, Literal, LiteralString
import sys
from typing import Any, Dict, List, Literal, LiteralString, Optional, Union


import numpy as np

import pandas as pd
from loguru import logger
from pydantic import BaseModel, Field, computed_field
from pydantic import BaseModel, Field, computed_field, field_validator

from snakemake.io import expand


logger.add(sink=sys.stderr, level="WARNING")


def is_path(path: Optional[Union[str, pathlib.Path]]) -> Optional[pathlib.Path]:
if isinstance(path, str):
p = pathlib.Path(path)
@@ -32,7 +30,7 @@
def model_post_init(self, *args):
self.path = pathlib.Path(self.path).resolve()

if not self.path.exists():
if not self.path.exists() or str(self.path) in ["-", ".", "", None]:

raise FileNotFoundError(f"{self.path} does not exist.")

@computed_field
@@ -257,6 +255,19 @@
)


class Metadata(BaseModel):
deseq2: Optional[str] = None
merge: Optional[str] = None
scale_group: Union[str, int] = "all"


@field_validator("deseq2", "merge")
@classmethod
def prevent_none(cls, v):
none_vals = [None, "None", "none", "null", "Null", "NULL", ".", "", "NA", np.nan]
if any([v == n for n in none_vals]):
assert v is not None, "None is not allowed when setting metadata"
return v


class Design(BaseModel):
assays: Dict[str, AssayNonIP] = Field(
default_factory=dict,
@@ -317,14 +328,19 @@
for assay_name, row in df.iterrows():
if simplified:
metadata = {}

for k, v in row.items():
if k not in ["r1", "r2"]:
metadata[k] = v

# Validate the metadata
metadata = Metadata(**metadata)


assays[assay_name] = AssayNonIP(
name=assay_name,
r1=FastqFile(path=row["r1"]),
r2=FastqFile(path=row["r2"]) if row["r2"] else None,
metadata=metadata,
metadata=metadata.model_dump(exclude_none=True),
)
else:
raise NotImplementedError("Not implemented")
@@ -424,7 +440,13 @@
for experiment in self.assays.values():

name_ip = experiment.name
name_control = f"{experiment.control_files.r1.sample_base_without_ip}_{experiment.control_files.r1.ip}"

try:
control_base = experiment.control_files.r1.sample_base_without_ip
control_ip = experiment.control_files.r1.ip
name_control = f"{control_base}_{control_ip}"
except AttributeError:
name_control = None


if name_to_query == name_ip or name_to_query == name_control:
if control is not None:
@@ -559,6 +581,9 @@
"control",
]:
metadata[k] = v

# Validate the metadata
metadata = Metadata(**metadata)


# Add the experiment
ip = row["ip"]
@@ -577,7 +602,7 @@
),
ip=ip,
control=None,
metadata=metadata,
metadata=metadata.model_dump(exclude_none=True),
)
else:
experiments[experiment_name] = ExperimentIP(
@@ -601,7 +626,7 @@
),
ip=ip,
control=control,
metadata=metadata,
metadata=metadata.model_dump(exclude_none=True),
)
else:
raise NotImplementedError("Not implemented")
@@ -743,7 +768,7 @@
Literal["deeptools", "homer"], List[Literal["deeptools", "homer"]]
] = None
make_bigwigs: bool = False
scale_method: Optional[Literal["cpm", "rpkm", "spikein", "csaw", "grouped"]] = None
scale_method: Optional[Literal["cpm", "rpkm", "spikein", "csaw", "merged"]] = None

prefix: Optional[str] = "seqnado_output/bigwigs/"

def model_post_init(self, __context: Any) -> None:
@@ -798,11 +823,12 @@
List[Literal["macs", "homer", "lanceotron", "seacr"]],
] = None
call_peaks: bool = False
prefix: Optional[str] = "seqnado_output/peaks/"


@property
def peak_files(self) -> List[str]:
return expand(
"seqnado_output/peaks/{method}/{sample}.bed",
self.prefix + "{method}/{sample}.bed",
sample=self.names,
method=self.peak_calling_method,
)
@@ -883,7 +909,7 @@
sample_names: List[str]

make_bigwigs: bool = False
pileup_method: Optional[

Union[Literal["deeptools", "homer"], List[Literal["deeptools", "homer"]]]
] = None
scale_method: Optional[Literal["cpm", "rpkm", "spikein", "csaw"]] = None
@@ -922,8 +948,8 @@
assay=self.assay,
names=self.design_dataframe["merge"].unique().tolist(),
make_bigwigs=self.make_bigwigs,
pileup_method=self.pileup_method,
scale_method="rpkm",
pileup_method="deeptools",
scale_method="merged",
)

files = bwf_samples.files + bwf_merged.files
@@ -1009,7 +1035,7 @@
class NonRNAOutput(Output):
assay: Union[Literal["ChIP"], Literal["ATAC"]]
call_peaks: bool = False
peak_calling_method: Optional[Union[

Literal["macs", "homer", "lanceotron", False],
List[Literal["macs", "homer", "lanceotron"]],
]] = None
@@ -1024,7 +1050,8 @@
assay=self.assay,
names=self.design_dataframe["merge"].unique().tolist(),
call_peaks=self.call_peaks,
peak_calling_method=self.peak_calling_method,
peak_calling_method="lanceotron",
prefix="seqnado_output/peaks/merged/",
)

@computed_field
@@ -1080,7 +1107,7 @@
ip_names: List[str]
control_names: List[str]
call_peaks: bool = False
peak_calling_method: Optional[Union[

Literal["macs", "homer", "lanceotron", "seacr", False],
List[Literal["macs", "homer", "lanceotron", "seacr"]],
]] = None
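
The `prevent_none` validator added to `Metadata` in `seqnado/design.py` compares the incoming value against a list of placeholders that includes `np.nan`. One point worth checking in review is that NaN never compares equal to anything, itself included, so an equality test cannot detect it. The sketch below is a plain-Python illustration of one way to reject such placeholders. `reject_placeholder`, `is_rejected` and `PLACEHOLDERS` are hypothetical names; the code does not use pydantic and is not the validator from this PR.

```python
import math

# Strings treated as "missing" here; the values are copied from the diff.
PLACEHOLDERS = {"None", "none", "null", "Null", "NULL", ".", "", "NA"}


def reject_placeholder(value):
    """Return `value` unchanged, or raise ValueError if it looks missing."""
    if value is None:
        raise ValueError("None is not allowed when setting metadata")
    # NaN is never equal to itself, so it needs an explicit check.
    if isinstance(value, float) and math.isnan(value):
        raise ValueError("NaN is not allowed when setting metadata")
    if isinstance(value, str) and value in PLACEHOLDERS:
        raise ValueError(f"{value!r} is not allowed when setting metadata")
    return value


def is_rejected(value) -> bool:
    """Report whether reject_placeholder refuses the value."""
    try:
        reject_placeholder(value)
    except ValueError:
        return True
    return False


for candidate in ["control", "NA", "", None, float("nan")]:
    print(repr(candidate), is_rejected(candidate))
```

Raising `ValueError` keeps the sketch independent of any validation framework; inside a pydantic validator the same checks could be applied before returning the value.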
14 changes: 10 additions & 4 deletions seqnado/helpers.py
@@ -3,6 +3,9 @@
import numpy as np
import shlex

from loguru import logger



from seqnado.design import Design, DesignIP


@@ -19,7 +22,6 @@
"""
Extract the number of cores from the snakemake options.
"""
from loguru import logger

try:
cores_flag = options.index("-c")
@@ -62,12 +64,16 @@
"""
Create a symlink in the output directory with the new file name.
"""

new_path = output_dir / new_file_name
if not new_path.exists() and source_path.is_file():
try:
logger.debug(f"Symlinking {source_path} to {output_dir / new_file_name}")
if str(source_path) in [".", "..", "", None, "None"]:
logger.warning(f"Source path is empty for {new_file_name}. Will not symlink.")


else:
new_path.symlink_to(source_path.resolve())
except FileExistsError:
print(f"Symlink for {new_path} already exists.")
logger.debug(f"Symlinked {source_path} to {output_dir / new_file_name} successfully.")



def symlink_fastq_files(
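
The guard added to the symlink helper in `seqnado/helpers.py` skips sources that are empty or placeholder paths, which matters because `str(pathlib.Path(""))` is `"."`. A minimal sketch of that idea, assuming a POSIX-like filesystem; `safe_symlink` and its boolean return value are hypothetical and differ from the function in the diff, which logs instead.

```python
import pathlib
import tempfile
from typing import Optional


def safe_symlink(source: Optional[pathlib.Path], target: pathlib.Path) -> bool:
    """Create `target` as a symlink to `source`; return True if one was made."""
    # Skip missing or placeholder sources instead of linking to a directory.
    if source is None or str(source) in {".", "..", "", "None"}:
        return False
    # Only link real files, and never overwrite an existing entry.
    if not source.is_file() or target.exists() or target.is_symlink():
        return False
    target.symlink_to(source.resolve())
    return True


with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)
    fastq = root / "chip_1.fastq.gz"
    fastq.touch()
    out = root / "out"
    out.mkdir()

    linked = safe_symlink(fastq, out / "chip_1.fastq.gz")
    skipped = safe_symlink(pathlib.Path(""), out / "control_1.fastq.gz")
    second_try = safe_symlink(fastq, out / "chip_1.fastq.gz")
    names = sorted(p.name for p in out.iterdir())

print(linked, skipped, second_try, names)
```

Checking for an existing target up front avoids relying on `FileExistsError` for the common re-run case.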
13 changes: 9 additions & 4 deletions seqnado/workflow/rules/alignment_post_processing.smk
@@ -203,14 +203,19 @@ rule move_bam_to_final_location:
def get_bam_files_for_merge(wildcards):
from seqnado.design import NormGroups
norm_groups = NormGroups.from_design(DESIGN, subset_column="merge")
return norm_groups.get_sample_group(wildcards.group)

sample_names = norm_groups.get_grouped_samples(wildcards.group)

return [
f"seqnado_output/aligned/{sample}.bam" for sample in sample_names
]


rule merge_bams:
input:
bams=get_bam_files_for_merge,
output:
temp("seqnado_output/aligned/grouped/{group}.bam"),
temp("seqnado_output/aligned/merged/{group}.bam"),
threads: 8
log:
"seqnado_output/logs/merge_bam/{group}.log",
@@ -222,9 +227,9 @@ rule merge_bams:

use rule index_bam as index_consensus_bam with:
input:
bam="seqnado_output/aligned/grouped/{group}.bam",
bam="seqnado_output/aligned/merged/{group}.bam",
output:
bai="seqnado_output/aligned/grouped/{group}.bam.bai",
bai="seqnado_output/aligned/merged/{group}.bam.bai",
threads: 8


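
The revised `get_bam_files_for_merge` input function turns a merge group into the list of per-sample BAM files that feed one merged BAM. The sketch below shows that mapping without Snakemake. `MERGE_GROUPS` is a hypothetical stand-in for what `NormGroups` derives from the "merge" column, and both helper names are illustrative.

```python
from typing import Dict, List

# Hypothetical grouping, standing in for the groups derived from the
# "merge" column of the design.
MERGE_GROUPS: Dict[str, List[str]] = {
    "control": ["rna1", "rna2", "rna3"],
    "treated": ["rna4", "rna5", "rna6"],
}


def bam_files_for_group(group: str) -> List[str]:
    """Return the per-sample BAM paths that feed one merged BAM."""
    return [f"seqnado_output/aligned/{sample}.bam" for sample in MERGE_GROUPS[group]]


def merged_bam(group: str) -> str:
    """Path of the merged BAM, following the output pattern of merge_bams."""
    return f"seqnado_output/aligned/merged/{group}.bam"


for group in MERGE_GROUPS:
    print(merged_bam(group), "<-", bam_files_for_group(group))
```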
10 changes: 8 additions & 2 deletions seqnado/workflow/rules/peak_call_grouped.smk
@@ -1,10 +1,16 @@
from seqnado.helpers import check_options

rule lanceotron_no_input_consensus:
input:
bigwig="seqnado_output/bigwigs/deeptools/grouped/{group}.bigWig",
bigwig="seqnado_output/bigwigs/deeptools/merged/{group}.bigWig",
output:
peaks="seqnado_output/peaks/lanceotron/grouped/{group}.bed",
peaks="seqnado_output/peaks/merged/lanceotron/{group}.bed",
threads: 8
params:
outdir="seqnado_output/peaks/merged/lanceotron",
options=check_options(config["lanceotron"]["callpeak"])
container:
"library://asmith151/seqnado/seqnado_extra:latest"
log:
"seqnado_output/logs/lanceotron/{group}.log",
shell:
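
The new output path of `lanceotron_no_input_consensus` has to agree with the paths that `PeakCallingFiles.peak_files` builds from `prefix + "{method}/{sample}.bed"` when the prefix is `seqnado_output/peaks/merged/`. A small consistency check can be sketched as follows. `merged_peak_files` is a hypothetical plain-Python stand-in for the Snakemake `expand` call in the diff, and the group names are made up.

```python
from typing import List


def merged_peak_files(
    names: List[str],
    method: str = "lanceotron",
    prefix: str = "seqnado_output/peaks/merged/",
) -> List[str]:
    """Build peak file paths as prefix + method/sample.bed."""
    return [f"{prefix}{method}/{name}.bed" for name in names]


# Output pattern copied from the rule in the diff above.
RULE_OUTPUT_PATTERN = "seqnado_output/peaks/merged/lanceotron/{group}.bed"

groups = ["control", "treated"]  # hypothetical merge groups
requested = merged_peak_files(groups)
producible = [RULE_OUTPUT_PATTERN.format(group=g) for g in groups]

print(requested == producible)
```

If the two lists ever differ, the workflow would request files that no rule can produce.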
8 changes: 4 additions & 4 deletions seqnado/workflow/rules/pileup_grouped.smk
@@ -1,10 +1,10 @@

use rule deeptools_make_bigwigs as deeptools_make_bigwigs_consensus with:
input:
bam="seqnado_output/aligned/grouped/{group}.bam",
bai="seqnado_output/aligned/grouped/{group}.bam.bai",
bam="seqnado_output/aligned/merged/{sample}.bam",
bai="seqnado_output/aligned/merged/{sample}.bam.bai",
output:
bigwig="seqnado_output/bigwigs/deeptools/grouped/{group}.bigWig",
bigwig="seqnado_output/bigwigs/deeptools/merged/{sample}.bigWig",
threads: 8
log:
"seqnado_output/logs/bigwigs/{group}.log",
"seqnado_output/logs/bigwigs/{sample}.log",
9 changes: 8 additions & 1 deletion tests/test_pipelines.py
@@ -348,7 +348,14 @@ def design(seqnado_run_dir, assay_type, assay):
completed = subprocess.run(" ".join(cmd), shell=True, cwd=seqnado_run_dir)
assert completed.returncode == 0

if assay == "rna-rx":
if assay == "chip":
# Add merge column to design file
import pandas as pd
df = pd.read_csv(seqnado_run_dir / "design.csv", index_col=0)
df["merge"] = df.index.str.split("-").str[-1]
df.to_csv(seqnado_run_dir / "design.csv")

elif assay == "rna-rx":
# Add deseq2 column to design file
import pandas as pd
