Is your feature request related to a problem? Please describe.
I'm a new user and have downloaded the large volume of vcf-type files required to act as the reference.
Describe the solution you'd like
Given their size, it would be nice to have a file on the FTP server containing their checksums, so that the downloads can be verified quickly as part of a pipeline (Snakemake in my case).
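To illustrate the kind of check this would enable, here is a minimal sketch in Python. It assumes a hypothetical manifest in `sha256sum` style (one `<hex digest>  <file name>` pair per line). The manifest format, the hash algorithm, and the function names are all assumptions for illustration, not something the FTP server currently provides.

```python
import hashlib
from pathlib import Path


def file_sha256(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_manifest(manifest_path):
    """Parse sha256sum-style lines ("<hex digest>  <file name>") into a dict.

    The manifest layout is an assumption; adjust if the real file differs.
    """
    expected = {}
    for line in Path(manifest_path).read_text().splitlines():
        if not line.strip():
            continue  # skip blank lines
        checksum, name = line.split(maxsplit=1)
        # sha256sum marks binary-mode entries with a leading "*" on the name
        expected[name.strip().lstrip("*")] = checksum.lower()
    return expected


def verify_downloads(manifest_path, download_dir):
    """Return the names of files that are missing or whose digest does not match."""
    failures = []
    for name, checksum in parse_manifest(manifest_path).items():
        target = Path(download_dir) / name
        if not target.is_file() or file_sha256(target) != checksum:
            failures.append(name)
    return failures
```

In a Snakemake workflow, something like `verify_downloads` could be called from a `run:` block after the download rule, raising an error if the returned list is non-empty so that corrupted reference files never reach the harmonisation step.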
Benefits
Gets ahead of downstream issues relating to file integrity.
Potential Risks
No response
Additional context
No response
Thank you for using the pipeline and for sharing your valuable suggestions. Adding a checksum for all reference files is indeed a great idea, and we've added it to our to-do list.
May I ask if you are currently using Snakemake to run the pipeline? If so, I would recommend trying Nextflow. It’s relatively straightforward to use, and supports a wider range of executors for flexible deployment. We also have online documentation for the Nextflow harmonisation pipeline. Please be aware that we are no longer actively maintaining the Snakemake version of the pipeline.
Thanks for your reply. I am indeed using Snakemake, but I'm using v1.1.10 of the harmoniser, which I believe is the latest version. It's possible to integrate foreign workflow managers (in this case Nextflow) with Snakemake by using the handover directive:
import pathlib  # needed at the top of the Snakefile for the params below

rule harmonise_gwas:
    input:
        config = "results/gwas/gwas_ssf/{download_name}.tsv-meta.yaml",
        sumstats = "results/gwas/gwas_ssf/{download_name}.tsv"
    output:
        # Bad but there are so many files to list here!
        "results/gwas/gwas_ssf/{download_name}/final/{download_name}.h.tsv.gz",
    params:
        launch_dir = pathlib.Path("results/gwas/gwas_ssf"),
        sumstats = lambda w: pathlib.Path(f"results/gwas/gwas_ssf/{w.download_name}.tsv").resolve(),
        ref = pathlib.Path("resources/ebispot_harmoniser/reference").resolve(),
        nf_config = pathlib.Path("config/harmoniser.config").resolve(),
        version = config['ebispot_harmoniser']['version'],
        profiles = 'local,singularity'
    threads: 12
    resources:
        runtime = 90
    # Hand execution over to the foreign workflow manager (Nextflow)
    handover: True
    shell:
        """
        cd {params.launch_dir}
        nextflow \
            -c {params.nf_config} \
            run EBISPOT/gwas-sumstats-harmoniser \
            -r {params.version} \
            --ref {params.ref} \
            --harm \
            --file {params.sumstats} \
            -profile {params.profiles}
        """