[FEATURE]: Checksums for the reference files #114

Open
twillis209 opened this issue Nov 8, 2024 · 2 comments
Labels
enhancement New feature or request user_request features that user suggested

Comments

@twillis209

twillis209 commented Nov 8, 2024

Is your feature request related to a problem? Please describe.

I'm a new user and have downloaded the large volume of vcf-type files required to act as the reference.

Describe the solution you'd like

Given the size of these files, it would be helpful to have a file on the FTP server listing their checksums, so that downloads can be verified quickly as part of a pipeline (Snakemake in my case).
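
For illustration, a minimal sketch of how a pipeline step could check downloads against such a checksum file. It assumes a manifest in the `sha256sum` text format (one `<hex digest>  <filename>` pair per line); no such manifest is published at the time of this request, so the file name `sha256sums.txt` and the function names below are hypothetical.

```python
import hashlib
from pathlib import Path


def file_sha256(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_manifest(manifest_path):
    """Parse sha256sum-style lines into a {filename: digest} mapping.

    Assumed format (hypothetical manifest): "<hex digest>  <filename>",
    optionally with a leading "*" on the filename (binary-mode marker).
    """
    expected = {}
    for line in Path(manifest_path).read_text().splitlines():
        if not line.strip():
            continue  # skip blank lines
        digest, name = line.split(maxsplit=1)
        expected[name.lstrip("*").strip()] = digest.lower()
    return expected


def verify_downloads(manifest_path, download_dir):
    """Return the names of files that are missing or whose digest differs."""
    failures = []
    for name, digest in parse_manifest(manifest_path).items():
        target = Path(download_dir) / name
        if not target.is_file() or file_sha256(target) != digest:
            failures.append(name)
    return failures
```

In a Snakemake workflow, a function like `verify_downloads` could be called from a rule's `run:` block, raising an error when the returned list is non-empty so that downstream rules never see a corrupted reference file.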

Benefits

Gets ahead of downstream issues relating to file integrity.

Potential Risks

No response

Additional context

No response

@twillis209 twillis209 added enhancement New feature or request user_request features that user suggested labels Nov 8, 2024
@jiyue1214
Collaborator

Hi twillis209,

Thank you for using the pipeline and for sharing your valuable suggestions. Adding a checksum for all reference files is indeed a great idea, and we've added it to our to-do list.

May I ask if you are currently using Snakemake to run the pipeline? If so, I would recommend trying Nextflow. It’s relatively straightforward to use, and supports a wider range of executors for flexible deployment. We also have online documentation for the Nextflow harmonisation pipeline. Please be aware that we are no longer actively maintaining the Snakemake version of the pipeline.

Thank you again for your feedback!

Best regards,
Yue

@twillis209
Author

twillis209 commented Nov 12, 2024

Hi Yue

Thanks for your reply. I am indeed using Snakemake, but with v1.1.10 of the harmoniser, which I believe is the latest version. Foreign workflow managers (in this case Nextflow) can be integrated into Snakemake using the handover directive:

import pathlib

rule harmonise_gwas:
    input:
        config = "results/gwas/gwas_ssf/{download_name}.tsv-meta.yaml",
        sumstats = "results/gwas/gwas_ssf/{download_name}.tsv"
    output:
        # Bad but there are so many files to list here!
        "results/gwas/gwas_ssf/{download_name}/final/{download_name}.h.tsv.gz",
    params:
        launch_dir = pathlib.Path("results/gwas/gwas_ssf"),
        sumstats = lambda w: pathlib.Path(f"results/gwas/gwas_ssf/{w.download_name}.tsv").resolve(),
        ref = pathlib.Path("resources/ebispot_harmoniser/reference").resolve(),
        nf_config = pathlib.Path("config/harmoniser.config").resolve(),
        version = config['ebispot_harmoniser']['version'],
        profiles = 'local,singularity'
    threads: 12
    resources:
        runtime = 90
    # Mark this rule as delegating to a foreign workflow manager (Nextflow)
    handover: True
    shell:
        """
        cd {params.launch_dir}

        nextflow \
        -c {params.nf_config} \
        run EBISPOT/gwas-sumstats-harmoniser \
        -r {params.version} \
        --ref {params.ref} \
        --harm \
        --file {params.sumstats} \
        -profile {params.profiles}
        """
