Generate Annotated VCF/CSVs for download #1171

Closed
davmlaw opened this issue Sep 12, 2024 · 4 comments

Contributor

davmlaw commented Sep 12, 2024

Downloading a 1-2M record multi-sample VCF with annotations can take so long that the request times out.

I used a few tricks in SACGF/variantgrid_com#86, but we should probably generate the file server side and then stream it as a static file.

Suggest:

  • On the VCF / Sample page, if there is no CachedGeneratedFile for the URL, display a button
  • Clicking the button creates a CachedGeneratedFile + task
  • The task creates the CSV/VCF server side behind a UUID
  • The client side shows a message about generation, with a spinner going
  • Once done, the task updates the CachedGeneratedFile
  • Drop down a message that says “Creating VCF, download will automatically start if you ” -
  • Cached generated file check
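
A minimal in-memory sketch of the flow suggested above, not the actual VariantGrid code. `CachedFileRecord`, `FileCache`, and the `generate` callable are hypothetical names made up for illustration; the real version would presumably be a Django model plus a Celery task.

```python
import uuid
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class CachedFileRecord:
    """Hypothetical stand-in for a CachedGeneratedFile row (not the real model)."""
    url: str
    id: uuid.UUID = field(default_factory=uuid.uuid4)  # random, unguessable key
    status: str = "pending"         # pending -> done (or failed)
    filename: Optional[str] = None  # set once the file exists on disk


class FileCache:
    """In-memory sketch; a real version would use a DB table and a task queue."""

    def __init__(self, generate: Callable[[str], str]) -> None:
        self._generate = generate  # builds the file for a URL, returns its path
        self._records: Dict[str, CachedFileRecord] = {}

    def get(self, url: str) -> Optional[CachedFileRecord]:
        """Page load: None means 'show the generate button'."""
        return self._records.get(url)

    def request(self, url: str) -> CachedFileRecord:
        """Button click: reuse the existing record or create a new one."""
        record = self._records.get(url)
        if record is None:
            record = CachedFileRecord(url=url)
            self._records[url] = record
        return record

    def run(self, record: CachedFileRecord) -> None:
        """Background task body: generate the file, then update the record."""
        try:
            record.filename = self._generate(record.url)
            record.status = "done"
        except Exception:
            record.status = "failed"
```

The client would poll the record's status and start the download once it reads "done". Keying the cache on the export URL means a second click reuses the existing record instead of starting another task.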

Current behavior:

view_vcf - cohort_grid_export (csv/vcf)
view_sample - sample_grid_export

We can probably keep those URLs.

@davmlaw davmlaw self-assigned this Sep 12, 2024
davmlaw added a commit that referenced this issue Sep 13, 2024
Contributor Author

davmlaw commented Sep 13, 2024

Working in branch feature/issue_1171_generate_file_for_download

Mostly done, just need to handle browser bits I think.

We could possibly raise an issue about having analyses handle this as well.

Maybe also have some kind of regular purge of old files?
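
One way a periodic purge could look, as a rough sketch only. `purge_old_files` and its parameters are hypothetical, and it assumes the generated files sit in a single directory and that modification time is a good enough proxy for age; a real version would also need to delete the matching CachedGeneratedFile rows.

```python
import time
from pathlib import Path
from typing import List, Optional


def purge_old_files(directory: str, max_age_days: float,
                    now: Optional[float] = None) -> List[Path]:
    """Delete regular files older than max_age_days; return what was removed."""
    if now is None:
        now = time.time()
    cutoff = now - max_age_days * 86400  # seconds per day
    removed = []
    for path in sorted(Path(directory).iterdir()):
        # Skip sub-directories; compare last-modified time against the cutoff.
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path)
    return removed
```

This could be run from a scheduled task (for example once a day), with `max_age_days` tuned to how much disk the exports take up.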

davmlaw added a commit that referenced this issue Sep 17, 2024
davmlaw added a commit that referenced this issue Sep 19, 2024
davmlaw added a commit that referenced this issue Sep 19, 2024
Contributor Author

davmlaw commented Sep 19, 2024

Took a while as I had to do a few things:

  • Moved CachedGeneratedFile to use a UUID so that people couldn't grab e.g. other people's VCFs by brute forcing integer PKs
  • Needed to zip/gz the files as they were very large
  • Added a progress count as files took minutes to generate
  • Files download automatically if you stay on the page after clicking. You can come back to the page later and see progress etc, though it won't auto download in that case (it will show a static link when done)
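
The gzip and progress-count points above could be sketched roughly like this. It is an illustration using only the standard library, not the code from the branch; `unguessable_name`, `write_gzipped_csv`, and the `on_progress` callback are hypothetical names.

```python
import csv
import gzip
import uuid
from typing import Callable, Iterable, Sequence


def unguessable_name(suffix: str) -> str:
    """Random UUID4 file name, so URLs can't be enumerated like integer PKs."""
    return f"{uuid.uuid4()}{suffix}"


def write_gzipped_csv(path: str, header: Sequence[str],
                      rows: Iterable[Sequence], total: int,
                      on_progress: Callable[[float], None],
                      every: int = 1000) -> int:
    """Stream rows into a gzip-compressed CSV, reporting fraction done."""
    written = 0
    with gzip.open(path, "wt", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(header)
        for row in rows:
            writer.writerow(row)
            written += 1
            # Report every `every` rows rather than per row to keep it cheap.
            if total and written % every == 0:
                on_progress(written / total)
    on_progress(1.0)  # final update once the file is closed
    return written
```

In a real task `on_progress` would save the fraction onto the CachedGeneratedFile record so the page can show it when polled.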

Will probably have to do a few extra things to clean up CachedGeneratedFiles over time, as these are still quite large.

davmlaw added a commit that referenced this issue Sep 19, 2024
TheMadBug pushed a commit that referenced this issue Sep 19, 2024
TheMadBug pushed a commit that referenced this issue Sep 19, 2024
TheMadBug pushed a commit that referenced this issue Sep 19, 2024
TheMadBug pushed a commit that referenced this issue Sep 19, 2024
Contributor Author

davmlaw commented Sep 20, 2024

Hmmm, this has taken 2 hours to export 16%, so 8% per hour = 12.5 hours in total... I think Celery is going to time out here...
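
The estimate above is a linear extrapolation from the progress count. A small sketch of the same arithmetic, assuming a constant export rate; the function names and the `limit_hours` parameter are hypothetical, not anything Celery provides.

```python
def estimate_total_hours(elapsed_hours: float, fraction_done: float) -> float:
    """Linear extrapolation: total time = elapsed / fraction complete."""
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_hours / fraction_done


def will_exceed_limit(elapsed_hours: float, fraction_done: float,
                      limit_hours: float) -> bool:
    """True if the projected total run time is longer than the task limit."""
    return estimate_total_hours(elapsed_hours, fraction_done) > limit_hours


# Figures from the comment above: 2 hours elapsed, 16% exported.
total = estimate_total_hours(2, 0.16)  # about 12.5 hours
remaining = total - 2                  # about 10.5 hours still to go
```

Checking this projection early in a run would let the task fail fast, or warn the user, instead of running for hours before hitting a time limit.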

Contributor Author

davmlaw commented Sep 23, 2024

Spun the polish / scope creep off into #1173 - will do that later when less busy

This is VG only, and has been tested and deployed

@davmlaw davmlaw closed this as completed Sep 23, 2024