feat: add progress bars for ETL #319

Closed
wants to merge 13 commits into from

jsstevenson
Member

@jsstevenson jsstevenson commented Jan 3, 2024

I don't think we have an open issue for adding tqdm here, but this PR does it.

In doing so, it refactors GFF processing to use gffpandas instead of gffutils (in order to get a total record count for tqdm). As a happy side effect, this significantly cuts down Ensembl and (I think) NCBI loading time (using Pandas, unfortunately; with a little effort we could probably spin up something better, maybe next Tech the Halls). It did entail a fair amount of internal refactoring for those sources.
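
To illustrate why a total matters: tqdm can only show a percentage and ETA when it knows how many items to expect, and a plain line iterator has no length. The following is a minimal sketch, not the code in this PR. It assumes a two-pass approach (count records, then iterate), and the helper names and sample data are made up for illustration. The `try`/`except` only keeps the snippet runnable where tqdm is not installed.

```python
import io

try:
    from tqdm import tqdm
except ImportError:
    # Stand-in so the sketch still runs without tqdm: no progress bar.
    def tqdm(iterable, **kwargs):
        return iterable

# Made-up sample data, not taken from any real GFF file.
SAMPLE_GFF = (
    "##gff-version 3\n"
    "chr1\texample\tgene\t100\t200\t.\t+\t.\tID=gene1;Name=GENE1\n"
    "chr1\texample\tgene\t300\t400\t.\t-\t.\tID=gene2;Name=GENE2\n"
)


def is_record(line):
    """True for data lines; False for blank lines and '#' comments/directives."""
    return bool(line.strip()) and not line.startswith("#")


def iter_records_with_progress(handle):
    """Yield record lines from a seekable handle, with a known total for tqdm."""
    # First pass: count records so the progress bar has a total.
    total = sum(1 for line in handle if is_record(line))
    handle.seek(0)
    # Second pass: stream the records through the progress bar.
    records = (line.rstrip("\n") for line in handle if is_record(line))
    yield from tqdm(records, total=total, desc="GFF records")


records = list(iter_records_with_progress(io.StringIO(SAMPLE_GFF)))
```

The extra counting pass costs one more read of the file; loading everything into a DataFrame (as gffpandas does) gets the count as a side effect instead.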

@jsstevenson jsstevenson marked this pull request as ready for review January 4, 2024 19:51
@jsstevenson jsstevenson added the priority:low Low priority label Jan 29, 2024
@jsstevenson jsstevenson marked this pull request as draft February 6, 2024 01:40
@jsstevenson
Member Author

I'm converting this to draft. My thinking on this issue has gone like this:

  1. We should drop gffutils for gffpandas
  2. We should just use a Polars-based GFF reader instead
  3. Wait, GFFs are just tab-separated files; we probably don't need a special reader if all we're doing is looping through them from the top

We should handle the GFF reader issue first and return to this afterwards.
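
For point 3, a rough sketch of what reading a GFF3 file as plain tab-separated text could look like, using only the standard library. This is an assumption about one possible approach, not an agreed design: `read_gff`, `parse_attributes`, and the sample data are hypothetical, and it does not handle a trailing `##FASTA` section or URL-encoded attribute values.

```python
import csv
import io

# The nine tab-separated columns defined by the GFF3 format.
GFF_COLUMNS = [
    "seqid", "source", "type", "start", "end",
    "score", "strand", "phase", "attributes",
]

# Made-up sample data, not taken from any real GFF file.
SAMPLE_GFF = (
    "##gff-version 3\n"
    "chr1\texample\tgene\t100\t200\t.\t+\t.\tID=gene1;Name=GENE1\n"
    "chr1\texample\tgene\t300\t400\t.\t-\t.\tID=gene2;Name=GENE2\n"
)


def parse_attributes(raw):
    """Split 'key=value;key=value' into a dict, skipping malformed items."""
    pairs = (item.split("=", 1) for item in raw.split(";") if "=" in item)
    return {key.strip(): value.strip() for key, value in pairs}


def read_gff(handle):
    """Yield one dict per record, looping through the file from the top."""
    # Drop blank lines and '#' comment/directive lines before parsing.
    data_lines = (
        line for line in handle if line.strip() and not line.startswith("#")
    )
    reader = csv.DictReader(
        data_lines,
        fieldnames=GFF_COLUMNS,
        delimiter="\t",
        quoting=csv.QUOTE_NONE,  # GFF fields are not quoted
    )
    for row in reader:
        row["start"] = int(row["start"])
        row["end"] = int(row["end"])
        row["attributes"] = parse_attributes(row["attributes"])
        yield row


genes = [row for row in read_gff(io.StringIO(SAMPLE_GFF)) if row["type"] == "gene"]
```

Because this streams line by line, memory use stays flat regardless of file size, at the cost of not knowing the record count up front.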

This PR is stale because it has been open 7 day(s) with no activity. Please review this PR.

@korikuzma
Member

@jsstevenson do you intend to come back to this at some point?

Labels
priority:low Low priority
2 participants