optimize rxclass code #333

Open
wants to merge 1 commit into
base: jrlegrand/rxclass-rework

saywurdson
Collaborator

"Resolves" #331

Explanation

Optimized the current dag_tasks file as follows:

  1. replaced the repeated rxclass_df.append() calls, each of which created a new dataframe, with collecting all the created rows in a list and building the final dataframe from that list once
  2. used the pandas deduplication method to remove the duplicates that @jrlegrand mentioned in issue 331
  3. tried to speed up the API requests by making them asynchronous while keeping the rate at 20 calls per second
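
For readers who have not seen the diff, the three changes can be sketched roughly as below. This is a minimal illustration, not the code in this PR: `RateLimiter`, `fetch_classes`, `collect_rows`, `build_rxclass_df` and the sample columns are hypothetical names, the API call is simulated with a short sleep instead of a real HTTP request, and the asyncio-based limiter is just one possible way to hold the rate near 20 calls per second. Building the dataframe once from a list also avoids DataFrame.append(), which was removed in pandas 2.0.

```python
import asyncio
import time

import pandas as pd

MAX_CALLS_PER_SECOND = 20  # rate stated in the PR description


class RateLimiter:
    """Space out call starts so roughly `rate` calls begin per second."""

    def __init__(self, rate: int) -> None:
        self._interval = 1.0 / rate
        self._lock = asyncio.Lock()
        self._next_start = 0.0

    async def wait(self) -> None:
        # The lock serializes callers so each one gets its own start slot.
        async with self._lock:
            now = time.monotonic()
            delay = self._next_start - now
            if delay > 0:
                await asyncio.sleep(delay)
            self._next_start = max(now, self._next_start) + self._interval


async def fetch_classes(rxcui: str, limiter: RateLimiter) -> list:
    """Hypothetical stand-in for one API request (no network access)."""
    await limiter.wait()
    await asyncio.sleep(0.01)  # simulated network latency
    row = {"rxcui": rxcui, "class_id": "C" + rxcui, "rela_source": "ATC"}
    # Return the row twice to mimic the duplicates mentioned in issue 331.
    return [row, dict(row)]


async def collect_rows(rxcuis: list) -> list:
    # Create the limiter inside the running event loop.
    limiter = RateLimiter(MAX_CALLS_PER_SECOND)
    results = await asyncio.gather(
        *(fetch_classes(rxcui, limiter) for rxcui in rxcuis)
    )
    rows = []
    for result in results:
        rows.extend(result)  # plain list appends, no dataframe copies
    return rows


def build_rxclass_df(rxcuis: list) -> pd.DataFrame:
    rows = asyncio.run(collect_rows(rxcuis))
    df = pd.DataFrame(rows)  # the dataframe is built once, from the list
    return df.drop_duplicates().reset_index(drop=True)
```

Calling `build_rxclass_df(["1", "2", "3"])` in this sketch returns one row per input, since the simulated duplicates are dropped. With a real HTTP client the limiter would stay the same and only the body of `fetch_classes` would change.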

Rationale

@jrlegrand was able to fix the code so that it runs, but mentioned that it ran slowly. I was able to confirm this, and the adjustments made reduced the time to create sagerx_lake.rxclass by more than half.

Screenshot 2024-11-24 at 8 39 46 PM

Tests

  1. What testing did you do?
  • Did a quick QA by counting the number of rows in the table before and after the change, and by counting the rows grouped by rela_source, then verified that the counts matched @jrlegrand's counts
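
A check along these lines could be scripted as shown below. This is only a sketch under assumptions: it supposes both versions of the table have been loaded into pandas dataframes, and `summarize` and `counts_match` are hypothetical helper names, not part of this PR. The same comparison could equally be run as SQL against the table.

```python
import pandas as pd


def summarize(df: pd.DataFrame) -> tuple:
    """Return the total row count and the row count per rela_source."""
    total = len(df)
    by_source = df["rela_source"].value_counts().to_dict()
    return total, by_source


def counts_match(before: pd.DataFrame, after: pd.DataFrame) -> bool:
    """True when both the totals and the per-rela_source counts agree."""
    return summarize(before) == summarize(after)
```

Note that matching counts only show that no rows were lost or added per source; they do not prove the row contents are identical.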
