Duckdb #1

Merged
merged 59 commits into from
Aug 8, 2024

bsantan
Collaborator

@bsantan bsantan commented Jul 31, 2024

  • DuckDB merge implementation
  • Autogeneration of merge.yaml based on input params provided
  • Supports merging with either KGX or DuckDB
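To make the merge.yaml autogeneration point concrete, here is a rough sketch of the idea, not the code in this PR. The function name `build_merge_yaml`, its parameters, and the key layout (loosely modeled on a KGX-style merge config) are all assumptions for illustration. It uses plain string building only to stay dependency-free; a real implementation would more likely go through a YAML library.

```python
def build_merge_yaml(sources, output_dir="data/merged", graph_name="merged-kg"):
    """Render a merge.yaml-style document from input parameters.

    Hypothetical helper: `sources` maps a source name to the list of
    node/edge TSV paths that belong to it. The key names are assumed,
    not taken from the PR.
    """
    lines = [
        "configuration:",
        f"  output_directory: {output_dir}",
        "merged_graph:",
        f"  name: {graph_name}",
        "  source:",
    ]
    # One block per source, sorted so the output is deterministic.
    for name, files in sorted(sources.items()):
        lines += [
            f"    {name}:",
            "      input:",
            "        format: tsv",
            "        filename:",
        ]
        lines += [f"          - {path}" for path in files]
    return "\n".join(lines) + "\n"


# Example input parameters (made-up paths).
example_yaml = build_merge_yaml(
    {"ontologies": ["chebi_nodes.tsv", "chebi_edges.tsv"]},
    output_dir="out",
)
print(example_yaml)
```

Writing the returned string to merge.yaml would then be a single `Path("merge.yaml").write_text(...)` call; this sketch does no quoting, so it only suits simple names and paths.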

@hrshdhgd
Contributor

hrshdhgd commented Aug 3, 2024

Alright, here is my first stab at the DuckDB code.

I haven't tried it on the big dataset yet, so I have no idea about its efficiency. Some points to note:

  • I haven't deleted any of @bsantan's code; some of it is commented out.

  • I have added unit test files with some test input and expected output. We can tweak them to check that the code handles all scenarios (or at least most of them).

  • The new functions are:

    • load_into_duckdb
    • duckdb_nodes_merge
    • duckdb_edges_merge
      All of these are in duckdb_utils.py. In merge_kg.py I have replaced the duckdb_merge function and commented out the original code.
  • Added a test file, test_duckdb_utils.py, that tests the code in duckdb_utils.py above. We need to add more tests to cover every aspect of this project; it is good practice and avoids surprises.

  • There is one TODO:

    # TODO: Get priority sources dynamically from the ontologies transform
    priority_sources = ["go.json", "chebi.json", "ncbitaxon_removed_subset.json"]

This can be worked on next week. There are still a lot of pieces left to fall into place.
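For anyone reading along, here is a minimal sketch of how a priority list like the one in the TODO could drive a node merge. It is not the actual duckdb_nodes_merge from this PR. The helper name `merge_nodes_sketch`, the three-column node table (id, name, provided_by), and the "priority source wins" rule are assumptions for illustration. It imports duckdb when available and falls back to the standard-library sqlite3 purely so the snippet runs without extra dependencies.

```python
# Use DuckDB when it is installed; otherwise fall back to sqlite3 so the
# sketch still runs. The SQL below works in both (SQLite 3.25+).
try:
    import duckdb

    def connect():
        return duckdb.connect(":memory:")
except ImportError:
    import sqlite3

    def connect():
        return sqlite3.connect(":memory:")


def merge_nodes_sketch(rows, priority_sources):
    """Keep one row per node id, preferring rows from priority sources.

    Hypothetical helper, not the PR's duckdb_nodes_merge. `rows` is a
    non-empty list of (id, name, provided_by) tuples.
    """
    con = connect()
    con.execute("CREATE TABLE nodes (id VARCHAR, name VARCHAR, provided_by VARCHAR)")
    con.executemany("INSERT INTO nodes VALUES (?, ?, ?)", rows)

    if priority_sources:
        placeholders = ", ".join("?" for _ in priority_sources)
        is_priority = f"provided_by IN ({placeholders})"
    else:
        is_priority = "1 = 0"  # no priority sources: only the tie-break applies

    # Rank the rows of each id: priority sources first, then provided_by as a
    # deterministic tie-break. Only the top-ranked row per id is kept.
    query = f"""
        SELECT id, name, provided_by
        FROM (
            SELECT id, name, provided_by,
                   ROW_NUMBER() OVER (
                       PARTITION BY id
                       ORDER BY CASE WHEN {is_priority} THEN 0 ELSE 1 END,
                                provided_by
                   ) AS rn
            FROM nodes
        ) AS ranked
        WHERE rn = 1
        ORDER BY id
    """
    merged = con.execute(query, list(priority_sources)).fetchall()
    con.close()
    return merged


# Made-up sample data: two ids appear in both a priority and a non-priority source.
sample_rows = [
    ("GO:1", "from go", "go.json"),
    ("GO:1", "from other", "zzz_transform"),
    ("X:2", "only", "some_transform"),
    ("CHEBI:3", "other", "aaa_transform"),
    ("CHEBI:3", "chebi", "chebi.json"),
]
priority_sources = ["go.json", "chebi.json", "ncbitaxon_removed_subset.json"]
merged_nodes = merge_nodes_sketch(sample_rows, priority_sources)
for row in merged_nodes:
    print(row)
```

Passing the list as a parameter rather than hard-coding it inside the function means that once the priority sources are derived from the ontologies transform, only the caller would need to change.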

@hrshdhgd hrshdhgd merged commit c899f0e into main Aug 8, 2024
3 checks passed
@hrshdhgd hrshdhgd deleted the duckdb branch August 8, 2024 02:20
4 participants