Implement IGD #28

Open
nleroy917 opened this issue Jul 31, 2024 · 3 comments

Comments

@nleroy917
Member

We need to re-implement IGD in this crate. This is being done by @donaldcampbelljr in #9.

Original code here: https://github.com/databio/IGD

@nleroy917 nleroy917 added the new tool Request to implement a new tool label Jul 31, 2024
@nleroy917
Member Author

For the Python bindings, we could take an OOP approach:

from gtars.igd import Igd

igd = Igd.create_from_files(
    source_files="path/to/files",
    output_folder="path/to/output",
    database_name="mydb"
)

# way later
igd = Igd.load_db("path/to/database")
igd.search(...)

@donaldcampbelljr donaldcampbelljr self-assigned this Jul 31, 2024
@donaldcampbelljr
Member

IGD create and search now work in PR #9 with some caveats.

An IGD database can be created from a folder of BED files. A search can then be performed using a single BED file as the query.
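To make the expected search output easier to reason about, here is a minimal brute-force sketch of overlap counting between one database BED file and one query BED file. It is not the IGD algorithm and not part of gtars; the function names `parse_bed` and `count_overlaps` are hypothetical, and the half-open interval convention is assumed from the BED format. Something like this could serve as a slow reference when checking counts on small inputs.

```python
from collections import defaultdict


def parse_bed(lines):
    """Parse BED text lines into (chrom, start, end) tuples, skipping blanks and header lines."""
    regions = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        regions.append((chrom, int(start), int(end)))
    return regions


def count_overlaps(db_regions, query_regions):
    """Count (database region, query region) pairs that overlap.

    Intervals are treated as half-open [start, end), so regions that
    only touch at a boundary do not count as overlapping.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in db_regions:
        by_chrom[chrom].append((start, end))

    hits = 0
    for chrom, q_start, q_end in query_regions:
        for start, end in by_chrom.get(chrom, []):
            if start < q_end and q_start < end:
                hits += 1
    return hits


# Toy input, made up for illustration only.
db = parse_bed(["chr1\t100\t200", "chr1\t150\t300", "chr2\t10\t20"])
query = parse_bed(["chr1\t180\t190", "chr2\t20\t30"])
print(count_overlaps(db, query))  # 2
```

The nested loop is quadratic per chromosome, so this is only practical for small test files; the point of an indexed structure like IGD is to avoid exactly that scan.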

Performance-wise, database creation appears to be similar for the C and Rust versions: about 2.1 seconds for 80 files (~280,000 regions).

There are some discrepancies between the C and Rust versions, plus one open item, that should be investigated in the future:

  • During searching, the hit counts may be off for one or more of the BED files, although most match. This may be due to how the .igd data is pulled back into memory for the query: the gData tiles are not 1 to 1 (see attached picture). The creation step appears to be exactly the same based on the reported numbers (e.g. # of Ctgs, Regions, Tiles, etc.).
  • Certain BED files may cause the C version to crash but pass with the Rust version, and vice versa. Anecdotally, the Rust version appears to be more robust.
  • Performance still needs to be tested on larger sets of BED files.
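One way to track down the count discrepancies is to collect the per-file hit counts from each version and list only the files that disagree. The sketch below assumes the counts have already been parsed into plain dictionaries keyed by BED file name. The helper `diff_counts` and the sample values are hypothetical and do not reflect the actual output format of either tool.

```python
def diff_counts(c_counts, rust_counts):
    """Return {file: (c_count, rust_count)} for files whose counts differ.

    A file missing from one side is reported with None for that side.
    """
    mismatches = {}
    for name in sorted(set(c_counts) | set(rust_counts)):
        c_val = c_counts.get(name)
        r_val = rust_counts.get(name)
        if c_val != r_val:
            mismatches[name] = (c_val, r_val)
    return mismatches


# Placeholder numbers, made up for illustration only.
c_counts = {"a.bed": 12, "b.bed": 7, "c.bed": 3}
rust_counts = {"a.bed": 12, "b.bed": 9, "d.bed": 1}

for name, (c_val, r_val) in diff_counts(c_counts, rust_counts).items():
    print(f"{name}: C={c_val} Rust={r_val}")
```

Running this on real outputs would narrow the investigation to the specific BED files where the two implementations diverge.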

image

@donaldcampbelljr donaldcampbelljr mentioned this issue Aug 21, 2024
@donaldcampbelljr
Member

I just merged the PR that has been in progress since the beginning of the year. However, IGD still needs some work:

  • Reach feature parity with the C version.
  • Understand any discrepancies between this version and the C version.
  • Implement the Python bindings, which are not done yet.
