A flattened binary format for GFAs #150

Merged: 44 commits merged into main from polbin, Mar 11, 2024
@sampsyo (Collaborator) commented on Mar 11, 2024

This is something I've been meaning to sketch for a long time, and I finally hacked together a prototype. It is a Rust implementation of a flattened format for representing GFA data.

This MVP contains an in-memory format and some initial evidence that an on-disk binary file format is achievable without too much more work (by heavily relying on the zerocopy crate and a bunch of flat representation trickery). The prototype includes a byte-exact round-tripper for text GFA files (exploiting the rs-gfa crate for parsing and our own hand-rolled pretty-printer). The next steps are to finish off the reading and writing of binary files, and then to try implementing basic algorithms on top of this representation.
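As a rough illustration of what "flat representation trickery" can mean, here is a minimal sketch. It assumes a design where every variable-length item (here, a segment's nucleotide sequence) lives in one shared pool and is referenced by a pair of `u32` offsets instead of an owned `String`. The names `Span`, `Segment`, `FlatGraph`, `to_bytes`, and `from_bytes` are hypothetical and are not the PR's actual types. The sketch uses only the standard library rather than zerocopy, and assumes numeric segment names for brevity.

```rust
/// Hypothetical: a half-open range of indices into a flat pool.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Span {
    start: u32,
    end: u32,
}

/// Hypothetical: a segment refers to its sequence by Span, so it holds only integers.
#[derive(Clone, Copy, Debug)]
struct Segment {
    name: u32, // assumption: numeric segment names
    seq: Span,
}

/// Hypothetical: all graph data lives in a few contiguous vectors ("pools").
#[derive(Default)]
struct FlatGraph {
    segs: Vec<Segment>,
    seq_data: Vec<u8>,
}

impl FlatGraph {
    /// Append a segment, copying its sequence into the shared byte pool.
    fn add_segment(&mut self, name: u32, seq: &[u8]) {
        let start = self.seq_data.len() as u32;
        self.seq_data.extend_from_slice(seq);
        let end = self.seq_data.len() as u32;
        self.segs.push(Segment { name, seq: Span { start, end } });
    }

    /// Borrow a segment's sequence as a slice of the pool (no copying).
    fn seq(&self, seg: &Segment) -> &[u8] {
        &self.seq_data[seg.seq.start as usize..seg.seq.end as usize]
    }

    /// Print a segment back out as a GFA `S` line (name TAB sequence).
    fn segment_line(&self, seg: &Segment) -> String {
        format!("S\t{}\t{}", seg.name, String::from_utf8_lossy(self.seq(seg)))
    }

    /// Serialize: a header of two counts, then each segment as three
    /// little-endian u32 words, then the raw sequence pool.
    fn to_bytes(&self) -> Vec<u8> {
        let mut out = Vec::new();
        out.extend_from_slice(&(self.segs.len() as u32).to_le_bytes());
        out.extend_from_slice(&(self.seq_data.len() as u32).to_le_bytes());
        for s in &self.segs {
            for w in &[s.name, s.seq.start, s.seq.end] {
                out.extend_from_slice(&w.to_le_bytes());
            }
        }
        out.extend_from_slice(&self.seq_data);
        out
    }

    /// Inverse of `to_bytes`. Panics on truncated input (this is only a sketch).
    fn from_bytes(buf: &[u8]) -> FlatGraph {
        // Read the little-endian u32 at word index i.
        let word = |i: usize| {
            let mut b = [0u8; 4];
            b.copy_from_slice(&buf[i * 4..i * 4 + 4]);
            u32::from_le_bytes(b)
        };
        let nsegs = word(0) as usize;
        let nbytes = word(1) as usize;
        let segs = (0..nsegs)
            .map(|k| {
                let base = 2 + 3 * k;
                Segment {
                    name: word(base),
                    seq: Span { start: word(base + 1), end: word(base + 2) },
                }
            })
            .collect();
        let data_start = (2 + 3 * nsegs) * 4;
        FlatGraph {
            segs,
            seq_data: buf[data_start..data_start + nbytes].to_vec(),
        }
    }
}
```

Because each record holds only fixed-width integers and the pools hold no pointers, the on-disk bytes mirror the in-memory layout closely. In the PR, that is the role assigned to the zerocopy crate (reinterpreting bytes in place); this sketch instead copies through explicit little-endian encoding to stay dependency-free.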

I will have more to say/write about this elsewhere when I get time, but through building this proof of concept, I am now hopeful that there is a path forward here to implement an interesting, flexible zero-copy binary format. And I suspect it will be fast.

@sampsyo merged commit 77d7077 into main on Mar 11, 2024
3 checks passed
@sampsyo deleted the polbin branch on March 11, 2024 at 13:48