Implement an AnnData tokenizer #14

Open
nleroy917 opened this issue Apr 19, 2024 · 0 comments
Labels
new tool (Request to implement a new tool), tokenizers (Region tokenization)

Comments

@nleroy917
Member

While the TreeTokenizer wrapper in geniml (ITTokenizer) is nice because it is abstract and can tokenize both BED files and AnnData objects, I think it makes more sense to create a separate AnnData tokenizer. That way, we might not need a wrapper in geniml at all and can use the tokenizers directly, with a clear separation of concerns.

It can still use an interval tree internally, but it will explicitly expect AnnData objects as input instead of BED files.
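
To make the idea concrete, here is a rough sketch of what a dedicated AnnData tokenizer could look like. Everything in it is an assumption for illustration, not the existing API: the class name `AnnDataTokenizer`, the `tokenize` method, and the expectation that `adata.var` carries `chr`, `start`, and `end` columns are all hypothetical. To keep it dependency-free, it swaps the interval tree for a sorted-list binary search, which only works if the universe regions do not overlap within a chromosome.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


class AnnDataTokenizer:
    """Hypothetical sketch: maps regions in `adata.var` to universe token ids.

    Assumes half-open [start, end) coordinates and a universe whose regions
    do not overlap each other within a chromosome. A real implementation
    could use an interval tree here instead of sorted lists.
    """

    def __init__(self, universe):
        # universe: iterable of (chrom, start, end); the position in the
        # iterable is used as the token id.
        by_chrom = defaultdict(list)
        for token_id, (chrom, start, end) in enumerate(universe):
            by_chrom[chrom].append((start, end, token_id))

        # Per chromosome, keep parallel sorted lists for binary search.
        self._index = {}
        for chrom, regions in by_chrom.items():
            regions.sort()
            starts = [r[0] for r in regions]
            ends = [r[1] for r in regions]
            ids = [r[2] for r in regions]
            self._index[chrom] = (starts, ends, ids)

    def _overlaps(self, chrom, start, end):
        # Chromosomes missing from the universe produce no tokens.
        if chrom not in self._index:
            return []
        starts, ends, ids = self._index[chrom]
        lo = bisect_right(ends, start)  # first universe region with end > start
        hi = bisect_left(starts, end)   # first universe region with start >= end
        return ids[lo:hi]

    def tokenize(self, adata):
        # Explicitly require an AnnData-like object rather than a BED path.
        if not hasattr(adata, "var"):
            raise TypeError("expected an AnnData-like object with a `.var` table")
        var = adata.var
        tokens = []
        # Assumed column names: "chr", "start", "end".
        for chrom, start, end in zip(var["chr"], var["start"], var["end"]):
            tokens.extend(self._overlaps(chrom, int(start), int(end)))
        return tokens
```

Because the class only reads `adata.var`, it never needs to know about BED files, which is the separation of concerns I am after. BED tokenization would stay in its own tokenizer.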

nleroy917 added the tokenizers (Region tokenization) label on Apr 19, 2024
nleroy917 added the new tool (Request to implement a new tool) label on Jul 31, 2024