Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace deprecated listing / indexing with new DataChain.from_storage() #318

Open
1 task
Tracked by #39
ilongin opened this issue Aug 19, 2024 · 2 comments · May be fixed by #517
Open
1 task
Tracked by #39

Replace deprecated listing / indexing with new DataChain.from_storage() #318

ilongin opened this issue Aug 19, 2024 · 2 comments · May be fixed by #517
Assignees

Comments

@ilongin
Copy link
Contributor

ilongin commented Aug 19, 2024

In related PR #294 we are going to have listing logic inside of DataChain.from_storage() itself.

We should replace old listing that is being called from CLI and maybe other places with DataChain.from_storage().
There is a lot of tests around listing / indexing and we should refactor them as well if needed

As a follow up for this we should remove old legacy listing codebase (maybe it's better to do this in actual separate issue / PR). We should also remove buckets and partials

Note that we also need to replace Catalog.ls_storages to use new listing datasets as bucket table will be removed, as well as partials

  • Refactor datachain.Dataset.is_bucket_listing() to not use old listing check and maybe remove this method altogether as low level Dataset class should not know about LISTING_PREFIX and similar higher level abstractions
@ilongin
Copy link
Contributor Author

ilongin commented Oct 4, 2024

I think this should be prioritized as it's sometimes hard to refactor / change codebase as this old indexing part needs to be adopted. It would been much easier if it's just refactored to "new" indexing, not to mention it can only be used for CLI operations .. .e.g when we call Catalog.index(...) listing that's been created cannot be used in DataChain methods

@shcheklein
Copy link
Member

Agreed. Moved it to the ready column - let's keep simplifying things there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants