Refactor file system source and path #496

Merged: 34 commits into main from ilongin/447-refactor-fs-source-and-path, Oct 21, 2024

@ilongin ilongin commented Oct 7, 2024

Previously, when listing a local file system, the source field was always set to the root of the file system (e.g. file:///) and the rest of the location went into path. The idea behind this was to make use of partial indexing and to remove the obsolete add_storage method.
Now that we are moving away from partial and bucket tables and moving indexing / listing to the application level, this setup is no longer needed; it was neither convenient nor intuitive.

With these changes, calling DataStorage.from_storage("file:///home/ivan/animals/") produces entries like the following in the listing table:

| source | path |
| --- | --- |
| file:///home/ivan/animals | cats.jpg |
| file:///home/ivan/animals | dogs/dog.jpg |
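
To illustrate the source / path split in the table above, here is a minimal Python sketch. The helper names `split_old` and `split_new` are hypothetical and are not part of datachain. The old-layout result (a path without a leading slash) is an assumption, since the description only says that source was file:/// and the rest went into path.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def split_old(file_uri: str) -> tuple[str, str]:
    """Old layout (assumed): source is always the FS root, path holds the rest."""
    full = urlparse(file_uri).path  # e.g. "/home/ivan/animals/cats.jpg"
    return "file:///", full.lstrip("/")


def split_new(storage_uri: str, file_uri: str) -> tuple[str, str]:
    """New layout: source is the listed directory, path is relative to it."""
    root = PurePosixPath(urlparse(storage_uri).path)  # trailing slash is dropped
    full = PurePosixPath(urlparse(file_uri).path)
    # relative_to raises ValueError if the file is not under the listed directory
    return "file://" + str(root), str(full.relative_to(root))


if __name__ == "__main__":
    storage = "file:///home/ivan/animals/"
    for uri in (storage + "cats.jpg", storage + "dogs/dog.jpg"):
        print(split_old(uri), split_new(storage, uri))
```

With the new layout, every row listed from the same directory shares one source value, and path stays short and relative to the directory that was listed.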

This PR also adds a re-indexing check to .from_storage() so that re-indexing is skipped when it is not needed.
The old code for handling the file system root, described at the beginning of this description, has been removed.

Note that some tests were skipped because they could not be refactored without diving deep into another important task that will also be worked on soon: #318

@ilongin ilongin marked this pull request as draft October 7, 2024 23:13
@ilongin ilongin linked an issue Oct 7, 2024 that may be closed by this pull request

cloudflare-workers-and-pages bot commented Oct 7, 2024

Deploying datachain-documentation with Cloudflare Pages

Latest commit: 0864e6e
Status: ✅  Deploy successful!
Preview URL: https://96fa38ff.datachain-documentation.pages.dev
Branch Preview URL: https://ilongin-447-refactor-fs-sour.datachain-documentation.pages.dev

codecov bot commented Oct 8, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 87.36%. Comparing base (dfd7fb4) to head (0864e6e).
Report is 1 commit behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #496      +/-   ##
==========================================
- Coverage   87.49%   87.36%   -0.14%     
==========================================
  Files          97       97              
  Lines       10122    10136      +14     
  Branches     1382     1386       +4     
==========================================
- Hits         8856     8855       -1     
- Misses        909      923      +14     
- Partials      357      358       +1     
| Flag | Coverage Δ |
| --- | --- |
| datachain | 87.30% <93.10%> (-0.16%) ⬇️ |

@ilongin ilongin marked this pull request as ready for review October 10, 2024 00:54

@rlamy rlamy left a comment

LGTM. Removing so many special cases for cloud_type == "file" in the tests is nice!

@ilongin ilongin force-pushed the ilongin/447-refactor-fs-source-and-path branch from e03ba1e to 0864e6e Compare October 21, 2024 00:02
@ilongin ilongin merged commit a70f1be into main Oct 21, 2024
38 checks passed
@ilongin ilongin deleted the ilongin/447-refactor-fs-source-and-path branch October 21, 2024 00:26
Development

Successfully merging this pull request may close these issues.

Refactor back file system listing source and path
3 participants