-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix ls #260
Fix ls #260
Conversation
fe8fc1a
to
a687365
Compare
Thanks for the PR. What exactly is the issue you are seeing? FWIW, for me, both of these statements work (although they do different things). EDIT: The above repro on a local lakeFS quickstart instance, slightly edited (requires a import duckdb
import fsspec
import lakefs
lakefs.Repository("quickstart").create("local://quickstart", include_samples=True, exist_ok=True)
# Working
res_a = fsspec.filesystem("lakefs").glob("lakefs://quickstart/main/*.parquet")
print(res_a)
# Not working
duckdb.register_filesystem(fsspec.filesystem("lakefs"))
res_b = duckdb.sql("FROM 'lakefs://quickstart/main/*.parquet'")
print(res_b) |
@@ -542,7 +542,7 @@ def ls( | |||
# Retry the API call with appended slash if the current result | |||
# is just a single directory entry only (not its contents). | |||
# This is useful to allow `ls("repo/branch/dir")` calls without a trailing slash. | |||
if len(info) == 1 and info[0]["type"] == "directory": | |||
if len(info) == 1 and info[0]["type"] == "directory" and info[0]["name"] == path + "/": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless I'm missing something, this should always be true, since path
was requested in the original reference.objects()
call (L520 above).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not always. Only when we request path without "/" at the end. If we have only one directory in the requested path, it would be false.
Like call for path lakefs://repo/main/
with this structure:
lakefs://
└── repo/
└── main/
└── test_dir/
├── a.parquet
└── b.parquet
@nicholasjng this is more complex example with sample repo: import duckdb
import fsspec
import lakefs
lakefs.Repository("quickstart").create("local://quickstart", include_samples=True, exist_ok=True)
fs = fsspec.filesystem("lakefs")
fs.mkdir("lakefs://quickstart/main/test")
fs.cp("lakefs://quickstart/main/lakes.parquet", "lakefs://quickstart/main/test/lakes.parquet")
fs.cp("lakefs://quickstart/main/lakes.parquet", "lakefs://quickstart/main/test/lakes2.parquet")
fs.rm("lakefs://quickstart/main/README.md")
fs.rm("lakefs://quickstart/main/data/", recursive=True)
fs.rm("lakefs://quickstart/main/images/", recursive=True)
fs.rm("lakefs://quickstart/main/lakes.parquet")
# Bug 1: return empty list but have "test" dir
try:
test1 = fsspec.filesystem("lakefs").ls("lakefs://quickstart/main/")
print(test1)
assert test1 != []
except:
print("test1 not working")
# Bug 2: return empty list but have "test" dir
# Not working
try:
test2 = fsspec.filesystem("lakefs").glob("lakefs://quickstart/main/test/*.parquet")
print(test2)
assert test2 != []
except:
print("test2 not working")
duckdb.register_filesystem(fsspec.filesystem("lakefs"))
# Not working
try:
test3 = duckdb.sql("FROM read_parquet('lakefs://quickstart/main/test/*.parquet')")
print(test3)
except:
print("test3 not working")
# Working
test4 = duckdb.sql("FROM read_parquet(['lakefs://quickstart/main/test/lakes2.parquet', 'lakefs://quickstart/main/test/lakes2.parquet'])")
print(test4) I create repo structure like this:
I think bag 2 related with bag 1. |
Sorry for the delay, and thanks for that new repro. Otherwise, looks good! |
The test failure happens because the Changing L488 in |
@nicholasjng I add test and find out why I have problem with line 430. It's because of incorrect cache update for directory with only directories in it. I fix it by removing "/" from the end of dirs names. |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #260 +/- ##
=======================================
Coverage 94.24% 94.24%
=======================================
Files 5 5
Lines 382 382
Branches 70 70
=======================================
Hits 360 360
Misses 13 13
Partials 9 9 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, almost there!
- Fix bag: return empty list if contain one directory - Fix bag: return empty list if dir is not cached but have key in cache - Fix bag: not correct update cache id directory contains another directory
@AdrianoKF Done👌 |
Thank you! |
I found it when working with DuckDB. To reproduce the second bug, you can try calling this: