-
We are looking into this, but we have not settled on how to do it. We could provide read-only support initially, but users would then ask for read-write support, which is what we decided to avoid from the beginning.
-
First, congrats on the 1.0 GA release!
According to the FAQ section in the docs, accessing existing data in an object store is not yet supported.
Do you have a plan to support this feature?
Would it be easier if the existing data is read-only?
Here is a compelling reason to do so: many AI/ML workloads need read-only access to public (or private) datasets that already live in some (cloud) object store, often with an average object size smaller than the jfs block size. Having to copy them into juicefs is sub-optimal. One compelling reason to use jfs is the separate metadata store, which makes metadata operations such as listing all the files in a tree efficient for multiple data-parallel clients. Prefix-listing a large object store tree takes several minutes per client. We worked around this problem by explicitly generating a shared static manifest file that can be fetched in a second or two. An ideal usage example:
The format command would scan the read-only import URIs (checking for prefix conflicts) and create fs metadata for the imported data without copying any data. Afterwards, if you `ls /mnt/jfs/<volume-name>`, you should see one directory (the prefix from the imported URI) per import URI instead of an empty directory. `ls -R` should be fast because these directories are read-only and no rescan of the original object stores is needed.
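
To make the proposal concrete, here is a minimal Python sketch of the metadata-only import described above. It is not JuiceFS code and does not reflect any existing JuiceFS command or flag. The function names (`check_prefix_conflicts`, `mount_name`, `import_read_only`, `ls_recursive`) and the `list_objects` callback are hypothetical, and naming each directory after the last path segment of its import URI is an assumption about what "prefix from imported URI" would mean.

```python
from urllib.parse import urlparse


def check_prefix_conflicts(uris):
    """Raise ValueError if one import URI equals or is nested under another."""
    normalized = sorted(u.rstrip("/") + "/" for u in uris)
    # After sorting, any nested URI sits directly after a URI that prefixes it.
    for outer, inner in zip(normalized, normalized[1:]):
        if inner.startswith(outer):
            raise ValueError(f"import URI {inner!r} overlaps {outer!r}")


def mount_name(uri):
    """Directory name for an import URI: last path segment, else the bucket."""
    parsed = urlparse(uri)
    segments = [s for s in parsed.path.split("/") if s]
    return segments[-1] if segments else parsed.netloc


def import_read_only(uris, list_objects):
    """Build {fs_path: (source_uri, size)} without copying any object data.

    list_objects is a hypothetical callback: given an import URI, it yields
    (relative_key, size) pairs from a one-time scan of the object store.
    """
    check_prefix_conflicts(uris)
    metadata = {}
    used_names = set()
    for uri in uris:
        base = uri.rstrip("/")
        name = mount_name(base)
        if name in used_names:
            raise ValueError(f"directory name {name!r} is used by two import URIs")
        used_names.add(name)
        for key, size in list_objects(uri):
            # Only a pointer to the original object is stored, never its bytes.
            metadata[f"{name}/{key}"] = (f"{base}/{key}", size)
    return metadata


def ls_recursive(metadata, prefix=""):
    """Recursive listing served from metadata alone; no object store rescan."""
    return sorted(path for path in metadata if path.startswith(prefix))
```

Calling `import_read_only` once plays the role of the scan at format time. Every later `ls_recursive` call reads only the in-memory metadata, which is why a recursive listing would not need to go back to the original object stores.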