FoundationDB is a distributed key-value store with strong ACID guarantees (Atomicity, Consistency, Isolation, Durability). The core vision is to build a filesystem layer on top of FoundationDB that leverages these guarantees to create a reliable (yet slow) distributed file store.
Using Linux’s Filesystem in Userspace (FUSE) module, our software will enable mounting the filesystem and interacting with it like a normal Unix directory, with our software behind the scenes translating reads and writes to FoundationDB transactions.
Each individual key retrieval or update has the potential to be slow. In order for advanced filesystem operations such as renaming and hard links to work, we will need layers of indirection that will require each operation to take multiple key retrievals. Because of this, we will be optimizing our design for functionality and correctness first, speed second. Designing an appropriate mapping of a file/directory structure into key-value storage is the most significant aspect of this project.
By running additional instances of FoundationDB, our software layer should scale without any changes. If our filesystem design is successful, this open source project will hopefully be a useful layer that can be shared with the larger FoundationDB community.
Our original project proposal can be found here.
While Windows can be used to run FDB server instances, our client does not support non-unix operating systems, and will not work on Windows (or in Windows Subsystem for Linux, which at this time does not support FUSE).
Full instructions for downloading the FDB client and server are available for Linux and MacOS.
For the client devices running the fslayer, you will only need to install fdbclient. Likewise, the machines hosting the database will only need fdbserver.
Both clients and servers will need to have a cluster file configured.
Once the cluster files are configured, each server can be started with your system daemon of choice, or with the fdbserver command.
(Linux users with an up-to-date kernel should not need to perform this step)
To run the fslayer client on a MacOS machine, you will need to download and install macFUSE.
From the fslayer directory, you can run
make build
or
./gradlew installDist
to install the application to fslayer/app/build/install/app. You can then copy the bin and lib contents into your root directory (if you have root access), or run the application directly from the install directory.
Alternatively, you can produce a zip distribution with
make zip
or
./gradlew distZip
which will be output to fslayer/app/build/distributions.
To run the client and mount the FoundationDB Filesystem, run
fslayer <mount-path>
with the only argument being the directory you wish to mount the filesystem to.
You will be prompted for a username and password. If this is your first time, you can enter any username and password and the database will record that as your login information. On subsequent logins, you can use the same username & password combination.
We have run FoundationFS against a small suite of filesystem tests. Documentation for those results can be found here.
If your FDB client and server are set up, you can run cache-test.bash to verify that the cache versioning holds up under hundreds of concurrent transactions.
testing/cache-test.bash
Our filesystem client leverages FoundationDB's ACID guarantees (Atomicity, Consistency, Isolation, Durability) to provide a stable and consistent distributed filesystem.
Filesystem operations are passed to our client through use of Unix's Filesystem in USErspace (FUSE) functionality. These operations are then converted into key-value operations to access or modify the data stored in FoundationDB.
We use FoundationDB's DirectoryLayer extensively to create and manage unique key prefixes for each file and directory stored. A detailed spec for the Java DirectoryLayer implementation can be found here.
In our schema, a file is a DirectorySubspace. Each fixed-size chunk of data is stored with the key prefix generated from <path-to-file>/CHUNKS/<index>.
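As a rough illustration of the chunk addressing described above, the following sketch maps a byte range of a file onto the chunk paths it touches. The class name `ChunkLayout` and the 4096-byte chunk size are assumptions made for this example; the document does not state the chunk size the project actually uses.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: maps a byte range onto chunk paths of the form <path-to-file>/CHUNKS/<index>. */
final class ChunkLayout {
    // Assumed chunk size; the project's real value is not stated in this README.
    static final int CHUNK_SIZE = 4096;

    /** Index of the chunk that contains the given byte offset. */
    static long chunkIndex(long offset) {
        return offset / CHUNK_SIZE;
    }

    /** Paths of every chunk overlapping the byte range [offset, offset + length). */
    static List<String> chunkPaths(String filePath, long offset, long length) {
        List<String> paths = new ArrayList<>();
        if (length <= 0) {
            return paths; // an empty range touches no chunks
        }
        long first = chunkIndex(offset);
        long last = chunkIndex(offset + length - 1);
        for (long i = first; i <= last; i++) {
            paths.add(filePath + "/CHUNKS/" + i);
        }
        return paths;
    }

    public static void main(String[] args) {
        // A 10-byte read that straddles the boundary between chunk 0 and chunk 1.
        System.out.println(chunkPaths("/docs/notes.txt", 4090, 10));
    }
}
```

Under these assumptions, a read or write only needs to touch the chunks whose index range covers the requested bytes, rather than the whole file.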
Metadata such as mode, uid, and m_time are stored as keys with the file's subspace prefix.
Directories are also DirectorySubspaces. Their metadata is stored with the subspace prefix <path-to-dir>/. (a subspace named "." inside the directory). The presence of the . subspace distinguishes directories from files.
This schema allows us to list a directory's contents by fetching all of its child prefixes, to read a file's data by loading the key range of its chunk subspace, and to look up a file's or directory's metadata directly from its path.
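The schema can be modeled without a database to show how the "." marker and prefix scans work together. The sketch below is a plain in-memory stand-in, not the FoundationDB DirectoryLayer API: the class `SchemaModel` and its methods are hypothetical, and a sorted set of path strings takes the place of ordered keys.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** In-memory model of the schema; a sorted set stands in for ordered keys. */
final class SchemaModel {
    // Every subspace path that has been created, kept in sorted order.
    private final TreeSet<String> subspaces = new TreeSet<>();

    void createDirectory(String path) {
        subspaces.add(path);
        subspaces.add(path + "/."); // metadata subspace that marks a directory
    }

    void createFile(String path) {
        subspaces.add(path);
        subspaces.add(path + "/CHUNKS"); // subspace that holds the file's chunks
    }

    /** A path is a directory exactly when its "." subspace exists. */
    boolean isDirectory(String path) {
        return subspaces.contains(path + "/.");
    }

    /** Names of the direct children of a directory, skipping its own "." marker. */
    List<String> list(String dirPath) {
        String prefix = dirPath + "/";
        List<String> names = new ArrayList<>();
        // Range scan over every path that starts with the prefix.
        for (String p : subspaces.subSet(prefix, prefix + Character.MAX_VALUE)) {
            String rest = p.substring(prefix.length());
            if (!rest.contains("/") && !rest.equals(".")) {
                names.add(rest);
            }
        }
        return names;
    }
}
```

In the real client, the equivalent lookups would go through the DirectoryLayer inside a FoundationDB transaction; the model only shows the shape of the queries.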
A VERSION is a counter stored in every file's or directory's metadata as its own key-value pair. Every time a file or its metadata is modified, that version is increased by one. Whenever a directory's metadata is changed, or a file or directory is added to or removed from it, its version increments as well.
On a successful read of a file's or directory's contents, the client will cache the data, along with the VERSION of that file or directory.
On subsequent reads, the client will compare the cached version of a file or directory to the value in the database, and update its cache if they do not match. Because of FoundationDB's guaranteed consistency and atomicity, we know that by checking this version we will always be viewing the most current state of the filesystem.
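The version check described above can be sketched as a small cache keyed by path. This is a minimal illustration, not the project's cache: `VersionedCache` is a hypothetical name, and the caller is assumed to have already read the current VERSION from the database before calling `get`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Sketch of a cache that is only trusted while the stored VERSION matches. */
final class VersionedCache<V> {
    private static final class Entry<V> {
        final long version;
        final V value;

        Entry(long version, V value) {
            this.version = version;
            this.value = value;
        }
    }

    private final Map<String, Entry<V>> entries = new HashMap<>();

    /**
     * Returns the cached value when its version equals currentVersion;
     * otherwise calls loader, stores the result under currentVersion, and returns it.
     */
    V get(String path, long currentVersion, Supplier<V> loader) {
        Entry<V> cached = entries.get(path);
        if (cached != null && cached.version == currentVersion) {
            return cached.value; // cache hit: nothing changed since the last read
        }
        V fresh = loader.get(); // cache miss or stale entry: reload from the database
        entries.put(path, new Entry<>(currentVersion, fresh));
        return fresh;
    }
}
```

For the consistency argument to hold, the version lookup and any reload would presumably happen within the same transaction, so that the data returned always corresponds to the version that was checked.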
Each user who logs into the client is assigned a UID, starting at 70001, for operations on the database. This is the ID that will be used to evaluate ownership and permissions on files. The ID for a user will be displayed in the console after a successful login.
Please note that while files and directories might still display group permissions, group membership is currently not supported and will not be evaluated when determining if a user has access to a file operation. All users other than a file's owner will be evaluated using the "other" permission mode.
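The owner-or-other rule can be expressed in a few lines. The sketch below is an assumption about how such a check might look, not the project's code; `AccessCheck` and `isAllowed` are hypothetical names, and the mode is taken to be standard Unix permission bits.

```java
/** Sketch of a permission check that consults only the owner and "other" bits. */
final class AccessCheck {
    static final int READ = 4;
    static final int WRITE = 2;
    static final int EXECUTE = 1;

    /**
     * @param mode      Unix permission bits, for example 0640
     * @param ownerUid  UID stored in the file's metadata
     * @param userUid   UID of the logged-in user
     * @param requested bitmask of READ, WRITE, and EXECUTE
     */
    static boolean isAllowed(int mode, int ownerUid, int userUid, int requested) {
        // Owner bits are bits 6-8 and "other" bits are bits 0-2.
        // Group bits (3-5) are never consulted, since group membership is unsupported.
        int granted = (userUid == ownerUid) ? (mode >> 6) & 7 : mode & 7;
        return (granted & requested) == requested;
    }
}
```

With mode 0640, for instance, a non-owner is denied read access under this rule even though the group bits would allow it.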
The database's mappings from username to user ID are stored in the subspace ./IDMAP/<username>, while a user's PBKDF2 password hash is stored in ./AUTH/<username>.
The key ./ID_COUNTER stores the counter used to generate new unique UIDs.
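The interaction between the ID map and the counter can be sketched in memory as follows. `UidRegistry` is a hypothetical class, and treating the counter as "the next UID to hand out" is an assumption; the document does not say whether ./ID_COUNTER holds the next or the most recently issued UID.

```java
import java.util.HashMap;
import java.util.Map;

/** In-memory sketch of UID allocation; fields stand in for database keys. */
final class UidRegistry {
    static final int FIRST_UID = 70001;

    // Models the ./IDMAP/<username> entries.
    private final Map<String, Integer> idMap = new HashMap<>();
    // Models ./ID_COUNTER, assumed here to hold the next UID to hand out.
    private int idCounter = FIRST_UID;

    /** Returns the existing UID for a username, or allocates the next one. */
    synchronized int uidFor(String username) {
        Integer existing = idMap.get(username);
        if (existing != null) {
            return existing; // returning user keeps the same UID
        }
        int uid = idCounter++;
        idMap.put(username, uid);
        return uid;
    }
}
```

In the database-backed version, the read of the counter and the two writes would need to happen in a single transaction so that two clients registering at once cannot receive the same UID.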
The entry point for the application is App.java.
Here, you can see it create our wrapper around FDB, call our login manager, and pass both objects to our Fuse wrapper, which is then mounted.
The PermissionManager class handles password validation & storage, as well as loading user id mappings from the database.
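For readers unfamiliar with PBKDF2, the sketch below shows how a password hash might be derived and verified with the JDK's built-in `javax.crypto` classes. It is not the PermissionManager's code: the class name `PasswordHasher`, the HMAC-SHA256 variant, the iteration count, the key length, and the salt size are all assumptions chosen for illustration.

```java
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.security.SecureRandom;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;

/** Sketch of PBKDF2 hashing and verification using only the JDK. */
final class PasswordHasher {
    // Assumed parameters; the project's actual values are not given in this README.
    static final int ITERATIONS = 100_000;
    static final int KEY_BITS = 256;
    static final int SALT_BYTES = 16;

    /** Generates a fresh random salt to store alongside the hash. */
    static byte[] newSalt() {
        byte[] salt = new byte[SALT_BYTES];
        new SecureRandom().nextBytes(salt);
        return salt;
    }

    /** Derives a PBKDF2 hash from the password and salt. */
    static byte[] hash(char[] password, byte[] salt) {
        PBEKeySpec spec = new PBEKeySpec(password, salt, ITERATIONS, KEY_BITS);
        try {
            return SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256")
                    .generateSecret(spec)
                    .getEncoded();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("PBKDF2 is unavailable on this JVM", e);
        } finally {
            spec.clearPassword(); // wipe the spec's internal copy of the password
        }
    }

    /** Re-derives the hash for a login attempt and compares it to the stored one. */
    static boolean verify(char[] attempt, byte[] salt, byte[] expected) {
        return MessageDigest.isEqual(hash(attempt, salt), expected);
    }
}
```

On first login the derived hash (and its salt) would be written under ./AUTH/<username>; later logins would re-derive the hash from the entered password and compare.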
The FuseLayer class extends the FuseStubFS class provided by jnr-fuse. It translates the information from system calls into arguments passed to our FoundationDB operations, then converts the result into what the system expects.
FoundationLayer.java implements our FoundationFileOperations.java interface.
This class stores a reference to the actual FoundationDB java object and makes db transactions to perform system filesystem calls.
In many of its methods, it determines if a path is a directory or file, then instantiates and delegates the system operation to an object representing the filesystem object in question.
FileSchema and DirectorySchema represent the file or directory at a given path. Their methods take in a reference to the DirectoryLayer and a database transaction, then use these to read or modify the keys needed to perform the file operation.