SAFS user manual
SAFS is a user-space filesystem designed for a large SSD array. The goal of SAFS is to maximize the I/O performance of the SSD array on a NUMA machine while still providing a filesystem interface to users. SAFS is specifically optimized for large files: a file exposed by SAFS is partitioned, and each partition is stored as a physical file on an SSD. SAFS currently does not support directory operations.
SAFS provides basic operations on files: create, delete, read and write.
The class `safs_file` represents an SAFS file and provides a few methods for metadata operations such as creating a file, deleting a file and renaming a file.
```cpp
class safs_file
{
public:
    /* The constructor. The file doesn't need to exist. */
    safs_file(const RAID_config &conf, const std::string &file_name);
    /* Test whether the SAFS file exists. */
    bool exist() const;
    /* Get the size of the SAFS file. */
    ssize_t get_size() const;
    /* Create the SAFS file with the specified size. */
    bool create_file(size_t file_size);
    /* Delete the SAFS file. */
    bool delete_file();
    /* Rename the SAFS file to a new name. */
    bool rename(const std::string &new_name);
};
```
SAFS does not support directories. The function `get_all_safs_files` returns all files in SAFS.

```cpp
size_t get_all_safs_files(std::set<std::string> &files);
```
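The metadata operations above can be combined as in the following pseudocode sketch. The `RAID_config` instance `conf` is assumed to come from SAFS initialization, and the file name and size are illustrative:

```
safs_file f(conf, "test_file");
// Create a 1GB SAFS file if it doesn't exist yet.
if (!f.exist())
    f.create_file(1024UL * 1024 * 1024);
assert(f.get_size() >= 1024L * 1024 * 1024);
// Delete the SAFS file when it is no longer needed.
f.delete_file();
```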
Two classes (`file_io_factory` and `io_interface`) are used for accessing data in a file. The class `file_io_factory` creates and destroys `io_interface` objects, which provide methods to read and write an SAFS file. An `io_interface` instance can only access a single file and can only be used in a single thread. We intentionally make the implementations of `io_interface` not thread-safe for the sake of performance.
When using `file_io_factory` and `io_interface` in multiple threads (a main thread and several worker threads), the recommended approach is to create a single `file_io_factory` instance for an SAFS file in the main thread and an `io_interface` instance in each worker thread.
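This recommended pattern can be sketched in pseudocode. The worker-thread function is illustrative, not part of the SAFS API:

```
// In the main thread: one factory per SAFS file, shared by workers.
file_io_factory::shared_ptr factory = create_io_factory(file_name,
        REMOTE_ACCESS);

// In each worker thread: a private io_interface instance.
void worker_thread()
{
    io_interface::ptr io = create_io(factory, thread::get_curr_thread());
    // ... issue I/O requests with `io` in this thread only ...
}
```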
The function `create_io_factory` creates a `file_io_factory` instance for a file. It allows a user to specify an access option, which decides which type of `file_io_factory` instance is created. Right now, SAFS supports two access options:
- REMOTE_ACCESS: this corresponds to direct I/O in Linux. The `io_interface` instance created by such a `file_io_factory` doesn't use the page cache in SAFS.
- GLOBAL_CACHE_ACCESS: this corresponds to buffered I/O in Linux. The `io_interface` instance uses the page cache in SAFS.
Opening a file involves two steps: invoking `create_io_factory` to create a `file_io_factory` object, then invoking the `create_io` method of `file_io_factory` to create an `io_interface` object. Files are closed implicitly when the `file_io_factory` object is destroyed.
A user can use the following method of `io_interface` to issue synchronous I/O requests. `access_method` determines whether it is a read or a write request: 0 indicates read and 1 indicates write.
```cpp
class io_interface
{
public:
    io_status access(char *buf, off_t off, ssize_t size, int access_method);
    ...
};
```
Users can use the following set of methods for asynchronous I/O. First, before issuing any I/O requests, users need to implement the `callback` interface and register it with an `io_interface` object in order to be notified when I/O requests complete. Then, they use the asynchronous version of the `access` method to issue I/O requests. When a request completes, the `callback` is invoked; it is guaranteed that the `callback` object is invoked in the same thread where the I/O request was issued. An `io_interface` instance does not limit the number of parallel I/O requests that can be issued to it. Users can monitor the number of incomplete I/O requests with the `num_pending_ios` method and wait for I/O to complete with the `wait4complete` method.
```cpp
class io_interface
{
public:
    ...
    /* Issue asynchronous I/O requests. */
    void access(io_request *, int, io_status *);
    /* Flush I/O requests buffered by the io_interface instance. */
    void flush_requests();
    /* Wait for at least the specified number of I/O requests to complete. */
    int wait4complete(int);
    /* Get the number of pending I/O requests. */
    int num_pending_ios() const;
    /* Set the callback function. */
    bool set_callback(callback::ptr);
};

class callback
{
public:
    virtual int invoke(io_request *reqs[], int num) = 0;
};
```
The following pseudocode illustrates a simple use case of SAFS, which uses its synchronous I/O interface to read data from a file.
```cpp
#include "io_interface.h"

class task
{
    // Defined by users.
    ...
public:
    size_t get_size() const;
    off_t get_offset() const;
};

static void test(const std::string &conf_file, const std::string &graph_file,
        const std::vector<task> &tasks)
{
    config_map::ptr configs = config_map::create(conf_file);
    init_io_system(configs);
    file_io_factory::shared_ptr factory = create_io_factory(graph_file,
            REMOTE_ACCESS);
    io_interface::ptr io = create_io(factory, thread::get_curr_thread());
    char *buf = NULL;
    size_t buf_capacity = 0;
    BOOST_FOREACH(task t, tasks) {
        // This is direct I/O. The memory buffer, I/O offset and I/O size
        // all need to be aligned to the I/O block size.
        size_t io_size = ROUNDUP_PAGE(t.get_size());
        data_loc_t loc(factory->get_file_id(), t.get_offset());
        // Grow the buffer when a task needs a larger read.
        if (io_size > buf_capacity) {
            free(buf);
            buf_capacity = io_size;
            buf = (char *) valloc(buf_capacity);
        }
        assert(buf_capacity >= io_size);
        io_request req(buf, loc, io_size, READ);
        io->access(&req, 1);
        io->wait4complete(1);
        run_computation(buf, io_size);
    }
    free(buf);
}
```
The following pseudocode illustrates a use case of SAFS' asynchronous I/O interface to read data from a file. It is slightly more complex than the synchronous case: it requires defining a `callback` class, and the computation is performed in that class.
```cpp
#include "io_interface.h"

class compute_callback: public callback
{
public:
    virtual int invoke(io_request *reqs[], int num);
};

int compute_callback::invoke(io_request *reqs[], int num)
{
    for (int i = 0; i < num; i++) {
        char *buf = reqs[i]->get_buf();
        run_computation(buf, reqs[i]->get_size());
        free(buf);
    }
    return 0;
}

static void test(const std::string &conf_file, const std::string &graph_file,
        const std::vector<task> &tasks)
{
    config_map::ptr configs = config_map::create(conf_file);
    init_io_system(configs);
    file_io_factory::shared_ptr factory = create_io_factory(graph_file,
            REMOTE_ACCESS);
    io_interface::ptr io = create_io(factory, thread::get_curr_thread());
    io->set_callback(callback::ptr(new compute_callback()));
    int max_ios = 20;
    BOOST_FOREACH(task t, tasks) {
        // Limit the number of I/O requests in flight.
        while (io->num_pending_ios() >= max_ios)
            io->wait4complete(1);
        // This is direct I/O. The memory buffer, I/O offset and I/O size
        // all need to be aligned to the I/O block size.
        size_t io_size = ROUNDUP_PAGE(t.get_size());
        data_loc_t loc(factory->get_file_id(), t.get_offset());
        // The buffer will be free'd in the callback function.
        char *buf = (char *) valloc(io_size);
        io_request req(buf, loc, io_size, READ);
        io->access(&req, 1);
    }
    io->wait4complete(io->num_pending_ios());
}
```
SAFS-util is a tool that helps manage SAFS. It provides a few commands to operate on SAFS:
- create: create a file in SAFS.
- delete file_name: delete a file in SAFS.
- list: list all existing files in SAFS.
- load: load a file from an external filesystem to a file in SAFS.
- verify: verify the data of a file in SAFS. It’s mainly used for testing.
SAFS requires proper Linux kernel and filesystem configurations to get the maximal performance from an SSD array.
Because SAFS runs on a large SSD array in a machine with non-uniform memory architecture (NUMA), the Linux kernel needs to be configured properly to get the maximal performance from the SSD array. Kernel configurations include:
- evenly distribute interrupts to all CPU cores;
- use the noop I/O scheduler;
- set I/O request affinity for each SSD to force I/O request completion on the requesting CPU core;
- prevent I/O on SSDs from contributing to the entropy pool of the random number generator;
- use a large sector size for SSDs (the maximal sector size is an SSD-specific parameter).
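As a rough sketch, the scheduler, request-affinity, entropy and sector-size settings above map to block-layer sysfs knobs like the following (`sdb` is a placeholder device name; on kernels with the multi-queue block layer the scheduler names differ, and the maximal sector size is device-specific):

```sh
# Use the noop I/O scheduler (legacy block layer).
echo noop > /sys/block/sdb/queue/scheduler
# Complete I/O requests on the CPU core that issued them.
echo 1 > /sys/block/sdb/queue/rq_affinity
# Don't let SSD I/O feed the kernel's entropy pool.
echo 0 > /sys/block/sdb/queue/add_random
# Use the largest sector size the device supports.
cat /sys/block/sdb/queue/max_hw_sectors_kb > /sys/block/sdb/queue/max_sectors_kb
```

The `conf/set_ssds.pl` script described below applies this kind of configuration automatically.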
We provide two scripts to automate the process.
- `conf/set_affinity.sh`: distributes IRQs (interrupt requests) evenly to CPU cores. It is only required when SAFS runs on a NUMA machine, and it is specifically written for an LSI host bus adapter (HBA). For other HBAs, users need to adapt the script to their specific hardware.
- `conf/set_ssds.pl`: takes an input file that lists the device files to run SAFS on, one device file per line. The script sets up the remaining configurations, mounts the SSDs, and creates conf/data_files.txt, which the library uses as the configuration file of the root directories.
SAFS defines the following parameters for users to customize SAFS. When SAFS is initialized, users have the opportunity to set them.
- `root_conf`: a config file that specifies the directories on SSDs where SAFS runs. The config file has a line for each directory, and the format of each line is `node_id:abs_path_of_directory`. SAFS requires users to provide absolute paths to the directories on SSDs.
- `RAID_block_size`: defines the size of a data block on an SSD. Users can specify the size in the format x(k, K, m, M, g, G), e.g., 4k = 4096 bytes. The default block size is 512KB.
- `RAID_mapping`: defines how data blocks of a file are mapped to SSDs. Currently, the library provides three mapping functions: RAID0, RAID5 and HASH. The default mapping is RAID5.
- `cache_size`: defines the size of the page cache, in the format x(k, K, m, M, g, G). The default cache size is 512MB.
- `num_nodes`: defines the number of NUMA nodes on which the page cache is allocated. The default is 1.