This repository has been archived by the owner on Nov 4, 2024. It is now read-only.

Vhash #333

Draft. Wants to merge 25 commits into base: rehash

11 changes: 9 additions & 2 deletions .github/workflows/ci.yml
@@ -4,16 +4,23 @@ on: [push, pull_request]

jobs:
build:
runs-on: ubuntu-18.04
runs-on: ubuntu-20.04

steps:
- uses: actions/checkout@v2

- name: Check Java codestyle
run: |
cd java
mvn spotless:check

- name: Get cmake
uses: lukka/[email protected]

- name: Install packages
run: sudo apt install make clang-format-9 pkg-config g++ autoconf libtool asciidoctor libkmod-dev libudev-dev uuid-dev libjson-c-dev libkeyutils-dev pandoc libhwloc-dev libgflags-dev libtext-diff-perl bash-completion systemd wget git
run: |
sudo apt update
sudo apt install make clang-format-9 pkg-config g++ autoconf libtool asciidoctor libkmod-dev libudev-dev uuid-dev libjson-c-dev libkeyutils-dev pandoc libhwloc-dev libgflags-dev libtext-diff-perl bash-completion systemd wget git

- name: Install ndctl
run: |
20 changes: 19 additions & 1 deletion CMakeLists.txt
@@ -8,14 +8,18 @@ set(KVDK_ROOT_DIR ${CMAKE_CURRENT_SOURCE_DIR})
include(${KVDK_ROOT_DIR}/cmake/functions.cmake)
include(GNUInstallDirs)

set(CMAKE_CXX_STANDARD 11)
# set(CMAKE_CXX_STANDARD 11)
set(CMAKE_CXX_STANDARD 17)

option(COVERAGE "code coverage" OFF)
option(KVDK_ENABLE_VHASH "Enable experimental VHash in KVDK" ON)

set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -mavx512f -mrdseed -mrdrnd -mclwb -mclflushopt")
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS}")

if (CMAKE_BUILD_TYPE STREQUAL "Release")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -O2")
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -O2")
elseif (CMAKE_BUILD_TYPE STREQUAL "RelWithDebInfo")
elseif (CMAKE_BUILD_TYPE STREQUAL "MinSizeRel")
elseif (CMAKE_BUILD_TYPE STREQUAL "Debug")
@@ -48,6 +52,7 @@ endif()
set(SOURCES
engine/c/kvdk_basic_op.cpp
engine/c/kvdk_batch.cpp
engine/c/kvdk_transaction.cpp
engine/c/kvdk_hash.cpp
engine/c/kvdk_list.cpp
engine/c/kvdk_sorted.cpp
@@ -68,14 +73,27 @@ set(SOURCES
engine/hash_collection/hash_list.cpp
engine/list_collection/list.cpp
engine/write_batch_impl.cpp
engine/transaction_impl.cpp
engine/dram_allocator.cpp
engine/pmem_allocator/pmem_allocator.cpp
engine/thread_manager.cpp
engine/pmem_allocator/free_list.cpp
engine/data_record.cpp
engine/dl_list.cpp
engine/version/old_records_cleaner.cpp
)

if (KVDK_ENABLE_VHASH)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -mlzcnt -mbmi -mavx512bw -mavx512vl")
add_compile_definitions(KVDK_ENABLE_VHASH)
set(SOURCES
${SOURCES}
engine/kv_engine_vhash.cpp
engine/experimental/vhash_kv.cpp
engine/experimental/vhash.cpp
engine/experimental/vhash_group.cpp
)
endif()

# .so library
add_library(engine SHARED ${SOURCES})
11 changes: 5 additions & 6 deletions README.md
@@ -5,19 +5,18 @@
`KVDK` (Key-Value Development Kit) is a key-value store library implemented in C++ language. It is designed for supporting DRAM, Optane persistent memory and CXL memory pool. It also demonstrates several optimization methods for high performance with tiered memory. Besides providing the basic APIs of key-value store, it offers several advanced features, like read-modify-write, checkpoint, etc.

## Features
* Rich data types
* string, sorted, hash, list, hash
* Basic KV operations
* string get/set/update/delete
* Sorted KV operations
* sorted string get/set/update/scan/delete
* Rich value types
* list, hash
* get/put/update/delete/scan
* Read-Modify-Write
* Support TTL
* Atomic Batch Write
* Snapshot based Scan
* Consistent Dump & Restore to/from storage
* Consistent Checkpoint
* Transaction
* C/C++/Java APIs
* Support Transaction (coming soon)

# Limitations
* The maximum supported key-value size is 64KB-4GB.
108 changes: 101 additions & 7 deletions benchmark/bench.cpp
@@ -78,7 +78,7 @@ DEFINE_bool(
"Populate pmem space while creating a new instance. This can improve write "
"performance in runtime, but will take long time to init the instance");

DEFINE_int32(max_access_threads, 32, "Max access threads of the instance");
DEFINE_uint64(max_access_threads, 64, "Max access threads of the instance");

DEFINE_uint64(space, (256ULL << 30), "Max usable PMem space of the instance");

@@ -122,12 +122,28 @@ std::vector<std::vector<std::uint64_t>> latencies;
std::vector<PaddedEngine> random_engines;
std::vector<PaddedRangeIterators> ranges;

enum class DataType { String, Sorted, Hashes, List, Blackhole } bench_data_type;
enum class DataType {
String,
Sorted,
Hashes,
List,
VHash,
Blackhole
} bench_data_type;

enum class KeyDistribution { Range, Uniform, Zipf } key_dist;

enum class ValueSizeDistribution { Constant, Uniform } vsz_dist;

void LaunchNThreads(int n_thread, std::function<void(int tid)> func,
int id_start = 0) {
std::vector<std::thread> ts;
for (int i = id_start; i < id_start + n_thread; i++) {
ts.emplace_back(std::thread(func, i));
}
for (auto& t : ts) t.join();
}

std::uint64_t generate_key(size_t tid) {
static std::uint64_t max_key = FLAGS_existing_keys_ratio == 0
? UINT64_MAX
@@ -164,6 +180,24 @@ size_t generate_value_size(size_t tid) {
}
}

#ifdef KVDK_ENABLE_VHASH
void FillVHash(size_t tid) {
std::string key(8, ' ');
for (size_t i = 0; i < FLAGS_num_kv / FLAGS_num_collection; ++i) {
std::uint64_t num = ranges[tid].gen();
std::uint64_t cid = num % FLAGS_num_collection;
memcpy(&key[0], &num, 8);
StringView value = StringView(value_pool.data(), generate_value_size(tid));

Status s = engine->VHashPut(collections[cid], key, value);

if (s != Status::Ok) {
throw std::runtime_error{"VHashPut error"};
}
}
}
#endif

void DBWrite(int tid) {
std::string key(8, ' ');
std::unique_ptr<WriteBatch> batch;
@@ -230,6 +264,14 @@ void DBWrite(int tid) {
s = engine->ListPushFront(collections[cid], value);
break;
}
case DataType::VHash: {
#ifdef KVDK_ENABLE_VHASH
s = engine->VHashPut(collections[cid], key, value);
#else
s = Status::NotSupported;
#endif
break;
}
case DataType::Blackhole: {
s = Status::Ok;
break;
@@ -313,6 +355,20 @@ void DBScan(int tid) {
engine->HashIteratorRelease(iter);
break;
}
case DataType::VHash: {
auto iter = engine->VHashIteratorCreate(collections[cid]);
if (!iter) throw std::runtime_error{"Fail creating VHashIterator"};
for (iter->SeekToFirst(); iter->Valid(); iter->Next()) {
key = iter->Key();
value_sink = iter->Value();
++operations;
if (operations > operations_counted + 1000) {
read_ops += (operations - operations_counted);
operations_counted = operations;
}
}
break;
}
case DataType::Blackhole: {
operations += 1024;
read_ops.fetch_add(1024);
@@ -366,6 +422,14 @@ void DBRead(int tid) {
s = engine->ListPopBack(collections[cid], &value_sink);
break;
}
case DataType::VHash: {
#ifdef KVDK_ENABLE_VHASH
s = engine->VHashGet(collections[cid], key, &value_sink);
#else
s = Status::NotSupported;
#endif
break;
}
case DataType::Blackhole: {
s = Status::Ok;
break;
@@ -412,6 +476,8 @@ void ProcessBenchmarkConfigs() {
bench_data_type = DataType::Hashes;
} else if (FLAGS_type == "list") {
bench_data_type = DataType::List;
} else if (FLAGS_type == "vhash") {
bench_data_type = DataType::VHash;
} else if (FLAGS_type == "blackhole") {
bench_data_type = DataType::Blackhole;
} else {
@@ -425,6 +491,7 @@ void ProcessBenchmarkConfigs() {
}
case DataType::Hashes:
case DataType::List:
case DataType::VHash:
case DataType::Sorted: {
collections.resize(FLAGS_num_collection);
for (size_t i = 0; i < FLAGS_num_collection; i++) {
@@ -437,6 +504,9 @@ void ProcessBenchmarkConfigs() {
if (FLAGS_batch_size > 0 && (bench_data_type == DataType::List)) {
throw std::invalid_argument{R"(List does not support batch write.)"};
}
if (FLAGS_batch_size > 0 && (bench_data_type == DataType::VHash)) {
throw std::invalid_argument{R"(VHash does not support batch write.)"};
}

// Check for scan flag
switch (bench_data_type) {
@@ -458,10 +528,11 @@ void ProcessBenchmarkConfigs() {

random_engines.resize(FLAGS_threads);
if (FLAGS_fill) {
assert(bench_data_type != DataType::VHash && "VHash don't need fill");
assert(FLAGS_read_ratio == 0);
key_dist = KeyDistribution::Range;
operations_per_thread = FLAGS_num_kv / FLAGS_max_access_threads + 1;
for (int i = 0; i < FLAGS_max_access_threads; i++) {
operations_per_thread = FLAGS_num_kv / FLAGS_threads + 1;
for (size_t i = 0; i < FLAGS_threads; i++) {
ranges.emplace_back(i * operations_per_thread,
(i + 1) * operations_per_thread);
}
@@ -475,6 +546,14 @@ void ProcessBenchmarkConfigs() {
throw std::invalid_argument{"Invalid key distribution"};
}
}
if (bench_data_type == DataType::VHash) {
// Vhash needs fill for read and update benchmarks
operations_per_thread = FLAGS_num_kv / FLAGS_max_access_threads + 1;
for (size_t i = 0; i < FLAGS_max_access_threads; i++) {
ranges.emplace_back(i * operations_per_thread,
(i + 1) * operations_per_thread);
}
}

if (FLAGS_value_size_distribution == "constant") {
vsz_dist = ValueSizeDistribution::Constant;
@@ -535,7 +614,6 @@ int main(int argc, char** argv) {
throw std::runtime_error{"Fail to create Sorted collection"};
}
}
engine->ReleaseAccessThread();
break;
}
case DataType::Hashes: {
@@ -545,7 +623,6 @@ int main(int argc, char** argv) {
throw std::runtime_error{"Fail to create Hashset"};
}
}
engine->ReleaseAccessThread();
break;
}
case DataType::List: {
@@ -555,7 +632,24 @@ int main(int argc, char** argv) {
throw std::runtime_error{"Fail to create List"};
}
}
engine->ReleaseAccessThread();
break;
}
case DataType::VHash: {
#ifdef KVDK_ENABLE_VHASH
for (auto col : collections) {
Status s =
engine->VHashCreate(col, FLAGS_num_kv / FLAGS_num_collection);
if (s != Status::Ok) {
throw std::runtime_error{"Fail to create VHash"};
}
}
if (!FLAGS_fill) {
LaunchNThreads(FLAGS_threads, FillVHash);
}
#else
throw std::runtime_error{"VHash not supported!"};
#endif

break;
}
default: {
11 changes: 10 additions & 1 deletion doc/benchmark.md
@@ -4,6 +4,15 @@ To test performance of KVDK, you can run our benchmark tool "bench", the tool is

You can manually run individual benchmarks by following the examples shown below, or simply run our basic benchmark script "scripts/run_benchmark.py" to test all the basic read/write performance.

To run the script, you should first build kvdk, then run:

```
scripts/run_benchmark.py [data_type] [key distribution]
```

data_type: Which data type to benchmark, it can be string/sorted/hash/list/blackhole/all

key distribution: Distribution of key of the benchmark workloads, it can be random/zipf/all
## Fill data to new instance

To test performance, we first need to fill key-value pairs into the KVDK instance. Since KVDK does not support cross-socket access yet, we need to bind the bench program to a numa node:
@@ -20,7 +29,7 @@ Explanation of arguments:

-space: PMem space that allocate to the KVDK instance.

-max_access_threads: Max concurrent access threads of the KVDK instance, set it to the number of the hyper-threads for performance consideration.
-max_access_threads: Max concurrent access threads in the KVDK instance; set it to the number of hyper-threads for best performance. You can call the KVDK API from any number of threads, but if more threads run in parallel than max_access_threads, performance degrades due to synchronization cost.

-type: Type of key-value pairs to benchmark, it can be "string", "hash" or "sorted".
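
For readers following the bench.cpp changes in this diff, the per-thread key ranges used during the fill phase can be sketched as below. This is a minimal standalone illustration, not code from the PR: `KeyRange`, `PartitionKeys` and `CoveredKeys` are hypothetical names, and only the arithmetic (`num_kv / n_threads + 1` keys per thread, half-open ranges) is taken from the diff.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical alias: a half-open key range [first, second).
using KeyRange = std::pair<std::uint64_t, std::uint64_t>;

// Hypothetical helper mirroring the arithmetic in ProcessBenchmarkConfigs():
// each thread i gets [i * per_thread, (i + 1) * per_thread).
std::vector<KeyRange> PartitionKeys(std::uint64_t num_kv,
                                    std::uint64_t n_threads) {
  std::vector<KeyRange> ranges;
  // The "+ 1" rounds up, so the ranges together cover at least num_kv keys.
  const std::uint64_t per_thread = num_kv / n_threads + 1;
  for (std::uint64_t i = 0; i < n_threads; i++) {
    ranges.emplace_back(i * per_thread, (i + 1) * per_thread);
  }
  return ranges;
}

// Total number of keys covered by all ranges (may slightly exceed num_kv).
std::uint64_t CoveredKeys(const std::vector<KeyRange>& ranges) {
  std::uint64_t total = 0;
  for (const auto& r : ranges) total += r.second - r.first;
  return total;
}
```

Because of the rounding, the ranges can cover slightly more keys than requested; for example, 10 keys split over 4 threads gives 3 keys per thread and 12 covered keys in total.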
