Bug fixes for create_vdev #381

Merged: 5 commits, Apr 17, 2024
2 changes: 1 addition & 1 deletion conanfile.py
@@ -5,7 +5,7 @@

class HomestoreConan(ConanFile):
name = "homestore"
- version = "6.2.2"
+ version = "6.2.3"

homepage = "https://github.com/eBay/Homestore"
description = "HomeStore Storage Engine"
4 changes: 4 additions & 0 deletions src/lib/device/chunk.h
@@ -47,6 +47,10 @@ class Chunk {
uint64_t start_offset() const { return m_chunk_info.chunk_start_offset; }
uint64_t size() const { return m_chunk_info.chunk_size; }
bool is_busy() const { return m_chunk_info.is_allocated(); }
bool is_align() const {
return (m_chunk_info.chunk_start_offset % m_pdev->align_size() == 0) &&
(m_chunk_info.chunk_size % m_pdev->align_size() == 0);
}
uint32_t vdev_id() const { return m_chunk_info.vdev_id; }
uint16_t chunk_id() const { return static_cast< uint16_t >(m_chunk_info.chunk_id); }
uint32_t pdev_ordinal() const { return m_chunk_info.chunk_ordinal; }
60 changes: 43 additions & 17 deletions src/lib/device/device_manager.cpp
@@ -23,6 +23,7 @@
#include <homestore/homestore_decl.hpp>
#include "device/chunk.h"
#include "device/device.h"
#include "device/hs_super_blk.h"
#include "device/physical_dev.hpp"
#include "device/virtual_dev.hpp"
#include "common/homestore_utils.hpp"
@@ -187,10 +188,9 @@ shared< VirtualDev > DeviceManager::create_vdev(vdev_parameters&& vparam) {
if (vdev_id == sisl::Bitset::npos) { throw std::out_of_range("System has no room for additional vdev"); }
m_vdev_id_bm.set_bit(vdev_id);

// Determine if we have devices available on the requested dev_tier. If so use them, else fall back to data tier
std::vector< PhysicalDev* > pdevs = pdevs_by_type_internal(vparam.dev_type);
- RELEASE_ASSERT_GT(pdevs.size(), 0, "Unable to find any pdevs for even data tier, can't create vdev");
+ RELEASE_ASSERT_GT(pdevs.size(), 0, "Unable to find any pdevs for given vdev type, can't create vdev");
+ RELEASE_ASSERT(vparam.blk_size % pdevs[0]->align_size() == 0, "blk_size should be multiple of pdev align_size");
// Identify the number of chunks
if (vparam.multi_pdev_opts == vdev_multi_pdev_opts_t::ALL_PDEV_STRIPED) {
auto total_streams = std::accumulate(pdevs.begin(), pdevs.end(), 0u,
@@ -204,11 +204,11 @@ shared< VirtualDev > DeviceManager::create_vdev(vdev_parameters&& vparam) {
pdevs.erase(pdevs.begin() + 1, pdevs.end()); // TODO: Pick random one
}

// Adjust the maximum number chunks requested before round up vdev size.
uint32_t max_num_chunks = 0;
for (const auto& d : m_dev_infos) {
max_num_chunks += hs_super_blk::max_chunks_in_pdev(d);
}
// Based on the min chunk size, we calculate the max number of chunks that can be created in each target pdev
uint32_t min_chunk_size = hs_super_blk::min_chunk_size(vparam.dev_type);
// FIXME: it is possible that each vdev is less than max_num_chunks, but total is more than MAX_CHUNKS_IN_SYSTEM.
yamingk (Contributor), Apr 15, 2024:

This is a true statement, but it won't cause any problem, right? max_num_chunks only guarantees that chunk_size won't be smaller than min_chunk_size. The actual vparam.num_chunks is the std::min of itself and max_num_chunks (line 223). And the input vparam.num_chunks is provided by the HS consumer, who can ensure that the services requested in HS won't exceed MAX_CHUNKS_IN_SYSTEM, right? (We could add an assert at the end of format_and_start to make sure the num_chunks of all requested services don't exceed MAX_CHUNKS_IN_SYSTEM.)

xiaoxichen (Collaborator, author):

Yes, it will not cause an issue; it just fails the create_vdev request. I was thinking of tuning num_chunks based on the available chunk count, i.e. if we have 1000 chunks available and the input requested 2000, we could tune that down to 1000. This is just a minor improvement.

yamingk (Contributor), Apr 16, 2024:

It should not fail the create_vdev request if the HS consumer sets the correct num_chunks for each service during hs.format_and_start, right?

Your thought is also valid: the consumer just supplies the total num_chunks it wants to create, and lets HS do whatever trick is needed to figure out the available chunks.

xiaoxichen (Collaborator, author), Apr 16, 2024:

Thinking about the case: we have a 64K chunk limit across all vdevs, and this 64K is a magic number the client does not know. Even a client that knows it, like homeobject, creates several services with:
            HomeStore::instance()->format_and_start({
                {HS_SERVICE::META, hs_format_params{.dev_type = run_on_type, .size_pct = 5.0}},
                {HS_SERVICE::LOG, hs_format_params{.dev_type = run_on_type, .size_pct = 10.0, .chunk_size = 32 * Mi}},
                {HS_SERVICE::INDEX, hs_format_params{.dev_type = run_on_type, .size_pct = 5.0}},
                {HS_SERVICE::REPLICATION,
                 hs_format_params{.dev_type = run_on_type,
                                  .size_pct = 79.0,
                                  .num_chunks = 63000,
                                  .block_size = _data_block_size,
                                  .alloc_type = blk_allocator_type_t::append,
                                  .chunk_sel_type = chunk_selector_type_t::CUSTOM}},
            });

It is very hard to know whether we are exceeding 64K: for the LOG device num_chunks is indirect (derived from chunk_size), and META/INDEX don't specify num_chunks at all but rely on the default values of hs_format_params.

Later, if LOG moves to dynamic sizing, it gets trickier still: the max size of the log device can be limited by the chunk count left over by the other services, e.g. if data takes 65500 chunks, only 34 chunks are left for the log dev.

yamingk (Contributor), Apr 16, 2024:

I agree it is a well-known number right now. We could get it through a static API that returns max_num_chunks from homestore (it could optionally return the "real" max number, already taking min_chunk_size into account).

I was thinking the consumer needs to specify num_chunks for all the requested services. In your case above, if we want to support only one service specifying num_chunks and let the other services land automatically based on vdev size, then yes, we also need to play some tricks in homestore.

xiaoxichen (Collaborator, author), Apr 16, 2024:

Yes, maybe let's ignore the well-known-number part for now; it is an easy fix whenever we want to do it.

Regarding "consumer needs to specify num_chunks for all the requested services": I feel quite confused about the dynamic-size vdev path introduced by LOG. With dynamic sizing it naturally cannot provide num_chunks, yet we need to reserve N chunks for it to stay dynamic within [0, N * CHUNK_SIZE).

// uint32 conversion is safe as it only overflows when vdev size > 64PB with 16MB min_chunk_size.
Contributor:

Should we add an assert to make sure the vdev size is smaller than or equal to 64 PB?

xiaoxichen (Collaborator, author):

I don't think we can have a vdev > 64PB in the near future.

uint32_t max_num_chunks = std::min(uint32_t(vparam.vdev_size / min_chunk_size), hs_super_blk::MAX_CHUNKS_IN_SYSTEM);

auto input_vdev_size = vparam.vdev_size;
if (vparam.size_type == vdev_size_type_t::VDEV_SIZE_STATIC) {
@@ -217,31 +217,35 @@ shared< VirtualDev > DeviceManager::create_vdev(vdev_parameters&& vparam) {

// Either num_chunks or chunk_size can be provided and we calculate the other.
if (vparam.num_chunks != 0) {
vparam.vdev_size = sisl::round_up(vparam.vdev_size, vparam.num_chunks * vparam.blk_size);
if (input_vdev_size != vparam.vdev_size) {
LOGINFO(
"{} Virtual device is attempted to be created with size={}, it needs to be rounded to new_size={}",
vparam.vdev_name, in_bytes(input_vdev_size), in_bytes(vparam.vdev_size));
}

auto input_num_chunks = vparam.num_chunks;
vparam.num_chunks = std::min(vparam.num_chunks, max_num_chunks);
if (input_num_chunks != vparam.num_chunks) {
LOGINFO("{} Virtual device is attempted to be created with num_chunks={}, it needs to be adjusted to "
"new_num_chunks={}",
vparam.vdev_name, input_num_chunks, vparam.num_chunks);
}

// this ensures chunk_size % vparam.blk_size == 0
vparam.vdev_size = sisl::round_down(vparam.vdev_size, vparam.num_chunks * vparam.blk_size);
if (input_vdev_size != vparam.vdev_size) {
LOGINFO(
"{} Virtual device is attempted to be created with size={}, it needs to be rounded to new_size={}"
" to be the multiple of {} (num_chunks {} * blk_size {}).",
vparam.vdev_name, input_vdev_size, vparam.vdev_size,
in_bytes(vparam.num_chunks * vparam.blk_size), vparam.num_chunks, in_bytes(vparam.blk_size));
}
vparam.chunk_size = vparam.vdev_size / vparam.num_chunks;
} else if (vparam.chunk_size != 0) {
auto input_chunk_size = vparam.chunk_size;
vparam.chunk_size = std::min(vparam.chunk_size, hs_super_blk::MAX_CHUNKS_IN_SYSTEM);
vparam.chunk_size = std::max(vparam.chunk_size, min_chunk_size);
vparam.chunk_size = sisl::round_up(vparam.chunk_size, vparam.blk_size);
if (input_chunk_size != vparam.chunk_size) {
LOGINFO("{} Virtual device is attempted to be created with chunk_size={}, it needs to be adjusted to "
"new_chunk_size={}",
vparam.vdev_name, in_bytes(input_chunk_size), in_bytes(vparam.chunk_size));
}

- vparam.vdev_size = sisl::round_up(vparam.vdev_size, vparam.chunk_size);
+ vparam.vdev_size = sisl::round_down(vparam.vdev_size, vparam.chunk_size);
if (input_vdev_size != vparam.vdev_size) {
LOGINFO(
"{} Virtual device is attempted to be created with size={}, it needs to be rounded to new_size={}",
@@ -256,7 +260,29 @@ shared< VirtualDev > DeviceManager::create_vdev(vdev_parameters&& vparam) {
} else {
// We need chunk_size. We start with zero num_chunks.
RELEASE_ASSERT_GT(vparam.chunk_size, 0, "Chunk size should be provided");
auto input_chunk_size = vparam.chunk_size;
vparam.chunk_size = std::max(vparam.chunk_size, min_chunk_size);
vparam.chunk_size = sisl::round_up(vparam.chunk_size, vparam.blk_size);
if (input_chunk_size != vparam.chunk_size) {
LOGINFO("{} Virtual device is attempted to be created with chunk_size={}, it needs to be adjusted to "
"new_chunk_size={}",
vparam.vdev_name, in_bytes(input_chunk_size), in_bytes(vparam.chunk_size));
}

vparam.vdev_size = sisl::round_down(vparam.vdev_size, vparam.chunk_size);
if (input_vdev_size != vparam.vdev_size) {
LOGINFO(
"{} Virtual device is attempted to be created with size={}, it needs to be rounded to new_size={}",
vparam.vdev_name, in_bytes(input_vdev_size), in_bytes(vparam.vdev_size));
}
}
// sanity checks
RELEASE_ASSERT(vparam.vdev_size % vparam.chunk_size == 0, "vdev_size should be multiple of chunk_size");
Contributor:

Good set of asserts; I was about to suggest these.

RELEASE_ASSERT(vparam.chunk_size % vparam.blk_size == 0, "chunk_size should be multiple of blk_size");
RELEASE_ASSERT(vparam.chunk_size >= min_chunk_size, "chunk_size should be greater than or equal to min_chunk_size");

RELEASE_ASSERT(vparam.num_chunks <= max_num_chunks, "num_chunks should be less than or equal to max_num_chunks");
RELEASE_ASSERT(input_vdev_size >= vparam.vdev_size, "vdev_size should be less than or equal to input_vdev_size");

LOGINFO(
"New Virtual Dev={} of size={} with id={} is attempted to be created with multi_pdev_opts={}. The params are "
9 changes: 9 additions & 0 deletions src/lib/device/hs_super_blk.h
@@ -202,6 +202,15 @@ class hs_super_blk {
#endif
return (dinfo.dev_size - 1) / min_chunk_size + 1;
}
static uint32_t min_chunk_size(HSDevType dtype) {
uint64_t min_chunk_size =
(dtype == HSDevType::Fast) ? MIN_CHUNK_SIZE_FAST_DEVICE : MIN_CHUNK_SIZE_DATA_DEVICE;
#ifdef _PRERELEASE
auto chunk_size = iomgr_flip::instance()->get_test_flip< long >("set_minimum_chunk_size");
if (chunk_size) { min_chunk_size = chunk_size.get(); }
#endif
return min_chunk_size;
}
};

} // namespace homestore
27 changes: 26 additions & 1 deletion src/tests/test_device_manager.cpp
@@ -80,6 +80,7 @@ class DeviceMgrTest : public ::testing::Test {
});
m_dmgr->is_first_time_boot() ? m_dmgr->format_devices() : m_dmgr->load_devices();
m_pdevs = m_dmgr->get_pdevs_by_dev_type(homestore::HSDevType::Data);
m_vdevs = m_dmgr->get_vdevs();
}

void restart() {
@@ -121,6 +122,7 @@ class DeviceMgrTest : public ::testing::Test {
std::map< const PhysicalDev*, uint32_t > chunks_in_pdev_count;
for (const auto& chunk : chunks) {
ASSERT_EQ(chunk->size(), size) << "All chunks are not equally sized in vdev";
ASSERT_EQ(chunk->is_align(), true) << "All chunks should be aligned";

auto [it, inserted] = chunks_in_pdev_count.insert(std::pair(chunk->physical_dev(), 1u));
if (!inserted) { ++(it->second); }
@@ -140,7 +142,7 @@ TEST_F(DeviceMgrTest, StripedVDevCreation) {
avail_size += pdev->data_size();
}

- uint32_t size_pct = 2;
+ uint32_t size_pct = 4;
uint64_t remain_size = avail_size;

LOGINFO("Step 1: Creating {} vdevs with combined size as {}", num_test_vdevs, in_bytes(avail_size));
@@ -174,6 +176,29 @@
this->validate_striped_vdevs();
}

TEST_F(DeviceMgrTest, SmallStripedVDevCreation) {
yamingk (Contributor), Apr 15, 2024:

Not suggesting it needs to be included in this PR, but we can open tickets to add:

  • Mixed drive types with very large (simulated) drive sizes, providing only chunk_size or num_chunks.
  • Pure HDD with a very large drive size (32TB), with only chunk_size or num_chunks.
  • Pure fast drive with a very large drive size (32TB), with only chunk_size or num_chunks.
  • Create a vdev without providing num_chunks and with size_type equal to dynamic, i.e. a test starting with zero num_chunks.
std::string name = "test_vdev_small";

// Create a vdev small enough that only one minimum-size chunk fits per pdev
auto num_chunks = uint32_cast(m_pdevs.size() * 2);
auto size = m_pdevs.size() * hs_super_blk::min_chunk_size(homestore::HSDevType::Data);

LOGINFO("Step 1: Creating vdev of name={} with size={}", name, in_bytes(size));
auto vdev =
m_dmgr->create_vdev(homestore::vdev_parameters{.vdev_name = name,
.vdev_size = size,
.num_chunks = num_chunks,
.blk_size = 4096,
.dev_type = HSDevType::Data,
.alloc_type = blk_allocator_type_t::none,
.chunk_sel_type = chunk_selector_type_t::NONE,
.multi_pdev_opts = vdev_multi_pdev_opts_t::ALL_PDEV_STRIPED,
.context_data = sisl::blob{}});


ASSERT_EQ(vdev->get_chunks().size(), m_pdevs.size()) << "Expected vdev to be created with 1 chunk per pdev";
}

TEST_F(DeviceMgrTest, CreateChunk) {
// Dynamically create chunks and verify no two chunks have the same start offset.
uint64_t avail_size{0};
24 changes: 12 additions & 12 deletions src/tests/test_journal_vdev.cpp
@@ -76,8 +76,8 @@ class VDevJournalIOTest : public ::testing::Test {
{HS_SERVICE::META, {.size_pct = 15.0}},
{HS_SERVICE::LOG,
{.size_pct = 50.0,
- .chunk_size = 8 * 1024 * 1024,
- .min_chunk_size = 8 * 1024 * 1024,
+ .chunk_size = 16 * 1024 * 1024,
+ .min_chunk_size = 16 * 1024 * 1024,
.vdev_size_type = vdev_size_type_t::VDEV_SIZE_DYNAMIC}},
},
nullptr /* starting_cb */, false /* restart */);
@@ -91,8 +91,8 @@ class VDevJournalIOTest : public ::testing::Test {
{HS_SERVICE::META, {.size_pct = 15.0}},
{HS_SERVICE::LOG,
{.size_pct = 50.0,
- .chunk_size = 8 * 1024 * 1024,
- .min_chunk_size = 8 * 1024 * 1024,
+ .chunk_size = 16 * 1024 * 1024,
+ .min_chunk_size = 16 * 1024 * 1024,
.vdev_size_type = vdev_size_type_t::VDEV_SIZE_DYNAMIC}},
},
nullptr /* starting_cb */, true /* restart */);
@@ -505,10 +505,10 @@ TEST_F(VDevJournalIOTest, Recovery) {
}

TEST_F(VDevJournalIOTest, MultipleChunkTest) {
- // Chunk size is 8MB and each data log entry will be of size 3MB to create gaps.
+ // Chunk size is 16MB and each data log entry will be of size 6MB to create gaps.
uint64_t MB = 1024 * 1024;
- uint64_t chunk_size = 8 * MB;
- uint64_t data_size = 3 * MB;
+ uint64_t chunk_size = 16 * MB;
+ uint64_t data_size = 6 * MB;
JournalDescriptorTest test(1);
auto log_dev_jd = test.vdev_jd();

Expand Down Expand Up @@ -577,30 +577,30 @@ TEST_F(VDevJournalIOTest, MultipleChunkTest) {
uint64_t trunc_sz = 2 * data_size;
uint64_t trunc_offset = 2 * data_size;
test.truncate(trunc_offset);
- test.verify_journal_descriptor(log_dev_jd, {.ds = 0x600000, .end = 4 * chunk_size, .writesz = writesz - trunc_sz,
+ test.verify_journal_descriptor(log_dev_jd, {.ds = trunc_offset, .end = 4 * chunk_size, .writesz = writesz - trunc_sz,
.rsvdsz = 0, .chunks = 4, .trunc = true, .total = 4 * chunk_size, .seek = 0x0});

// Truncate one more entry. Release one chunk back and reduce chunk count. Increase the data start.
LOGINFO("Truncating one entry");
trunc_offset = chunk_size + data_size;
trunc_sz = chunk_size + data_size;
test.truncate(trunc_offset);
- test.verify_journal_descriptor(log_dev_jd, {.ds = 0xb00000, .end = 4 * chunk_size, .writesz = writesz - trunc_sz,
+ test.verify_journal_descriptor(log_dev_jd, {.ds = trunc_offset, .end = 4 * chunk_size, .writesz = writesz - trunc_sz,
.rsvdsz = 0, .chunks = 3, .trunc = true, .total = 3 * chunk_size, .seek = 0x0});

// Restart homestore and restore the offsets.
LOGINFO("Restart homestore");
restart_restore();
- test.verify_journal_descriptor(log_dev_jd, {.ds = 0xb00000, .end = 4 * chunk_size, .writesz = writesz - trunc_sz,
+ test.verify_journal_descriptor(log_dev_jd, {.ds = trunc_offset, .end = 4 * chunk_size, .writesz = writesz - trunc_sz,
.rsvdsz = 0, .chunks = 3, .trunc = false, .total = 3 * chunk_size, .seek = 0x0});
test.read_all();

// Truncate all entries. Num chunks 1, write sz should be 0.
LOGINFO("Truncating all entries");
trunc_offset = log_dev_jd->tail_offset();
test.truncate(trunc_offset);
- test.verify_journal_descriptor(log_dev_jd, {.ds = 0x1b00000, .end = 0x2000000, .writesz = 0, .rsvdsz = 0,
- .chunks = 1, .trunc = true, .total = 8388608, .seek = 0x0});
+ test.verify_journal_descriptor(log_dev_jd, {.ds = trunc_offset, .end = 4 * chunk_size, .writesz = 0, .rsvdsz = 0,
+ .chunks = 1, .trunc = true, .total = chunk_size, .seek = 0x0});

// clang-format on
}