Commit
Merge branch '6.4.tikv' of https://github.com/tikv/rocksdb into mutex
Little-Wallace committed Jan 23, 2022
2 parents c9493c7 + f236fe4 commit 9da6f60
Showing 66 changed files with 2,201 additions and 591 deletions.
1 change: 1 addition & 0 deletions CMakeLists.txt
@@ -585,6 +585,7 @@ set(SOURCES
monitoring/iostats_context.cc
monitoring/perf_context.cc
monitoring/perf_level.cc
monitoring/perf_flag.cc
monitoring/persistent_stats_history.cc
monitoring/statistics.cc
monitoring/thread_status_impl.cc
7 changes: 7 additions & 0 deletions HISTORY.md
@@ -2,11 +2,14 @@
## Additional Improvements
### Public API Change
* DeleteRange now returns `Status::InvalidArgument` if the range's end key comes before its start key according to the user comparator. Previously the behavior was undefined.
* ldb now uses options.force_consistency_checks = true by default and "--disable_consistency_checks" is added to disable it.
* Removed unused structure `CompactionFilterContext`.

### New Features
* When options.force_consistency_checks is set, RocksDB now passes the error back to the user instead of crashing the process.
* Added experimental ColumnFamilyOptions::sst_partitioner_factory to determine the partitioning of SST files. This helps compaction split files on interesting boundaries (key prefixes), which makes propagation of SST files less write-amplifying (covering the whole key space).
* Option `max_background_flushes` can be set dynamically using DB::SetDBOptions().
* Allow `CompactionFilter`s to apply in more table file creation scenarios such as flush and recovery. For compatibility, `CompactionFilter`s by default apply during compaction. Users can customize this behavior by overriding `CompactionFilterFactory::ShouldFilterTableFileCreation()`. Picked from [facebook/rocksdb#pr8243](https://github.com/facebook/rocksdb/pull/8243).

### Bug Fixes
* Fixed issue #6316 that can cause a corruption of the MANIFEST file in the middle when writing to it fails due to no disk space.
@@ -19,6 +22,8 @@
* Fix a bug in which a snapshot read could be affected by a DeleteRange after the snapshot (#6062).
* `WriteBatchWithIndex::DeleteRange` returns `Status::NotSupported`. Previously it returned success even though reads on the batch did not account for range tombstones. The corresponding language bindings now cannot be used. In C, that includes `rocksdb_writebatch_wi_delete_range`, `rocksdb_writebatch_wi_delete_range_cf`, `rocksdb_writebatch_wi_delete_rangev`, and `rocksdb_writebatch_wi_delete_rangev_cf`. In Java, that includes `WriteBatchWithIndex::deleteRange`.

### Performance Improvements
* When gathering unreferenced obsolete files for purging, file metas associated with active versions will no longer be copied for double-check. Updated VersionBuilder to make sure each physical file is reference counted by at most one FileMetaData.

## 6.4.6 (10/16/2019)
* Fix a bug when partitioned filters and prefix search are used in conjunction, ::SeekForPrev could return invalid for an existing prefix. ::SeekForPrev might be called by the user, or internally on ::Prev, or within ::Seek if the return value involves Delete or a Merge operand.
@@ -54,6 +59,7 @@
* ldb sometimes uses a string-append merge operator if no merge operator is passed in. This is to allow users to print keys from a DB with a merge operator.
* Replaces the old Registrar with ObjectRegistry to allow users to create custom objects from strings; also adds LoadEnv() to Env.
* Added new overload of GetApproximateSizes which gets SizeApproximationOptions object and returns a Status. The older overloads are redirecting their calls to this new method and no longer assert if the include_flags doesn't have either of INCLUDE_MEMTABLES or INCLUDE_FILES bits set. It's recommended to use the new method only, as it is more type safe and returns a meaningful status in case of errors.
* LDBCommandRunner::RunCommand() now returns the status code as an integer rather than calling exit() with the code.

### New Features
* Add argument `--secondary_path` to ldb to open the database as the secondary instance. This would keep the original DB intact.
@@ -93,6 +99,7 @@
* Add an option `unordered_write` which trades snapshot guarantees for higher write throughput. When used with WRITE_PREPARED transactions with two_write_queues=true, it offers higher throughput without compromising guarantees.
* Allow DBImplSecondary to remove memtables with obsolete data after replaying MANIFEST and WAL.
* Add an option `failed_move_fall_back_to_copy` (default is true) for external SST ingestion. When `move_files` is true and hard link fails, ingestion falls back to copy if `failed_move_fall_back_to_copy` is true. Otherwise, ingestion reports an error.
* Add command `list_file_range_deletes` in ldb, which prints out tombstones in SST files.

### Performance Improvements
* Reduce binary searches when an iterator reseeks into the same data block.
38 changes: 31 additions & 7 deletions db/builder.cc
@@ -106,6 +106,25 @@ Status BuildTable(
TableProperties tp;

if (iter->Valid() || !range_del_agg->IsEmpty()) {
std::unique_ptr<CompactionFilter> compaction_filter;
if (ioptions.compaction_filter_factory != nullptr &&
ioptions.compaction_filter_factory->ShouldFilterTableFileCreation(
reason)) {
CompactionFilter::Context context;
context.is_full_compaction = false;
context.is_manual_compaction = false;
context.column_family_id = column_family_id;
context.reason = reason;
compaction_filter =
ioptions.compaction_filter_factory->CreateCompactionFilter(context);
if (compaction_filter != nullptr &&
!compaction_filter->IgnoreSnapshots()) {
return Status::NotSupported(
"CompactionFilter::IgnoreSnapshots() = false is not supported "
"anymore.");
}
}

TableBuilder* builder;
std::unique_ptr<WritableFileWriter> file_writer;
// Currently we only enable dictionary compression during compaction to the
@@ -141,17 +160,21 @@ Status BuildTable(
0 /*target_file_size*/, file_creation_time);
}

MergeHelper merge(env, internal_comparator.user_comparator(),
ioptions.merge_operator, nullptr, ioptions.info_log,
true /* internal key corruption is not ok */,
snapshots.empty() ? 0 : snapshots.back(),
snapshot_checker);
MergeHelper merge(
env, internal_comparator.user_comparator(), ioptions.merge_operator,
compaction_filter.get(), ioptions.info_log,
true /* internal key corruption is not ok */,
snapshots.empty() ? 0 : snapshots.back(), snapshot_checker);

CompactionIterator c_iter(
iter, internal_comparator.user_comparator(), &merge, kMaxSequenceNumber,
&snapshots, earliest_write_conflict_snapshot, snapshot_checker, env,
ShouldReportDetailedTime(env, ioptions.statistics),
true /* internal key corruption is not ok */, range_del_agg.get());
true /* internal key corruption is not ok */, range_del_agg.get(),
/*compaction=*/nullptr, compaction_filter.get(),
/*shutting_down=*/nullptr,
/*preserve_deletes_seqnum=*/0, /*manual_compaction_paused=*/nullptr);

c_iter.SeekToFirst();
for (; c_iter.Valid(); c_iter.Next()) {
const Slice& key = c_iter.key();
@@ -192,7 +215,8 @@ Status BuildTable(
meta->fd.file_size = file_size;
meta->marked_for_compaction = builder->NeedCompact();
assert(meta->fd.GetFileSize() > 0);
tp = builder->GetTableProperties(); // refresh now that builder is finished
tp = builder
->GetTableProperties(); // refresh now that builder is finished
if (table_properties) {
*table_properties = tp;
}
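The builder.cc hunks above only create a filter on the flush or recovery path when the factory opts in through `ShouldFilterTableFileCreation(reason)`. A minimal, self-contained sketch of that opt-in follows. Everything in the `sketch` namespace is a simplified stand-in rather than the real RocksDB headers, and `FlushAwareFactory`, `WouldCreateFilter`, and `Demo` are hypothetical names; the compaction-only default is taken from the HISTORY.md entry in this commit.

```cpp
#include <cassert>

namespace sketch {

// Simplified stand-in for rocksdb::TableFileCreationReason.
enum class TableFileCreationReason { kFlush, kCompaction, kRecovery, kMisc };

// Stand-in for the factory interface; only the opt-in hook is modeled.
class CompactionFilterFactory {
 public:
  virtual ~CompactionFilterFactory() = default;

  // Default described in the changelog: filter during compaction only.
  virtual bool ShouldFilterTableFileCreation(
      TableFileCreationReason reason) const {
    return reason == TableFileCreationReason::kCompaction;
  }
};

// Hypothetical factory that additionally filters entries during flush.
class FlushAwareFactory : public CompactionFilterFactory {
 public:
  bool ShouldFilterTableFileCreation(
      TableFileCreationReason reason) const override {
    return reason == TableFileCreationReason::kCompaction ||
           reason == TableFileCreationReason::kFlush;
  }
};

// Mirrors the gate in BuildTable(): a missing factory, or a factory that
// declines, means no filter is created for this table file.
inline bool WouldCreateFilter(const CompactionFilterFactory* factory,
                              TableFileCreationReason reason) {
  return factory != nullptr && factory->ShouldFilterTableFileCreation(reason);
}

// Usage check: the default factory skips flush, the custom one does not.
inline void Demo() {
  CompactionFilterFactory default_factory;
  FlushAwareFactory flush_aware;
  assert(!WouldCreateFilter(&default_factory, TableFileCreationReason::kFlush));
  assert(WouldCreateFilter(&flush_aware, TableFileCreationReason::kFlush));
  assert(!WouldCreateFilter(nullptr, TableFileCreationReason::kCompaction));
}

}  // namespace sketch
```

Note that the real hunk also rejects any filter whose `IgnoreSnapshots()` returns false; that check is left out of the sketch.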
2 changes: 1 addition & 1 deletion db/builder.h
@@ -25,7 +25,7 @@
namespace rocksdb {

struct Options;
struct FileMetaData;
class FileMetaData;

class Env;
struct EnvOptions;
17 changes: 16 additions & 1 deletion db/c.cc
@@ -25,6 +25,7 @@
#include "rocksdb/merge_operator.h"
#include "rocksdb/options.h"
#include "rocksdb/perf_context.h"
#include "rocksdb/perf_flag.h"
#include "rocksdb/rate_limiter.h"
#include "rocksdb/slice_transform.h"
#include "rocksdb/statistics.h"
@@ -54,7 +55,6 @@ using rocksdb::ColumnFamilyHandle;
using rocksdb::ColumnFamilyOptions;
using rocksdb::CompactionFilter;
using rocksdb::CompactionFilterFactory;
using rocksdb::CompactionFilterContext;
using rocksdb::CompactionOptionsFIFO;
using rocksdb::Comparator;
using rocksdb::CompressionType;
@@ -114,6 +114,9 @@ using rocksdb::Checkpoint;
using rocksdb::TransactionLogIterator;
using rocksdb::BatchResult;
using rocksdb::PerfLevel;
using rocksdb::EnablePerfFlag;
using rocksdb::DisablePerfFlag;
using rocksdb::CheckPerfFlag;
using rocksdb::PerfContext;
using rocksdb::MemoryUtil;

@@ -534,6 +537,10 @@ rocksdb_t* rocksdb_open_as_secondary(const rocksdb_options_t* options,
return result;
}

void rocksdb_resume(rocksdb_t* db, char** errptr) {
SaveError(errptr, db->rep->Resume());
}

rocksdb_backup_engine_t* rocksdb_backup_engine_open(
const rocksdb_options_t* options, const char* path, char** errptr) {
BackupEngine* be;
@@ -2749,6 +2756,14 @@ void rocksdb_set_perf_level(int v) {
SetPerfLevel(level);
}

void rocksdb_enable_perf_flag(uint64_t flag) { EnablePerfFlag(flag); }

void rocksdb_disable_perf_flag(uint64_t flag) { DisablePerfFlag(flag); }

int rocksdb_check_perf_flag(uint64_t flag) {
return static_cast<int>(CheckPerfFlag(flag));
}

rocksdb_perfcontext_t* rocksdb_perfcontext_create() {
rocksdb_perfcontext_t* context = new rocksdb_perfcontext_t;
context->rep = rocksdb::get_perf_context();
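The c.cc hunks above expose `EnablePerfFlag`, `DisablePerfFlag`, and `CheckPerfFlag` through thin C wrappers, with the boolean check narrowed to `int`. The sketch below shows one way such flag toggles could behave. It is not the implementation in `monitoring/perf_flag.cc`, which this diff does not show: the `std::bitset` storage, the per-thread scope, the `kMaxFlags` capacity, and the `check_perf_flag` wrapper name are all assumptions made for illustration.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>

namespace sketch {

// Assumed capacity; the real number of perf flags is not part of this diff.
constexpr std::size_t kMaxFlags = 128;

// Assumption: one flag set per thread, stored as a bitset.
inline std::bitset<kMaxFlags>& Flags() {
  thread_local std::bitset<kMaxFlags> flags;
  return flags;
}

inline void EnablePerfFlag(std::uint64_t flag) {
  assert(flag < kMaxFlags);  // out-of-range flags are a caller bug here
  Flags().set(static_cast<std::size_t>(flag));
}

inline void DisablePerfFlag(std::uint64_t flag) {
  assert(flag < kMaxFlags);
  Flags().reset(static_cast<std::size_t>(flag));
}

inline bool CheckPerfFlag(std::uint64_t flag) {
  assert(flag < kMaxFlags);
  return Flags().test(static_cast<std::size_t>(flag));
}

// Same shape as the C wrapper in the diff: the bool is narrowed to int
// because the C API has no bool type.
inline int check_perf_flag(std::uint64_t flag) {
  return static_cast<int>(CheckPerfFlag(flag));
}

}  // namespace sketch
```

A caller would enable a flag index, read it back as `1` through the wrapper, and see `0` again after disabling it; the flag indices themselves are defined elsewhere in the real code.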
14 changes: 7 additions & 7 deletions db/column_family.cc
@@ -738,7 +738,7 @@ WriteStallCondition ColumnFamilyData::RecalculateWriteStallConditions(
bool needed_delay = write_controller->NeedsDelay();

if (write_stall_condition == WriteStallCondition::kStopped &&
write_stall_cause == WriteStallCause::kMemtableLimit) {
write_stall_cause == WriteStallCause::kMemtableLimit && !mutable_cf_options.disable_write_stall) {
write_controller_token_ = write_controller->GetStopToken();
internal_stats_->AddCFStats(InternalStats::MEMTABLE_LIMIT_STOPS, 1);
ROCKS_LOG_WARN(
@@ -748,7 +748,7 @@ WriteStallCondition ColumnFamilyData::RecalculateWriteStallConditions(
name_.c_str(), imm()->NumNotFlushed(),
mutable_cf_options.max_write_buffer_number);
} else if (write_stall_condition == WriteStallCondition::kStopped &&
write_stall_cause == WriteStallCause::kL0FileCountLimit) {
write_stall_cause == WriteStallCause::kL0FileCountLimit && !mutable_cf_options.disable_write_stall) {
write_controller_token_ = write_controller->GetStopToken();
internal_stats_->AddCFStats(InternalStats::L0_FILE_COUNT_LIMIT_STOPS, 1);
if (compaction_picker_->IsLevel0CompactionInProgress()) {
@@ -759,7 +759,7 @@ WriteStallCondition ColumnFamilyData::RecalculateWriteStallConditions(
"[%s] Stopping writes because we have %d level-0 files",
name_.c_str(), vstorage->l0_delay_trigger_count());
} else if (write_stall_condition == WriteStallCondition::kStopped &&
write_stall_cause == WriteStallCause::kPendingCompactionBytes) {
write_stall_cause == WriteStallCause::kPendingCompactionBytes && !mutable_cf_options.disable_write_stall) {
write_controller_token_ = write_controller->GetStopToken();
internal_stats_->AddCFStats(
InternalStats::PENDING_COMPACTION_BYTES_LIMIT_STOPS, 1);
@@ -769,7 +769,7 @@ WriteStallCondition ColumnFamilyData::RecalculateWriteStallConditions(
"bytes %" PRIu64,
name_.c_str(), compaction_needed_bytes);
} else if (write_stall_condition == WriteStallCondition::kDelayed &&
write_stall_cause == WriteStallCause::kMemtableLimit) {
write_stall_cause == WriteStallCause::kMemtableLimit && !mutable_cf_options.disable_write_stall) {
write_controller_token_ =
SetupDelay(write_controller, compaction_needed_bytes,
prev_compaction_needed_bytes_, was_stopped,
@@ -784,7 +784,7 @@ WriteStallCondition ColumnFamilyData::RecalculateWriteStallConditions(
mutable_cf_options.max_write_buffer_number,
write_controller->delayed_write_rate());
} else if (write_stall_condition == WriteStallCondition::kDelayed &&
write_stall_cause == WriteStallCause::kL0FileCountLimit) {
write_stall_cause == WriteStallCause::kL0FileCountLimit && !mutable_cf_options.disable_write_stall) {
// L0 is the last two files from stopping.
bool near_stop = vstorage->l0_delay_trigger_count() >=
mutable_cf_options.level0_stop_writes_trigger - 2;
@@ -804,7 +804,7 @@ WriteStallCondition ColumnFamilyData::RecalculateWriteStallConditions(
name_.c_str(), vstorage->l0_delay_trigger_count(),
write_controller->delayed_write_rate());
} else if (write_stall_condition == WriteStallCondition::kDelayed &&
write_stall_cause == WriteStallCause::kPendingCompactionBytes) {
write_stall_cause == WriteStallCause::kPendingCompactionBytes && !mutable_cf_options.disable_write_stall) {
// If the distance to hard limit is less than 1/4 of the gap between soft
// and
// hard bytes limit, we think it is near stop and speed up the slowdown.
@@ -829,7 +829,7 @@ WriteStallCondition ColumnFamilyData::RecalculateWriteStallConditions(
name_.c_str(), vstorage->estimated_compaction_needed_bytes(),
write_controller->delayed_write_rate());
} else {
assert(write_stall_condition == WriteStallCondition::kNormal);
assert(write_stall_condition == WriteStallCondition::kNormal || mutable_cf_options.disable_write_stall);
if (vstorage->l0_delay_trigger_count() >=
GetL0ThresholdSpeedupCompaction(
mutable_cf_options.level0_file_num_compaction_trigger,
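Every stop and delay branch in the column_family.cc hunks above gains a `!mutable_cf_options.disable_write_stall` guard, and the final assertion is relaxed to match. The sketch below condenses that control flow so the effect is easier to see: with the option set, no stop or delay token is taken and execution always reaches the last `else`. The three per-cause branches are collapsed into one, and `Branch` and `SelectBranch` are hypothetical names, not RocksDB code.

```cpp
#include <cassert>

namespace sketch {

// Stand-in for rocksdb::WriteStallCondition.
enum class WriteStallCondition { kNormal, kDelayed, kStopped };

// Which arm of the if/else chain runs (stall causes collapsed for brevity).
enum class Branch { kStop, kDelay, kFallThrough };

inline Branch SelectBranch(WriteStallCondition condition,
                           bool disable_write_stall) {
  if (condition == WriteStallCondition::kStopped && !disable_write_stall) {
    return Branch::kStop;  // real code takes a stop token here
  }
  if (condition == WriteStallCondition::kDelayed && !disable_write_stall) {
    return Branch::kDelay;  // real code sets up a delay token here
  }
  // Relaxed assertion from the diff: a non-normal condition can now reach
  // this arm, but only when write stalls are disabled.
  assert(condition == WriteStallCondition::kNormal || disable_write_stall);
  return Branch::kFallThrough;
}

}  // namespace sketch
```

The sketch only models which branch is selected; the condition computed earlier in the function is not altered by these hunks.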
7 changes: 7 additions & 0 deletions db/compaction/compaction.cc
@@ -531,6 +531,12 @@ std::unique_ptr<CompactionFilter> Compaction::CreateCompactionFilter(
return nullptr;
}

if (!cfd_->ioptions()
->compaction_filter_factory->ShouldFilterTableFileCreation(
TableFileCreationReason::kCompaction)) {
return nullptr;
}

CompactionFilter::Context context;
context.is_full_compaction = is_full_compaction_;
context.is_manual_compaction = is_manual_compaction_;
@@ -550,6 +556,7 @@ std::unique_ptr<CompactionFilter> Compaction::CreateCompactionFilter(
}
}
context.column_family_id = cfd_->GetID();
context.reason = TableFileCreationReason::kCompaction;
return cfd_->ioptions()->compaction_filter_factory->CreateCompactionFilter(
context);
}
9 changes: 5 additions & 4 deletions db/compaction/compaction_iterator.cc
@@ -4,6 +4,7 @@
// (found in the LICENSE.Apache file in the root directory).

#include "db/compaction/compaction_iterator.h"

#include "db/snapshot_checker.h"
#include "port/likely.h"
#include "rocksdb/listener.h"
@@ -78,8 +79,8 @@ CompactionIterator::CompactionIterator(
current_user_key_snapshot_(0),
merge_out_iter_(merge_helper_),
current_key_committed_(false),
snap_list_callback_(snap_list_callback) {
assert(compaction_filter_ == nullptr || compaction_ != nullptr);
snap_list_callback_(snap_list_callback),
level_(compaction_ == nullptr ? 0 : compaction_->level()) {
assert(snapshots_ != nullptr);
bottommost_level_ =
compaction_ == nullptr ? false : compaction_->bottommost_level();
@@ -121,7 +122,7 @@ void CompactionIterator::Next() {
key_ = merge_out_iter_.key();
value_ = merge_out_iter_.value();
bool valid_key __attribute__((__unused__));
valid_key = ParseInternalKey(key_, &ikey_);
valid_key = ParseInternalKey(key_, &ikey_);
// MergeUntil stops when it encounters a corrupt key and does not
// include them in the result, so we expect the keys here to be valid.
assert(valid_key);
@@ -176,7 +177,7 @@ void CompactionIterator::InvokeFilterIfNeeded(bool* need_skip,
{
StopWatchNano timer(env_, report_detailed_time_);
filter = compaction_filter_->FilterV3(
compaction_->level(), filter_key, seqno, value_type, value_,
level_, filter_key, seqno, value_type, value_,
&compaction_filter_value_, compaction_filter_skip_until_.rep());
iter_stats_.total_filter_time +=
env_ != nullptr && report_detailed_time_ ? timer.ElapsedNanos() : 0;
2 changes: 2 additions & 0 deletions db/compaction/compaction_iterator.h
@@ -275,6 +275,8 @@ class CompactionIterator {
// number of distinct keys processed
size_t num_keys_ = 0;

const int level_;

bool IsShuttingDown() {
// This is a best-effort facility, so memory_order_relaxed is sufficient.
return shutting_down_ && shutting_down_->load(std::memory_order_relaxed);
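The compaction iterator hunks above drop the assertion that a filter implies a non-null `compaction_` and cache the level in a new `const int level_` member, so the filter call no longer dereferences `compaction_`. A minimal sketch of that pattern follows. The `Compaction` struct here is a tiny stand-in with a hypothetical `level_value` field, and `FilterLevel` and `Demo` are illustrative names only.

```cpp
#include <cassert>

namespace sketch {

// Stand-in for the small part of Compaction that the iterator reads.
struct Compaction {
  int level_value;  // hypothetical field backing level()
  int level() const { return level_value; }
};

class CompactionIterator {
 public:
  // compaction may be null: BuildTable() passes nullptr on the
  // flush/recovery path shown earlier in this commit.
  explicit CompactionIterator(const Compaction* compaction)
      : compaction_(compaction),
        level_(compaction_ == nullptr ? 0 : compaction_->level()) {}

  // The value now handed to the filter instead of compaction_->level().
  int FilterLevel() const { return level_; }

 private:
  const Compaction* compaction_;  // declared first, so initialized first
  const int level_;
};

// Usage check: a null compaction yields level 0 instead of a crash.
inline void Demo() {
  Compaction c{3};
  assert(CompactionIterator(&c).FilterLevel() == 3);
  assert(CompactionIterator(nullptr).FilterLevel() == 0);
}

}  // namespace sketch
```

Member declaration order matters in this pattern, since `level_` is initialized from `compaction_`.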
13 changes: 6 additions & 7 deletions db/compaction/compaction_picker_test.cc
@@ -96,7 +96,6 @@ class CompactionPickerTest : public testing::Test {
f->fd.largest_seqno = largest_seq;
f->compensated_file_size =
(compensated_file_size != 0) ? compensated_file_size : file_size;
f->refs = 0;
vstorage_->AddFile(level, f);
files_.emplace_back(f);
file_map_.insert({file_number, {f, level}});
@@ -369,8 +368,8 @@ TEST_F(CompactionPickerTest, LevelTriggerDynamic4) {
mutable_cf_options_.max_bytes_for_level_multiplier = 10;
NewVersionStorage(num_levels, kCompactionStyleLevel);
Add(0, 1U, "150", "200");
Add(num_levels - 1, 3U, "200", "250", 300U);
Add(num_levels - 1, 4U, "300", "350", 3000U);
Add(num_levels - 1, 2U, "200", "250", 300U);
Add(num_levels - 1, 3U, "300", "350", 3000U);
Add(num_levels - 1, 4U, "400", "450", 3U);
Add(num_levels - 2, 5U, "150", "180", 300U);
Add(num_levels - 2, 6U, "181", "350", 500U);
@@ -575,7 +574,7 @@ TEST_F(CompactionPickerTest, CompactionPriMinOverlapping2) {
Add(2, 8U, "201", "300",
60000000U); // Overlaps with file 28, 29, total size 521M

Add(3, 26U, "100", "110", 261000000U);
Add(3, 25U, "100", "110", 261000000U);
Add(3, 26U, "150", "170", 261000000U);
Add(3, 27U, "171", "179", 260000000U);
Add(3, 28U, "191", "220", 260000000U);
@@ -1091,7 +1090,7 @@ TEST_F(CompactionPickerTest, EstimateCompactionBytesNeeded1) {
// Size ratio L4/L3 is 9.9
// After merge from L3, L4 size is 1000900
Add(4, 11U, "400", "500", 999900);
Add(5, 11U, "400", "500", 8007200);
Add(5, 12U, "400", "500", 8007200);

UpdateVersionStorageInfo();

@@ -1404,8 +1403,8 @@ TEST_F(CompactionPickerTest, IsTrivialMoveOn) {

Add(3, 5U, "120", "130", 7000U);
Add(3, 6U, "170", "180", 7000U);
Add(3, 5U, "220", "230", 7000U);
Add(3, 5U, "270", "280", 7000U);
Add(3, 7U, "220", "230", 7000U);
Add(3, 8U, "270", "280", 7000U);
UpdateVersionStorageInfo();

std::unique_ptr<Compaction> compaction(level_compaction_picker.PickCompaction(