Skip to content

Commit

Permalink
[rocksdb] Recovery path sequence miscount fix
Browse files Browse the repository at this point in the history
Summary:
Consider the following WAL with 4 batch entries prefixed with their sequence at time of memtable insert.
[1: BEGIN_PREPARE, PUT, PUT, PUT, PUT, END_PREPARE(a)]
[1: BEGIN_PREPARE, PUT, PUT, PUT, PUT, END_PREPARE(b)]
[4: COMMIT(a)]
[7: COMMIT(b)]

The first two batches do not consume any sequence numbers so are both prefixed with seq=1.
For 2pc commit, memtable insertion takes place before COMMIT batch is written to WAL.
We can see that sequence number consumption takes place between WAL entries giving us the seemingly sparse sequence prefix for WAL entries.
This is a valid WAL.

Because with 2PC markers one WriteBatch points to another batch containing its inserts a writebatch can consume more or less sequence numbers than the number of sequence consuming entries that it contains.

We can see that, given the entries in the WAL, 6 sequence ids were consumed. Yet on recovery the maximum sequence consumed would be 7 + 3 (the number of sequence numbers consumed by COMMIT(b))

So, now upon recovery we must track the actual consumption of sequence numbers.
In the provided scenario there will be no sequence gaps, but it is possible to produce a sequence gap. This should not be a problem though. correct?

Test Plan: provided test.

Reviewers: sdong

Subscribers: andrewkr, leveldb, dhruba, hermanlee4

Differential Revision: https://reviews.facebook.net/D57645
  • Loading branch information
reidHoruff committed May 10, 2016
1 parent 8a66c85 commit a657ee9
Show file tree
Hide file tree
Showing 9 changed files with 549 additions and 416 deletions.
1 change: 1 addition & 0 deletions CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -282,6 +282,7 @@ set(TESTUTIL_SOURCE
db/db_test_util.cc
table/mock_table.cc
util/mock_env.cc
util/fault_injection_test_env.cc
util/thread_status_updater_debug.cc
)

Expand Down
14 changes: 6 additions & 8 deletions db/db_impl.cc
Original file line number Diff line number Diff line change
Expand Up @@ -1471,15 +1471,19 @@ Status DBImpl::RecoverLogFiles(const std::vector<uint64_t>& log_numbers,
}
#endif // ROCKSDB_LITE

if (*max_sequence == kMaxSequenceNumber) {
*max_sequence = WriteBatchInternal::Sequence(&batch);
}
WriteBatchInternal::SetSequence(&batch, *max_sequence);

// If column family was not found, it might mean that the WAL write
// batch references to the column family that was dropped after the
// insert. We don't want to fail the whole write batch in that case --
// we just ignore the update.
// That's why we set ignore missing column families to true
status = WriteBatchInternal::InsertInto(
&batch, column_family_memtables_.get(), &flush_scheduler_, true,
log_number, this);

log_number, this, true, false, max_sequence);
MaybeIgnoreError(&status);
if (!status.ok()) {
// We are treating this as a failure while reading since we read valid
Expand All @@ -1488,12 +1492,6 @@ Status DBImpl::RecoverLogFiles(const std::vector<uint64_t>& log_numbers,
continue;
}

const SequenceNumber last_seq = WriteBatchInternal::Sequence(&batch) +
WriteBatchInternal::Count(&batch) - 1;
if ((*max_sequence == kMaxSequenceNumber) || (last_seq > *max_sequence)) {
*max_sequence = last_seq;
}

if (!read_only) {
// we can do this because this is called before client has access to the
// DB and there is only a single thread operating on DB
Expand Down
Loading

0 comments on commit a657ee9

Please sign in to comment.