Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consumer delta application supports re-sharding for Object type #644

Merged
merged 6 commits into from
Nov 2, 2023

Conversation

Sunjeet
Copy link
Contributor

@Sunjeet Sunjeet commented Oct 11, 2023

This is 2 of 4 planned PRs for supporting dynamic type re-sharding:

  • PR#642 utilities for splitting/joining Object types; limited to Object types in this step, extend to Map/Set/List in last step
  • consumer delta application reshards with O(shard size) extra space (this PR)
  • producer-side numShards toggle and signaling
  • Extend resharding to Map,List, and Set types

Benchmarking with jmh

Benchmark 1: No significant regression for consumers when resharding isn't invoked, avg is upto 6% worse [Baseline.json vs. PR.json]

Benchmark 2: Read performance during delta transition with resharding vs. without resharding: avg is upto 6% worse again [PR_withDeltaTransitionsNoResharding.json vs. PR_withDeltaTransitionsWithResharding.json]

Baseline.json
PR.json
PR_withDeltaTransitionsNoResharding.json
PR_withDeltaTransitionsWithResharding.json

int shardOrdinal = ordinal >> shardOrdinalShift;
copyRecord(historicalDataElements, nextOrdinal, stateEngineDataElements[shard], shardOrdinal, currentWriteVarLengthDataPointers);
int whichShard = ordinal & shardsHolder.shardNumberMask;
int shardOrdinal = ordinal >> shardsHolder.shards[whichShard].shardOrdinalShift;
Copy link
Contributor

@dkoszewnik dkoszewnik Oct 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see this holder.shards[x].shardOrdinalShift used in a couple of places, I don't think the shard ordinal shift can actually be different between shards -- am I correct?

If so, why isn't the shardOrdinalShift a property of the holder, rather than the shard itself?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shardOrdinalShift can be different for different shards when we're in the middle of resharding.

Consider the case where we're resharding from 2 shards [S0, S1] to 4 shards .

In the first step of resharding, we will expand the shard sharray but retain the original shards as is to get-
[S0, S1, S0, S1]

In the next step of resharding, we process one shard at a time, so processing S0 yields-
[S00, S1, S01, S1]
Here the dataElements and shardOrdinalShift for the processed shard are different than the original shard but they co exist in the shard holder with shards that havent been processed yet (S1).

Ultimately when we've finished resharding we arrive at
[S00, S10, S01, S11]
and the shardOrdinalShifts are once again equal for all shards.

Comment on lines +129 to +131
historicalDataElements.bitsPerField[i] = Arrays.stream(shardsHolder.shards)
.map(shard -> shard.dataElements.bitsPerField[fieldIdx])
.max(Integer::compare).get();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch

final HollowObjectTypeReadStateShard[] shards = this.shardsVolatile.shards;
return Arrays.stream(shards)
.map(shard -> shard.dataElements)
.toArray(HollowObjectTypeDataElements[]::new);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The stream approach feels slightly harder to read, that might be my own bias.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, reworked it

default:
return fixedLengthValue == currentData.nullValueForField[fieldIndex];
}
public long isNull(int ordinal, int fieldIndex) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feels like this method has the wrong name now -- it is now just reading the actual field data whether or not it is beyond the max bits for a single (possibly unaligned) read.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, renamed to readValue

@Sunjeet Sunjeet merged commit ec5dcb4 into master Nov 2, 2023
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants