Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CBG-3909: use deltas for pv and mv when persisting to the bucket #7096

Merged
merged 19 commits into from
Oct 11, 2024

Conversation

gregns1
Copy link
Contributor

@gregns1 gregns1 commented Aug 29, 2024

CBG-3909

  • Adds functionality to have previous versions and merge version in the persisted hlv stored as deltas.
  • Added functionality to read the persisted form of hlv and convert the pv and mv lists back to full values from deltas and back into map form in memory
  • Functionality to take in memory hlv and convert the pv and mv maps into delta lists and persist to the bucket in that form
  • Currently I have left the fact we have to convert between cas string and uin64 cas a bit back and forth for converting deltas for work in separate ticket (if we decide to change the in memory format back to uint674 for cas)

Pre-review checklist

  • Removed debug logging (fmt.Print, log.Print, ...)
  • Logging sensitive data? Make sure it's tagged (e.g. base.UD(docID), base.MD(dbName))
  • Updated relevant information in the API specifications (such as endpoint descriptions, schemas, ...) in docs/api

Integration Tests

@gregns1 gregns1 assigned gregns1 and adamcfraser and unassigned gregns1 Aug 29, 2024
@gregns1 gregns1 assigned gregns1 and adamcfraser and unassigned adamcfraser and gregns1 Sep 4, 2024
base/util.go Outdated Show resolved Hide resolved
if err != nil {
return 0, err
}
res := binary.LittleEndian.Uint64(decoded)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't binary.LittleEndian.Uint64 expect a length 8 []byte as input? Here it looks like we're setting decoded to MaxDecodedLength (18) - can you help me understand the case where casByte[2:] is longer than 8 bytes but we need to support this and are intentionally ignoring trailing bytes?

I thought this was just handling the case where it's less than 8 bytes because we've trimmed trailing zeros.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I don't think the max decoded length line is needed, I have removed it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, hex.Decode is going to panic if casByte[2:] overflows decoded, so it's possible the previous code was defensive. https://go.dev/play/p/_LEScaOT1FG
Our HexCasToUint64 has an explicit length check to protect against this. Possibly we just want to do the same (but also supporting the case here that the length can be less than 8): doing something like return error if len(casByte) > 10 at the top of the function?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some safety checks around this area, let me know what you think

base/util.go Outdated Show resolved Hide resolved
db/document.go Outdated Show resolved Hide resolved
base/util_test.go Outdated Show resolved Hide resolved
db/hybrid_logical_vector.go Outdated Show resolved Hide resolved
if err != nil {
return 0, err
}
res := binary.LittleEndian.Uint64(decoded)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, hex.Decode is going to panic if casByte[2:] overflows decoded, so it's possible the previous code was defensive. https://go.dev/play/p/_LEScaOT1FG
Our HexCasToUint64 has an explicit length check to protect against this. Possibly we just want to do the same (but also supporting the case here that the length can be less than 8): doing something like return error if len(casByte) > 10 at the top of the function?

Copy link
Collaborator

@adamcfraser adamcfraser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a suggestion around implementation - have a look and let me know what you think.

db/hybrid_logical_vector.go Outdated Show resolved Hide resolved
db/hybrid_logical_vector.go Outdated Show resolved Hide resolved
base/util.go Outdated
Comment on lines 1030 to 1033
evenHexLen := make([]byte, len(casByte), len(casByte)+1)
copy(evenHexLen, casByte)
evenHexLen = append(evenHexLen, '0')
casByte = evenHexLen
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given than len(casByte) is odd, I think that casByte=append(casByte, '0') is not going to require golang to increase the capacity of the slice (based on default slice capacities). If that's correct I think we can avoid the GC work of the new allocation you've got here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Running all my tests, capacity for input to this function is always the same as the length. But the only test I run that crosses into this path of execution is the first call to HexCasToUint64ForDelta inside test TestUint64CASToLittleEndianHexAndStripZeros for a zero cas value. Removing this if statement makes the test fail with error encoding/hex: odd length hex string

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't suggesting to remove the if - just to replace

evenHexLen := make([]byte, len(casByte), len(casByte)+1)
copy(evenHexLen, casByte)
evenHexLen = append(evenHexLen, '0')
casByte = evenHexLen

with

casByte = append(casByte, '0')

as that shouldn't trigger any new allocations when casByte starts out with an odd length.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh apologies, completely got wrong end of the stick sorry. Have updated to make this change.

func TestUint64CASToLittleEndianHexAndStripZeros(t *testing.T) {
hexLE := "0x0000000000000000"
u64 := HexCasToUint64(hexLE)
hexLEStripped := Uint64ToLittleEndianHexAndStripZeros(u64)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Running this test locally, the value of hexLEStripped is "000". Is this what we intend?
It looks like any hex value with only two non-zero digits retains an extra zero. (e.g. 0xa500000000000000 results in a50 when I would have expected a5. Is there a requirement for a minimum length of three?

Copy link
Contributor Author

@gregns1 gregns1 Oct 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good spot, fixed this so now that delta is just 0, also added new test case for this


func (vde VersionsDeltas) Less(i, j int) bool {
if vde[i].Value == vde[j].Value {
return vde[i].SourceID < vde[j].SourceID
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the purpose of computing deltas, does the ordering of sources with matching values matter? i.e. do we actually need SourceID to be acting as a tiebreaker, or can we just stick to vde[i].Value < vde[j].Value below and have equality return Less() = false?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, do we have unit test coverage for matching Values?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I did this to align with xdcr approach on it. I have removed it because the more I think about it we don't need to align with them on this part. I have returned false in that scenario.

Added a new test case to TestVersionDeltaCalculation for pv map with two entries that have the same value, leading to 0 deltas being computed.

for src, vrs := range versions {
vdm = append(vdm, CreateVersion(src, vrs))
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we return early here when len(versions) == 1? And do we have test coverage for that scenario?

Copy link
Contributor Author

@gregns1 gregns1 Oct 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, yes we have a scenario where we have single entry in PV in TestPvDeltaReadAndWrite

db/hybrid_logical_vector.go Outdated Show resolved Hide resolved
db/hybrid_logical_vector.go Outdated Show resolved Hide resolved
@adamcfraser adamcfraser assigned gregns1 and unassigned adamcfraser Oct 7, 2024
@gregns1 gregns1 assigned adamcfraser and unassigned gregns1 Oct 8, 2024
Copy link
Collaborator

@adamcfraser adamcfraser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything looks good except for one clarification on my previous comment (reducing allocations when converting casByte to an even length)

base/util.go Outdated
Comment on lines 1030 to 1033
evenHexLen := make([]byte, len(casByte), len(casByte)+1)
copy(evenHexLen, casByte)
evenHexLen = append(evenHexLen, '0')
casByte = evenHexLen
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't suggesting to remove the if - just to replace

evenHexLen := make([]byte, len(casByte), len(casByte)+1)
copy(evenHexLen, casByte)
evenHexLen = append(evenHexLen, '0')
casByte = evenHexLen

with

casByte = append(casByte, '0')

as that shouldn't trigger any new allocations when casByte starts out with an odd length.

@adamcfraser adamcfraser assigned gregns1 and unassigned adamcfraser Oct 9, 2024
@gregns1 gregns1 assigned adamcfraser and unassigned gregns1 Oct 10, 2024
@adamcfraser adamcfraser merged commit 7ad9dbf into release/anemone Oct 11, 2024
36 checks passed
@adamcfraser adamcfraser deleted the CBG-3909 branch October 11, 2024 14:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants