-
Notifications
You must be signed in to change notification settings - Fork 138
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CBG-3909: use deltas for pv and mv when persisting to the bucket #7096
Conversation
if err != nil { | ||
return 0, err | ||
} | ||
res := binary.LittleEndian.Uint64(decoded) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't binary.LittleEndian.Uint64 expect a length 8 []byte as input? Here it looks like we're setting decoded to MaxDecodedLength (18) - can you help me understand the case where casByte[2:] is longer than 8 bytes but we need to support this and are intentionally ignoring trailing bytes?
I thought this was just handling the case where it's less than 8 bytes because we've trimmed trailing zeros.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. I don't think the max decoded length line is needed, I have removed it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, hex.Decode is going to panic if casByte[2:] overflows decoded, so it's possible the previous code was defensive. https://go.dev/play/p/_LEScaOT1FG
Our HexCasToUint64 has an explicit length check to protect against this. Possibly we just want to do the same (but also supporting the case here that the length can be less than 8): doing something like return error if len(casByte) > 10
at the top of the function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added some safety checks around this area, let me know what you think
if err != nil { | ||
return 0, err | ||
} | ||
res := binary.LittleEndian.Uint64(decoded) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, hex.Decode is going to panic if casByte[2:] overflows decoded, so it's possible the previous code was defensive. https://go.dev/play/p/_LEScaOT1FG
Our HexCasToUint64 has an explicit length check to protect against this. Possibly we just want to do the same (but also supporting the case here that the length can be less than 8): doing something like return error if len(casByte) > 10
at the top of the function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a suggestion around implementation - have a look and let me know what you think.
Signed-off-by: Gregory Newman-Smith <[email protected]>
base/util.go
Outdated
evenHexLen := make([]byte, len(casByte), len(casByte)+1) | ||
copy(evenHexLen, casByte) | ||
evenHexLen = append(evenHexLen, '0') | ||
casByte = evenHexLen |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given than len(casByte) is odd, I think that casByte=append(casByte, '0')
is not going to require golang to increase the capacity of the slice (based on default slice capacities). If that's correct I think we can avoid the GC work of the new allocation you've got here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Running all my tests, capacity for input to this function is always the same as the length. But the only test I run that crosses into this path of execution is the first call to HexCasToUint64ForDelta
inside test TestUint64CASToLittleEndianHexAndStripZeros
for a zero cas value. Removing this if statement makes the test fail with error encoding/hex: odd length hex string
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wasn't suggesting to remove the if - just to replace
evenHexLen := make([]byte, len(casByte), len(casByte)+1)
copy(evenHexLen, casByte)
evenHexLen = append(evenHexLen, '0')
casByte = evenHexLen
with
casByte = append(casByte, '0')
as that shouldn't trigger any new allocations when casByte starts out with an odd length.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh apologies, completely got wrong end of the stick sorry. Have updated to make this change.
func TestUint64CASToLittleEndianHexAndStripZeros(t *testing.T) { | ||
hexLE := "0x0000000000000000" | ||
u64 := HexCasToUint64(hexLE) | ||
hexLEStripped := Uint64ToLittleEndianHexAndStripZeros(u64) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Running this test locally, the value of hexLEStripped is "000". Is this what we intend?
It looks like any hex value with only two non-zero digits retains an extra zero. (e.g. 0xa500000000000000
results in a50
when I would have expected a5
. Is there a requirement for a minimum length of three?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good spot, fixed this so now that delta is just 0, also added new test case for this
db/hybrid_logical_vector.go
Outdated
|
||
func (vde VersionsDeltas) Less(i, j int) bool { | ||
if vde[i].Value == vde[j].Value { | ||
return vde[i].SourceID < vde[j].SourceID |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the purpose of computing deltas, does the ordering of sources with matching values matter? i.e. do we actually need SourceID to be acting as a tiebreaker, or can we just stick to vde[i].Value < vde[j].Value
below and have equality return Less() = false?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, do we have unit test coverage for matching Values?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I did this to align with xdcr approach on it. I have removed it because the more I think about it we don't need to align with them on this part. I have returned false in that scenario.
Added a new test case to TestVersionDeltaCalculation
for pv map with two entries that have the same value, leading to 0 deltas being computed.
for src, vrs := range versions { | ||
vdm = append(vdm, CreateVersion(src, vrs)) | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we return early here when len(versions) == 1? And do we have test coverage for that scenario?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea, yes we have a scenario where we have single entry in PV in TestPvDeltaReadAndWrite
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Everything looks good except for one clarification on my previous comment (reducing allocations when converting casByte to an even length)
base/util.go
Outdated
evenHexLen := make([]byte, len(casByte), len(casByte)+1) | ||
copy(evenHexLen, casByte) | ||
evenHexLen = append(evenHexLen, '0') | ||
casByte = evenHexLen |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wasn't suggesting to remove the if - just to replace
evenHexLen := make([]byte, len(casByte), len(casByte)+1)
copy(evenHexLen, casByte)
evenHexLen = append(evenHexLen, '0')
casByte = evenHexLen
with
casByte = append(casByte, '0')
as that shouldn't trigger any new allocations when casByte starts out with an odd length.
CBG-3909
Pre-review checklist
fmt.Print
,log.Print
, ...)base.UD(docID)
,base.MD(dbName)
)docs/api
Integration Tests
GSI=true,xattrs=true
https://jenkins.sgwdev.com/job/SyncGateway-Integration/2739/