Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Several bug fixes for serialization or nested/repeated messages #387

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

jasonmreding
Copy link
Collaborator

Description

This PR fixes several issues with message serialization when EfficientMessageCopy = TRUE.
#378
#384

Implementation

  • Fixed several places where the array being allocated was larger than required. In effect, we were requesting an allocation for the total byte size when NumericArrayResize takes the element size and not the byte size.
  • Fixed a few places where arrays were always allocating 8 byte pointers rather than 4 bytes for a pointer on 32 bit platforms.
  • Cleaned up some state management data that wasn't really needed since the lifetime of the data was effectively managed by the stack frame.
  • Fixed issues where the buffer used to copy data for repeated nested messages was always copying previous elements in the array and copying the address for the same buffer multiple times in the message structure. This leads to DWARNs in the memory manager since we attempt to deallocate the same block of memory multiple times. The general idea is that we allocate a buffer once that is reused, and the size of the buffer is doubled on demand as needed when copying data. Each repeated message element then starts copying data back at element zero rather than starting from the index in the buffer where the previous element left off.

Testing

I manually tested the proto file attached to #378 failed before the changes and works as expected afterwards. I also verified the proto API referenced from #384 works as expected and no longer crashes using either 32 or 64 bit LV on Windows.

@@ -287,7 +287,7 @@ namespace grpc_labview {
auto repeatedString = static_cast<const LVRepeatedMessageValue<std::string>&>(*value);
if (repeatedString._value.size() != 0)
{
NumericArrayResize(0x08, 1, start, repeatedString._value.size());
NumericArrayResize(GetTypeCodeForSize(sizeof(char*)), 1, start, repeatedString._value.size());
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: An 8 byte pointer is too large for 32 bit LV.


NumericArrayResize(0x08, 1, start, repeatedNested->_value.size() * clusterSize);
NumericArrayResize(GetTypeCodeForSize(alignment), 1, start, alignedElementSize);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: The previous allocation was 8x larger than it needed to be since the element size passed was the byte size, and we were specifying an element size of 8 bytes. We now pick an element size that meets our alignment requirements and update the number of elements to allocate accordingly.

}

{
auto repeatedString = google::protobuf::RepeatedField<std::string>();
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NOTE: There is no reason to hold onto this repeated string field beyond this stack frame. I've removed the backing fields from the message that were being used to track it and just allocate everything on the stack now for efficiency/clarity.

@@ -94,20 +86,21 @@ namespace grpc_labview
} while (ExpectTag(tag, protobuf_ptr));


NumericArrayResize(0x08, 1, reinterpret_cast<void*>(const_cast<char*>(lv_ptr)), _repeatedStringValuesIt->second.size());
NumericArrayResize(GetTypeCodeForSize(sizeof(char*)), 1, reinterpret_cast<void*>(const_cast<char*>(lv_ptr)), repeatedString.size());
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NOTE: A pointer on 32 bit platforms is not 8 bytes.

auto arraySize = numElements * clusterSize;
char _fillData = '\0';

// Get the _repeatedMessageValues vector from the map
auto _repeatedMessageValuesIt = _repeatedMessageValuesMap.find(metadata->messageName);
auto _repeatedMessageValuesIt = _repeatedMessageValuesMap.find(fieldInfo.fieldName);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NOTE: This effectively fixes #378 where a message has two repeated fields of the same message type. Since we were tracking the data via the message name rather than the field name, the second field was adding its data to the end of the first field.

else
{
// Zero out data from the previous use of the buffer.
auto bytesUsed = _repeatedMessageValuesIt->second.get()->_numElements * clusterSize;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NOTE: Rather than using _continueFromIndex to begin writing from where the previous element in the array left off, we now always reset the index back to zero. For that to work, we need to explicitly zero out any data from the previous element. Otherwise, if some fields weren't serialized because they were set to their default values, we can end up reusing the data from the previous element which leads to data corruption. I thought about just allocating a new buffer each time to simplify things, but I decided the performance gains of not repeatedly allocation a buffer were probably worth it.

The problem with the previous approach is that while we were effectively tracking the end index of the buffer, we weren't also tracking the beginning index. So when we finally copy the data in PostInteralParseAction, we always copy from index zero which includes data from all of the elements that preceded it. Also, the previous approach allocated ever growing memory rather than reuse the memory from previous allocations once it had been copied.

@@ -204,27 +193,38 @@ namespace grpc_labview
{
for (auto nestedMessage : _repeatedMessageValuesMap)
{
if (!nestedMessage.second.get()->_validData)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NOTE: This check ensures we only copy the data into a LV data structure once. In particular, it handles cases where the message element at index zero in the array had a non-default value for a repeated message field but the message element at index one had an empty array. In that case, we don't want to copy the array from the previous element again.


auto metadata = fieldInfo._owner->FindMetadata(fieldInfo.embeddedMessageName);
const char* lv_ptr = (this->GetLVClusterHandleSharedPtr()) + fieldInfo.clusterOffset;
auto clusterSize = metadata->clusterSize;
auto alignment = metadata->alignmentRequirement;


// shrink the array to the correct size
auto arraySize = numElements * clusterSize;
NumericArrayResize(0x08, 1, reinterpret_cast<void*>(const_cast<char*>(lv_ptr)), arraySize);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NOTE: Again, this allocation is 8x larger than it needs to be.

@@ -86,13 +84,14 @@ namespace grpc_labview
// copy into LVCluster
if (numElements != 0)
{
NumericArrayResize(0x08, 1, reinterpret_cast<void*>(const_cast<char*>(_lv_ptr)), numElements);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NOTE: This was over allocating for any element type that wasn't 8 bytes (so basically anything other than doubles).

@@ -262,18 +262,19 @@ LIBRARY_EXPORT int LVGetServices(grpc_labview::LVProtoParser* parser, grpc_labvi
}

auto count = parser->m_FileDescriptor->service_count();
if (NumericArrayResize(0x08, 1, services, count * sizeof(ServiceDescriptor*)) != 0)
auto elementSize = sizeof(ServiceDescriptor*);
if (NumericArrayResize(grpc_labview::GetTypeCodeForSize(elementSize), 1, services, count * elementSize) != 0)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NOTE: All of the changes to this file are just white space changes other than the five calls to NumericArrayResize in this file. They now all calculate the pointer size rather than always allocating 8 bytes for a pointer. I'm less certain about these changes since I'm a little confused as to why we are storing pointers to non LV data structures via array handles allocated by the LV memory manager. I'm also not certain who is calling these exported functions. The APIs appear to be related to grpc server reflection, but I don't know where/how they are being called.

@dixonjoel
Copy link
Collaborator

dixonjoel commented Oct 3, 2024

I noticed these warnings in the PR build. Probably pre-dates your changes, but I wonder if they're indicating problems or can be ignored.

D:\a\grpc-labview\grpc-labview\third_party\grpc\third_party\upb\upb/table_internal.h(200,28): warning C4334: '<<': result of 32-bit shift implicitly converted to 64 bits (was 64-bit shift intended?) [D:\a\grpc-labview\grpc-labview\build\grpc\upb.vcxproj]
D:\a\grpc-labview\grpc-labview\third_party\grpc\third_party\upb\upb/msg_internal.h(469,1): warning C4334: '<<': result of 32-bit shift implicitly converted to 64 bits (was 64-bit shift intended?) [D:\a\grpc-labview\grpc-labview\build\grpc\upb.vcxproj]
D:\a\grpc-labview\grpc-labview\third_party\grpc\third_party\upb\upb/msg_internal.h(741,34): warning C4090: 'function': different 'const' qualifiers [D:\a\grpc-labview\grpc-labview\build\grpc\upb.vcxproj]
D:\a\grpc-labview\grpc-labview\third_party\grpc\third_party\upb\upb\decode.c(190,61): warning C4646: function declared with 'noreturn' has non-void return type [D:\a\grpc-labview\grpc-labview\build\grpc\upb.vcxproj]
D:\a\grpc-labview\grpc-labview\third_party\grpc\third_party\upb\upb\decode.c(412,31): warning C4334: '<<': result of 32-bit shift implicitly converted to 64 bits (was 64-bit shift intended?) [D:\a\grpc-labview\grpc-labview\build\grpc\upb.vcxproj]
  gzclose.c
  absl_random_internal_platform.vcxproj -> D:\a\grpc-labview\grpc-labview\build\grpc\third_party\abseil-cpp\absl\random\Release\absl_random_internal_platform.lib
  def.c

Looking at them more closely, probably harmless since it's 32-bit converted to 64-bit implicitly.

@bkeryan
Copy link
Collaborator

bkeryan commented Oct 3, 2024

I noticed these warnings in the PR build. Probably pre-dates your changes, but I wonder if they're indicating problems or can be ignored.

D:\a\grpc-labview\grpc-labview\third_party\grpc\third_party\upb\upb/table_internal.h(200,28): warning C4334: '<<': result of 32-bit shift implicitly converted to 64 bits (was 64-bit shift intended?) [D:\a\grpc-labview\grpc-labview\build\grpc\upb.vcxproj]
D:\a\grpc-labview\grpc-labview\third_party\grpc\third_party\upb\upb/msg_internal.h(469,1): warning C4334: '<<': result of 32-bit shift implicitly converted to 64 bits (was 64-bit shift intended?) [D:\a\grpc-labview\grpc-labview\build\grpc\upb.vcxproj]
D:\a\grpc-labview\grpc-labview\third_party\grpc\third_party\upb\upb/msg_internal.h(741,34): warning C4090: 'function': different 'const' qualifiers [D:\a\grpc-labview\grpc-labview\build\grpc\upb.vcxproj]
D:\a\grpc-labview\grpc-labview\third_party\grpc\third_party\upb\upb\decode.c(190,61): warning C4646: function declared with 'noreturn' has non-void return type [D:\a\grpc-labview\grpc-labview\build\grpc\upb.vcxproj]
D:\a\grpc-labview\grpc-labview\third_party\grpc\third_party\upb\upb\decode.c(412,31): warning C4334: '<<': result of 32-bit shift implicitly converted to 64 bits (was 64-bit shift intended?) [D:\a\grpc-labview\grpc-labview\build\grpc\upb.vcxproj]
  gzclose.c
  absl_random_internal_platform.vcxproj -> D:\a\grpc-labview\grpc-labview\build\grpc\third_party\abseil-cpp\absl\random\Release\absl_random_internal_platform.lib
  def.c

Looking at them more closely, probably harmless since it's 32-bit converted to 64-bit implicitly.

These are all coming from grpc source, which is 2 years out of date. They may have been fixed upstream since then.

It looks like @ni-sujain tried updating to grpc 1.62 in 04435f7 then reverted it in 0f954d4 .

@dixonjoel dixonjoel requested a review from bkeryan October 4, 2024 15:18
@dixonjoel
Copy link
Collaborator

@bkeryan @jasonmreding I don't think we can afford to wait much longer for a review. Can Brad just complete his review and then we can just move forward?

@bkeryan bkeryan removed their request for review October 4, 2024 15:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants