-
Notifications
You must be signed in to change notification settings - Fork 33
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve memory footprint of columnar deserialization using relevant filter #895
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Tony Xiang <[email protected]>
Signed-off-by: Tony Xiang <[email protected]>
Signed-off-by: Tony Xiang <[email protected]>
Signed-off-by: Tony Xiang <[email protected]>
Signed-off-by: Tony Xiang <[email protected]>
Signed-off-by: Tony Xiang <[email protected]>
Signed-off-by: Tony Xiang <[email protected]>
Signed-off-by: Tony Xiang <[email protected]>
Signed-off-by: Tony Xiang <[email protected]>
Signed-off-by: Tony Xiang <[email protected]>
Signed-off-by: Tony Xiang <[email protected]>
Signed-off-by: Tony Xiang <[email protected]>
Signed-off-by: Tony Xiang <[email protected]>
Signed-off-by: Tony Xiang <[email protected]>
Signed-off-by: Tony Xiang <[email protected]>
Signed-off-by: Tony Xiang <[email protected]>
Signed-off-by: Tony Xiang <[email protected]>
Signed-off-by: Tony Xiang <[email protected]>
Signed-off-by: Tony Xiang <[email protected]>
@@ -336,6 +339,21 @@ template <dataset_type_tag dataset_type_> class Dataset { | |||
add_component_info_impl(component, elements_per_scenario, total_elements); | |||
} | |||
|
|||
void enable_atrribute_indications(std::string_view component) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo
void enable_atrribute_indications(std::string_view component) | |
void enable_attribute_indications(std::string_view component) |
@@ -152,6 +152,14 @@ struct DefaultNullVisitor : msgpack::null_visitor { | |||
} | |||
}; | |||
|
|||
struct NullVisitorCheckMap : DefaultNullVisitor { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's not really a null-visitor anymore, right?
struct NullVisitorCheckMap : DefaultNullVisitor { | |
struct CheckHasMap : DefaultNullVisitor { |
* @brief Return if a component has attribute indications. | ||
* | ||
* Attribute indications are used to indicate the presence of meaningful attributes for a certain component in the | ||
* dataset. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe it's good to also mention the behavior if it has both preamble attribute indications and in-data attributes
|
Background
When using PGM deserialization, the user can select a relevant filter for columnar data format. The de-serializer will return only relevant attributes for that component.
Internally, the Python routine still make buffers of all attributes. Then the attributes with only
nan
are discarded. This is un-wanted memory footprint.Improvement
In general, we cannot know which attributes are relevant before parse the whole dataset. However, if the condense attribute list is given in the dataset header, and there are no map-like elements in the data, we know for sure that the relevant attributes can only be those attributes specified in the header. In this way, we can ask the Python routine to only create those buffers. This should save memory footprint of the de-serializer.
Implementation