Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: introduce compressedArrayElement for profile data #5012

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

vvuk
Copy link

@vvuk vvuk commented May 29, 2024

Much of the profile json coming from something like samply is taken up by arrays containing an identical value. samply even has serialization helpers to create arrays of length N with just null, 0, etc. in order to match the profile file format. These values take up a lot of space in the final json; up to 30% in some profiles, and the values are completely unnecessary.

This draft PR shows one approach to how the format could be changed to allow for arrays that all contain the same value to be replaced with just the value itself. In other words:

  lineNumber: [null, null, null, .....]

becomes:

   lineNumber: null

I don't actually like this implementation, but opening this PR to get feedback. The problem is that the profiler front end code uses the JSON data format as the in-memory data format directly, so every single location that reads from one of these arrays would need to be replaced with a call to compressedArrayElement which is very error prone. I've only captured a few of these locations here.

I think a better approach would be to introduce a type that acts like an array, but contains a _value that's either an array or a single value and then as part of parsing the profile, run through the post-JSON.parse data structure and convert these, e.g.:

  functionTable.lineNumber = new CompressedArray(functionTable.lineNumber);

CompressedArray would also have a toJSON implementation to serialize the value-or-array.

@mstange
Copy link
Contributor

mstange commented May 29, 2024

If we want to change the in-memory representation, then I would prefer the CompressedArray approach, but it'll be more invasive.

If we mostly care about reducing the size of the string that we pass to JSON.parse, when we could introduce more differences between the "serializable" format and the in-memory format - this distinction already exists but at the moment it only affects the stringTable / stringArray:

/**
* The UniqueStringArray is a class, and is not serializable. This function turns
* a profile into the serializable variant.
*/
export function makeProfileSerializable({
threads,
...restOfProfile
}: Profile): SerializableProfile {
return {
...restOfProfile,
threads: threads.map(({ stringTable, ...restOfThread }) => {
return {
...restOfThread,
stringArray: stringTable.serializeToArray(),
};
}),
};
}
/**
* Take a processed profile and remove any non-serializable classes such as the
* StringTable class.
*/
export function serializeProfile(profile: Profile): string {
return JSON.stringify(makeProfileSerializable(profile));
}
/**
* Take a serialized processed profile from some saved source, and re-initialize
* any non-serializable classes.
*/
function _unserializeProfile({
threads,
...restOfProfile
}: SerializableProfile): Profile {
return {
...restOfProfile,
threads: threads.map(({ stringArray, ...restOfThread }) => {
return {
...restOfThread,
stringTable: new UniqueStringArray(stringArray),
};
}),
};
}

I think that would be a lot less invasive - we'd convert the "compressed" representation into the uncompressed representation in _unserializeProfile and none of the rest of the code would need to know about the compressed representation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants