Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FilterX copy-on-write for mutable objects #330

Open
wants to merge 17 commits into
base: main
Choose a base branch
from

Conversation

MrAnno
Copy link
Member

@MrAnno MrAnno commented Oct 8, 2024

Depends on #346

Copy-on-write and FilterXRef

Deep-copy assignment (the .something = , ["something"] = , something =, something += operators) is an essential feature of the FilterX language, making it easy to trace and understand how values change.

However, copy (especially deep-copy) is expensive, so this PR aims to fix the performance bottleneck by introducing copy-on-write for mutable objects.

To achieve CoW semantics without introducing endless changes in the filterx folder, FilterXRef has been introduced.
References are currently not part of the FilterX language (hopefully, they never will be). FilterXRef is used only internally to reference the same FilterXObject from multiple locations (variables, other types of assignments).
For this reason, FilterXRef pretends to be a FilterXObject, but it's not a real FilterX type.

FilterXRef transparently propagates operations (virtual function calls) to the referenced value, but it also makes sure that the value is copied on the first write operation in case multiple references exist.

FilterXRef is created only for mutable objects, and only when they are needed: around assignment-like operators;
this and its FilterXObject behavior was the key to avoid polluting all code paths with references.

Implementation concerns

Unfortunately, FilterXRef can only be transparent when virtual calls are used.
We use a lot of downcasting and isinstanceof() calls for logical and performance reasons.
Where explicit downcasting is needed, filterx_ref_unwrap() must be called to extract the original object. The extracted object must only be used locally (short-term, without referencing or storing it anywhere).
This is ugly and hard to maintain, but downcasting is the original problem here. A safety check has been added to filterx_object_is_type() in order to detect if we forget to call unwrap().
(And since FilterXRef is only for mutable types, I could avoid polluting the codebase with those calls where immutable downcasts were used.)

Everything sounds simple in theory, but debugging this with GDB is a nightmare if you try to catch a bug.

Current limitations

Full CoW on subscript and attr operators (.something, ["something"]) of JSON objects is currently not implemented.
I've tried to make it work, but it requires a complete redesign of how we cache JSON subobjects (or alternatively, we need an internal representation for dicts and lists without using the json-c library). This will be implemented later.
For now, such operators make unnecessary copies (CoW only partially works on them).

fx

@MrAnno MrAnno force-pushed the filterx-cow branch 2 times, most recently from 427be8e to 4a0223c Compare October 14, 2024 00:25
@MrAnno MrAnno marked this pull request as ready for review October 14, 2024 00:25
@MrAnno
Copy link
Member Author

MrAnno commented Oct 20, 2024

I've written a longer PR description. The diagram may be hard to understand due to the lack of context, but we can talk about it IRL.

Deep-copy assignment (the `.something = `, `["something"] = `,
`something =`, `something +=` operators) is an essential feature of the
FilterX language, making it easy to trace and understand how values
change.

However, copy (especially deep-copy) is expensive, so this PR aims to
fix the performance bottleneck by introducing copy-on-write for mutable
objects.

To achieve CoW semantics without introducing endless changes in the
`filterx` folder, `FilterXRef` has been introduced. References are
currently not part of the FilterX language (hopefully, they never will
be). `FilterXRef` is used only internally to reference the same
`FilterXObject` from multiple locations (variables, other types of
assignments). For this reason, `FilterXRef` pretends to be a
`FilterXObject`, but it's not a real FilterX type.

`FilterXRef` transparently propagates operations (virtual function
calls) to the referenced value, but it also makes sure that the value is
copied on the first write operation in case multiple references exist.

`FilterXRef` is created only for mutable objects, and only when they are
needed: around assignment-like operators; this and its `FilterXObject`
behavior was the key to avoid polluting all code paths with references.

Signed-off-by: László Várady <[email protected]>
The function will be extended with FilterXRef-related functionality,
which requires a separate unit to avoid circular dependency between
FilterX objects and refs.

Signed-off-by: László Várady <[email protected]>
FilterXRef can mimic the functionality of its referenced value, but only
through virtual methods.

filterx_object_is_type() is usually followed by an implicit type cast,
which is impossible to implement transparently (without the unwrap() call),
so this safety check has been added. This is far from ideal, but doing
downcasts is the original and real problem here.

Signed-off-by: László Várady <[email protected]>
"lhs_object" is "self".

Signed-off-by: László Várady <[email protected]>
`FilterXRef` is created only for mutable objects, and only when they are
needed: around assignment-like operators; this and its `FilterXObject`
behavior is the key to avoid polluting all code paths with references.

Full CoW on subscript and attr operators (`.something`, `["something"]`)
of JSON objects is currently not implemented. I've tried to make it
work, but it requires a complete redesign of how we cache JSON
subobjects (or alternatively, we need an internal representation for
dicts and lists without using the json-c library). This will be
implemented later. For now, such operators make unnecessary copies (CoW
only partially works on them).

Signed-off-by: László Várady <[email protected]>
FilterXRef can mimic the functionality of its referenced value, but only
through virtual methods.

Where (implicit) downcasts are used, it is impossible to be transparent
about references, so these unwrap methods have to be used.
This is far from ideal, but doing downcasts based on is_type() is the
original and real problem here.

In such cases, `filterx_ref_unwrap()` must be called to extract the
original object. The extracted object must only be used locally
(short-term, without referencing or storing it anywhere).

Signed-off-by: László Várady <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant