Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Custom interface serializers #104

Merged
merged 3 commits into from
Nov 9, 2023
Merged

Custom interface serializers #104

merged 3 commits into from
Nov 9, 2023

Conversation

chriso
Copy link
Contributor

@chriso chriso commented Nov 9, 2023

This PR changes the way custom serializers work. If you register serialization routines for type T where T is an interface type, the serialization routines will now be used for values of type T and every type that implements T.

The serialization layer will no longer recursively scan values of type T when a custom serializer has been registered for T or for an interface that T implements. The assumption is that the custom serializers will not serialize the underlying memory regions in a way that makes preserving pointers outside T that alias those regions possible, hence scanning is not necessary.

These two changes combined allow users to avoid scanning and serializing complex types that the serialization layer cannot currently handle. One example from https://github.com/protocolbuffers/protobuf-go is the proto.Message interface and the generated structs that implement it, which use all sorts of unsafe hacks (e.g. to prevent copying and comparisons, and to embed information used for Go and protobuf reflection). The protobuf library provides a native way to serialize any sort of proto.Message (via anypb). Users can now register custom serialization routines for proto.Message and don't have to manually register routines for every type that implements proto.Message.

@chriso chriso merged commit 7cf7aca into main Nov 9, 2023
2 checks passed
@chriso chriso deleted the serde-register-interface branch November 9, 2023 22:04
chriso added a commit that referenced this pull request Nov 21, 2023
Now that we can inspect durable coroutine state (#113), a few
optimizations are clear.

Firstly, package names are currently duplicated across types. This PR
updates the serialization layer to intern strings so that there's no
duplication.

Secondly, the serialization layer currently stores type information for
_all_ referenced types (recursively). When custom serializers are
registered this type information is not used; we just need a reference
to the custom serialization routines. An offline process analyzing the
state doesn't need this information either, because the custom
serializer might emit entirely different objects/types. Since
#112, type information is
available for the _output_ of custom serializers, so the input type info
isn't necessary. This PR updates the serialization layer to no longer
store type information for types with custom serializers. All that's
stored now is the interned package name and type name, along with the
`custom` flag which indicate that the type is opaque. When stacked with
#104, the reduction in
the size of type information can be drastic.
chriso added a commit that referenced this pull request Nov 21, 2023
This fixes bugs introduced by
24b7af8.

Since #114, we no longer store type information for types with custom
serializers registered. We allow custom serializers to be registered for
interface types, and since #104 these custom routines are used for all
implementations of those interfaces.

We need to store enough information to be able to recreate the original
type in the program that generated the state. Since no stable identifier
is available for types, we currently use the offset to a known anchor in
the program (see
[1](https://github.com/stealthrocket/coroutine/blob/00b9eb66a6472279b8dc9a4994b2694ff3f06c17/types/unsafe.go#L66-L84)
and
[2](https://github.com/stealthrocket/coroutine/blob/00b9eb66a6472279b8dc9a4994b2694ff3f06c17/types/types.go#L57-L59)).
The first bug introduced by #114 is that it wasn't continuing to use
this facility when a custom serializer had been registered, so the
serialization layer was erroneously returning the interface type when
deserializing all implementations of those interface types. The second
bug was more subtle; when `*A` implements the interface `T` but `A` is
the named type for which an offset is available, we need to retain this
information so that we can correctly create `*A` rather than `T` when
deserializing objects.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants