Support for serialization of records with no default constructor #510
C# 9 added `record`, which formalises a pattern where a type has a "primary constructor" whose parameter names and types exactly match the names and types of its public properties, which are init-only. This implies a corresponding serialisation/deserialisation pattern, where serialisation reads the public properties and deserialisation calls the primary constructor.

This PR extends Parquet.NET to support serialising/deserialising records, as well as ordinary classes that have no default constructor and follow the same pattern as records in the naming of constructor parameters.
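As a minimal sketch of that pattern (not code from this PR), the example below round-trips a record by reading the property that matches each constructor parameter and then calling the primary constructor with those values. The `Reading` record and its members are hypothetical names chosen for illustration.

```csharp
using System;
using System.Linq;
using System.Reflection;

ConstructorInfo ctor = typeof(Reading).GetConstructors().Single();

// Serialisation: read the public property matching each constructor parameter.
var original = new Reading("sensor-1", 20.0);
object?[] values = ctor.GetParameters()
    .Select(p => typeof(Reading).GetProperty(p.Name!)!.GetValue(original))
    .ToArray();

// Deserialisation: pass the same values back through the primary constructor.
var copy = (Reading)ctor.Invoke(values);

Console.WriteLine(copy == original);   // records compare by value
Console.WriteLine(copy.Fahrenheit);    // derived state was rebuilt by the constructor

// Hypothetical record used only for this example.
public record Reading(string Sensor, double Celsius)
{
    // Extra state derived from a constructor parameter; it is only set
    // correctly if the primary constructor actually runs.
    public double Fahrenheit { get; } = Celsius * 9 / 5 + 32;
}
```

The `Fahrenheit` property is the reason simply copying property values is not enough: it has no setter and is computed from a parameter while the constructor runs.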
Altering the Parquet.NET deserialisation code to support this pattern directly is next to impossible: it necessarily reads whole columns of a row-group and writes each value into the corresponding property of objects that have already been allocated, i.e. it needs objects with write-enabled properties.
But if each record type `R` had a corresponding placeholder type `P` that had the necessary properties, deserialisation could perform a first pass that constructs a set of `P`, and then a second pass that constructs each `R` from a `P`. The drawback of this is that it implies a lot of additional allocation for large row-groups.

But there is a solution that avoids this: we can use `R` itself as the placeholder type. The CLR provides a way to allocate an instance of a type without yet calling its constructor (call this a "pre-constructed" object). Its fields/properties will have default values, exactly as they do at the start of the constructor. So a pre-constructed `R` can be safely deserialised into.

The second pass executes the constructors on all the `R` instances in the row-group, passing the property values into the primary constructor. A `ConstructorInfo` can be called via reflection exactly like an instance method, running the constructor on a pre-constructed object, so no new object is allocated. The parameters are re-assigned to the properties that already hold those values. This is redundant but unavoidable, because the primary constructor of a record can contain additional user-defined code to initialise fields from the parameter values (see tests in this PR).

To make this fast, code-generation can be used, as it is in existing Parquet.NET serialisation. The `Expression`-based approach has a limitation: it can't invoke a constructor on a pre-constructed object. But IL-generation (like reflection) has no such limitation. So a "post-constructor" operation can be generated for a type.

If a type has a default (no-params) constructor, that constructor continues to be used and the type does not require post-construction.
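The two passes can be sketched with plain reflection. This is an illustration under assumptions, not the PR's implementation: it assumes `RuntimeHelpers.GetUninitializedObject` as the allocation API (the PR may use a different one), and `Point` is a hypothetical record.

```csharp
using System;
using System.Linq;
using System.Reflection;
using System.Runtime.CompilerServices;

Type type = typeof(Point);
ConstructorInfo ctor = type.GetConstructors().Single();

// First pass: allocate without running any constructor, then fill the
// properties the way a column-at-a-time reader would. Reflection can call
// init-only setters.
var p = (Point)RuntimeHelpers.GetUninitializedObject(type);
Console.WriteLine(p.Label is null);   // the initialiser has not run yet
type.GetProperty(nameof(Point.X))!.SetValue(p, 3);
type.GetProperty(nameof(Point.Y))!.SetValue(p, 4);

// Second pass: run the primary constructor on the SAME object, feeding it
// the values already stored in the matching properties.
object?[] args = ctor.GetParameters()
    .Select(prm => type.GetProperty(prm.Name!)!.GetValue(p))
    .ToArray();
ctor.Invoke(p, args);   // MethodBase.Invoke(obj, args): no new allocation

Console.WriteLine(p.Label);

// Hypothetical record used only for this example.
public record Point(int X, int Y)
{
    // User-defined initialisation that depends on the constructor parameters.
    public string Label { get; } = $"({X}, {Y})";
}
```

Reflection is slow when repeated for every object in a row-group, which is why generated code is preferred for the second pass.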
Even so, a type may contain nested type references within it (e.g. a property that is a list of records), and this case must also be handled by generated code that visits records nested within the hierarchy.
If the type's full hierarchy does not contain any types requiring post-construction, the post-constructor operation is generated as a no-op. This should be the case for all existing client code of Parquet.NET.
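A generated post-constructor for a single type might look roughly like the sketch below. It is a simplified illustration, not the PR's code: `PostConstructor.Build` and `Point` are hypothetical names, only reference types with exactly one public constructor are handled, and nested records inside the hierarchy are not visited.

```csharp
using System;
using System.Linq;
using System.Reflection;
using System.Reflection.Emit;
using System.Runtime.CompilerServices;

Action<object> post = PostConstructor.Build(typeof(Point));
var p = (Point)RuntimeHelpers.GetUninitializedObject(typeof(Point));
typeof(Point).GetProperty("X")!.SetValue(p, 3);
typeof(Point).GetProperty("Y")!.SetValue(p, 4);
post(p);
Console.WriteLine(p.Label);

public static class PostConstructor
{
    public static Action<object> Build(Type type)
    {
        // A type with a default constructor needs no post-construction.
        if (type.GetConstructor(Type.EmptyTypes) != null)
            return _ => { };

        ConstructorInfo ctor = type.GetConstructors().Single();
        var method = new DynamicMethod(
            "PostConstruct_" + type.Name, typeof(void),
            new[] { typeof(object) }, type.Module, skipVisibility: true);
        ILGenerator il = method.GetILGenerator();

        // Push the existing instance as the constructor's "this".
        il.Emit(OpCodes.Ldarg_0);
        il.Emit(OpCodes.Castclass, type);

        // Push each argument by reading the property named like the parameter.
        foreach (ParameterInfo prm in ctor.GetParameters())
        {
            MethodInfo getter = type.GetProperty(prm.Name!)!.GetGetMethod()!;
            il.Emit(OpCodes.Ldarg_0);
            il.Emit(OpCodes.Castclass, type);
            il.Emit(OpCodes.Callvirt, getter);
        }

        // "call" (not "newobj") runs the constructor on the existing object.
        il.Emit(OpCodes.Call, ctor);
        il.Emit(OpCodes.Ret);
        return (Action<object>)method.CreateDelegate(typeof(Action<object>));
    }
}

// Hypothetical record used only for this example.
public record Point(int X, int Y)
{
    public string Label { get; } = $"({X}, {Y})";
}
```

The delegate would be built once per type and reused for every object in a row-group, so the reflection cost is paid only at generation time.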
Until now, serialisation methods have constrained the type with `T : new()`. This restriction is removed in this PR, but a suitably relaxed check is performed at runtime.

Note that recursive types (e.g. tree of nodes) cannot be serialised, but the code-gen could be enhanced to allow this if required.
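The exact rules of the relaxed runtime check are not spelled out here, so the following is only one plausible shape for it: accept a type that has a default constructor, or a single public constructor whose parameters each match a readable public property by exact name and type. `TypeCheck.IsSupported` and the sample types are hypothetical, and nested or recursive types are not examined.

```csharp
using System;
using System.Linq;
using System.Reflection;

Console.WriteLine(TypeCheck.IsSupported(typeof(WithDefault)));
Console.WriteLine(TypeCheck.IsSupported(typeof(Point)));
Console.WriteLine(TypeCheck.IsSupported(typeof(Mismatch)));

public static class TypeCheck
{
    public static bool IsSupported(Type type)
    {
        // The old T : new() case stays valid.
        if (type.GetConstructor(Type.EmptyTypes) != null) return true;

        // Otherwise require one public constructor following the record pattern.
        ConstructorInfo[] ctors = type.GetConstructors();
        if (ctors.Length != 1) return false;

        return ctors[0].GetParameters().All(prm =>
            type.GetProperty(prm.Name!) is PropertyInfo prop   // case-sensitive lookup
            && prop.PropertyType == prm.ParameterType
            && prop.CanRead);
    }
}

// Hypothetical sample types used only for this example.
public class WithDefault { public int Id { get; set; } }

public record Point(int X, int Y);

public class Mismatch
{
    public Mismatch(int id) => Id = id;   // parameter "id" does not match property "Id"
    public int Id { get; }
}
```

Moving the check to runtime trades a compile-time error for an exception, so a clear failure message naming the offending type and parameter is worth having.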