-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
materialize-bigquery: don't serialize JSON nulls #2323
Conversation
Serializing an explicit JSON `null` in staged files causes the loaded value from those files to be a JSON `null` instead of a SQL `null`. We want these to be SQL `null` instead, and omitting the value from the staged files will do that. It also saves us a little bit of work when there are lots of fields that are `null`. Closes #2126
Adds a feature flag for materializing objects and arrays as strings. This is the current behavior, and the feature flag is being added so that all pre-existing tasks can have it be set to `true`. A subsequent change will enable the flag's behavior, where if it is not set (which will be the default going forward), objects and arrays will instead be materialized into JSON columns.
f43309e
to
ebe3547
Compare
} | ||
|
||
type advancedConfig struct { | ||
FeatureFlags string `json:"feature_flags,omitempty" jsonschema:"title=Feature Flags,description=This property is intended for Estuary internal use. You should only modify this field as directed by Estuary support."` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not an advanced property specifically for materializing objects and arrays as strings without an exposed JSONSchema? 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had this same thought initially, and there was some internal discussion about it here.
The TL;DR is that there's a desire to standardize around using this feature flag convention for this sort of thing, as we have been & will be doing more of for changing behaviors of SQL captures. More specifically what I mean re: "this sort of thing" is controlling behaviors that we don't expect users to ever interact with directly, either for preserving existing compatibility, or selectively enabling new code paths.
I changed the feature flag from |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@@ -16,7 +17,10 @@ import ( | |||
"google.golang.org/api/option" | |||
) | |||
|
|||
// Config represents the endpoint configuration for BigQuery. | |||
var featureFlagDefaults = map[string]bool{ | |||
"objects_and_arrays_as_json": false, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name is fairly self-explanatory in this case, but I've been trying to maintain a convention of describing the impact of a flag here in the defaults map.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good idea, I added a comment here to explain what the flag does.
…re flag Changes the feature flag `objects_and_arrays_as_string` to `objects_and_arrays_as_json`, to allow for a flow of setting existing tasks to `no_objects_and_arrays_as_json`, and then eventually making the JSON version the default.
1227f7a
to
99c15b6
Compare
Description:
The current system of serializing JSON
null
values explicitly results in those values being loaded to BigQuery tables as JSONnull
rather than an actual SQLnull
. The intention is for these values to be SQLnull
, and that can be accomplished by omitting the fields entirely from the serialized documents.Also included in this PR is the addition of a feature flag
objects_and_arrays_as_strings
which will be used in a later change (see #2324) that will begin to create object and array columns as JSON columns, rather than their string representations in string columns. Right now the feature flag doesn't do anything, but the configuration structure is being added so that pre-existing tasks can have it set to maintain their existing behavior, and have newly created tasks not have it set and use JSON columns instead.Closes #2126
Workflow steps:
(How does one use this feature, and how has it changed)
Documentation links affected:
(list any documentation links that you created, or existing ones that you've identified as needing updates, along with a brief description)
Notes for reviewers:
(anything that might help someone review this PR)
This change is![Reviewable](https://camo.githubusercontent.com/1541c4039185914e83657d3683ec25920c672c6c5c7ab4240ee7bff601adec0b/68747470733a2f2f72657669657761626c652e696f2f7265766965775f627574746f6e2e737667)