-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
materialize-bigquery: JSON column type for array and objects #2324
base: main
Are you sure you want to change the base?
Conversation
When the BigQuery materialization connector was initially built, BigQuery did not have a native JSON column type, so arrays and objects were materialized as "stringified" JSON. There is now a native JSON column type, so we can use that for these kinds of fields. The main complication here is preserving the behavior of pre-existing materializations, while allowing new ones to use the JSON column type. The `objects_and_arrays_as_json` feature flag was added previously, and that flag will toggle using the old stringified columns instead of the JSON columns. Most of the diff here is mechanical changes needed to thread through the value of that `objects_and_arrays_as_json` flag to where it needs to be.
f0c7a35
to
ba0bfb1
Compare
@@ -32,77 +35,86 @@ type transactor struct { | |||
loggedStorageApiMessage bool | |||
} | |||
|
|||
func newTransactor( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most of this diff is just from wrapping the newTransactor
behavior in a closure so that it has access to the previously computed dialect, templates, and flag value.
There are no currently known cases where this should be allowed, and it is likely to break situations where the document is currently materialized as a string column and attempts to be migrated to a JSON column, where string -> JSON migrations results in a JSON string rather than an actual JSON object.
2e75b1e
to
be372c4
Compare
@williamhbaker regarding release process, why do we not set the flag on tasks now, and then release this with |
That's a good point, since I did this a little strangely and already added the flag in a prior PR we can do it like you suggest, and not have to have another follow-up. I'll update the PR and release process description for that! |
…fault New materializations created without the "no_objects_and_arrays_as_json" will now use JSON columns for objects and arrays.
@mdibaiee I finished updating this PR to have |
Description:
When the BigQuery materialization connector was initially built, BigQuery did not have a native JSON column type, so arrays and objects were materialized as "stringified" JSON.
There is now a native JSON column type, so we can use that for these kinds of fields.
The main complication here is preserving the behavior of pre-existing materializations, while allowing new ones to use the JSON column type. The
objects_and_arrays_as_json
feature flag was added previously, and that flag will toggle using the old stringified columns instead of the JSON columns.Most of the diff here is mechanical changes needed to thread through the value of that
objects_and_arrays_as_json
flag to where it needs to be.The rollout of this change will be:
no_objects_and_arrays_as_json
, which was added in materialize-bigquery: don't serialize JSON nulls #2323.Closes #2319
Closes #2333
Workflow steps:
(How does one use this feature, and how has it changed)
Documentation links affected:
(list any documentation links that you created, or existing ones that you've identified as needing updates, along with a brief description)
Notes for reviewers:
Manually tested that an existing materialization can transition successfully from having no feature flag set to having the
no_objects_and_arrays_as_json
set. Trying to setobjects_and_arrays_as_json
on an existing materialization fails because of the newly-added compatibility check for existing root document columns, and the change to not allow them to be migrated. A brand new task withobjects_and_arrays_as_json
set works fine with JSON columns from the start.This change is![Reviewable](https://camo.githubusercontent.com/1541c4039185914e83657d3683ec25920c672c6c5c7ab4240ee7bff601adec0b/68747470733a2f2f72657669657761626c652e696f2f7265766965775f627574746f6e2e737667)