materialize-bigquery: JSON column type for array and objects #2324

williamhbaker · 2025-02-03T19:05:19Z

Description:

When the BigQuery materialization connector was initially built, BigQuery did not have a native JSON column type, so arrays and objects were materialized as "stringified" JSON.

There is now a native JSON column type, so we can use that for these kinds of fields.

The main complication here is preserving the behavior of pre-existing materializations, while allowing new ones to use the JSON column type. The objects_and_arrays_as_json feature flag was added previously, and that flag will toggle using the old stringified columns instead of the JSON columns.

Most of the diff here is mechanical changes needed to thread through the value of that objects_and_arrays_as_json flag to where it needs to be.

The rollout of this change will be:

Set all existing bigquery materializations to have the flag no_objects_and_arrays_as_json, which was added in materialize-bigquery: don't serialize JSON nulls #2323.
Shortly after that, this pull request should be merged so that new tasks use the default behavior going forward of using JSON columns.

Closes #2319
Closes #2333

Workflow steps:

(How does one use this feature, and how has it changed)

Documentation links affected:

(list any documentation links that you created, or existing ones that you've identified as needing updates, along with a brief description)

Notes for reviewers:

Manually tested that an existing materialization can transition successfully from having no feature flag set to having the no_objects_and_arrays_as_json set. Trying to set objects_and_arrays_as_json on an existing materialization fails because of the newly-added compatibility check for existing root document columns, and the change to not allow them to be migrated. A brand new task with objects_and_arrays_as_json set works fine with JSON columns from the start.

This change is

When the BigQuery materialization connector was initially built, BigQuery did not have a native JSON column type, so arrays and objects were materialized as "stringified" JSON. There is now a native JSON column type, so we can use that for these kinds of fields. The main complication here is preserving the behavior of pre-existing materializations, while allowing new ones to use the JSON column type. The `objects_and_arrays_as_json` feature flag was added previously, and that flag will toggle using the old stringified columns instead of the JSON columns. Most of the diff here is mechanical changes needed to thread through the value of that `objects_and_arrays_as_json` flag to where it needs to be.

williamhbaker · 2025-02-10T17:27:28Z

materialize-bigquery/transactor.go

@@ -32,77 +35,86 @@ type transactor struct {
 	loggedStorageApiMessage bool
 }

-func newTransactor(


Most of this diff is just from wrapping the newTransactor behavior in a closure so that it has access to the previously computed dialect, templates, and flag value.

There are no currently known cases where this should be allowed, and it is likely to break situations where the document is currently materialized as a string column and attempts to be migrated to a JSON column, where string -> JSON migrations results in a JSON string rather than an actual JSON object.

mdibaiee · 2025-02-12T18:28:34Z

@williamhbaker regarding release process, why do we not set the flag on tasks now, and then release this with objects_and_arrays_as_json as the default?

williamhbaker · 2025-02-12T18:31:38Z

@williamhbaker regarding release process, why do we not set the flag on tasks now, and then release this with objects_and_arrays_as_json as the default?

That's a good point, since I did this a little strangely and already added the flag in a prior PR we can do it like you suggest, and not have to have another follow-up.

I'll update the PR and release process description for that!

…fault New materializations created without the "no_objects_and_arrays_as_json" will now use JSON columns for objects and arrays.

williamhbaker · 2025-02-12T19:23:04Z

@mdibaiee I finished updating this PR to have objects_and_arrays_as_json be the default, let me know what you think. I'll update existing tasks to have no_objects_and_arrays_as_json before merging this.

williamhbaker mentioned this pull request Feb 3, 2025

materialize-bigquery: don't serialize JSON nulls #2323

Merged

williamhbaker force-pushed the wb/mat-bq-json-col branch from f0c7a35 to ba0bfb1 Compare February 10, 2025 16:13

williamhbaker commented Feb 10, 2025

View reviewed changes

williamhbaker force-pushed the wb/mat-bq-json-col branch from 2e75b1e to be372c4 Compare February 10, 2025 17:28

williamhbaker marked this pull request as ready for review February 10, 2025 17:29

williamhbaker requested a review from mdibaiee February 10, 2025 17:29

materialize-bigquery: make JSON columns for arrays and objects the de…

da937ac

…fault New materializations created without the "no_objects_and_arrays_as_json" will now use JSON columns for objects and arrays.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

materialize-bigquery: JSON column type for array and objects #2324

materialize-bigquery: JSON column type for array and objects #2324

williamhbaker commented Feb 3, 2025 •

edited

Loading

williamhbaker Feb 10, 2025

mdibaiee commented Feb 12, 2025

williamhbaker commented Feb 12, 2025

williamhbaker commented Feb 12, 2025

materialize-bigquery: JSON column type for array and objects #2324

Are you sure you want to change the base?

materialize-bigquery: JSON column type for array and objects #2324

Conversation

williamhbaker commented Feb 3, 2025 • edited Loading

williamhbaker Feb 10, 2025

Choose a reason for hiding this comment

mdibaiee commented Feb 12, 2025

williamhbaker commented Feb 12, 2025

williamhbaker commented Feb 12, 2025

williamhbaker commented Feb 3, 2025 •

edited

Loading