Support BigQuery JSON column type #5416

Merged on Jul 4, 2024 (2 commits)

Contributor

@turb turb commented Jul 3, 2024

Adds support for the JSON column type on BigQuery.

This mimics what has been done for GEOGRAPHY: use a simple case class Json(wkt: String) container. There are a couple of changes I have copied without knowing their use, so they may have none (like in StorageUtil).

An alternative would be to store the value in a JSON parser's model (e.g. Jackson), but that would tie the type to a heavier dependency.
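
To illustrate the container approach, here is a minimal sketch. It is not Scio's actual code: only the `Json(wkt: String)` shape comes from the description above, while `JsonColumnSketch`, `fromColumn`, and `toColumn` are hypothetical names introduced for the example. The point is that the wrapper carries the raw JSON text unchanged and leaves parsing to the caller.

```scala
// Stand-in for the container described in this PR; the field name `wkt`
// follows the description, everything else here is illustrative.
final case class Json(wkt: String)

object JsonColumnSketch {
  // Reading: wrap the raw string coming from a JSON column (hypothetical helper).
  def fromColumn(raw: String): Json = Json(raw)

  // Writing: unwrap to the raw string; no parsing or validation is performed.
  def toColumn(value: Json): String = value.wkt

  def main(args: Array[String]): Unit = {
    val raw = """{"name":"scio","tags":["bigquery","json"]}"""
    // The text round-trips untouched because the wrapper never interprets it.
    println(toColumn(fromColumn(raw)) == raw)
  }
}
```

With this design, a caller who needs structured access can hand `wkt` to whichever JSON library they prefer, at the cost of malformed JSON only being detected at that point.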

codecov bot commented Jul 3, 2024

Codecov Report

Attention: Patch coverage is 50.00000% with 1 line in your changes missing coverage. Please review.

Project coverage is 61.23%. Comparing base (c4d4554) to head (8ca8be7).

Files | Patch % | Lines
.../scala/com/spotify/scio/bigquery/StorageUtil.scala | 0.00% | 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5416      +/-   ##
==========================================
- Coverage   61.24%   61.23%   -0.01%     
==========================================
  Files         310      310              
  Lines       11058    11060       +2     
  Branches      751      736      -15     
==========================================
+ Hits         6772     6773       +1     
- Misses       4286     4287       +1     


@@ -213,6 +213,14 @@ private[types] object TypeProvider {
tq"_root_.java.lang.String @${typeOf[BigQueryTag]}",
q"{$rhs}.wkt"
)
case q"$m val $n: _root_.com.spotify.scio.bigquery.types.Json = $rhs" => // Could not find how to mutualize
Contributor

I ran some tests. This branch is not matched (nor is the Geography one).
I'm wondering whether toTable works in that case.

Contributor Author

TBH, I have only tested fromTable in real use, since that is what I needed. I may need toTable in the distant future, though.

Contributor

I'm trying to adapt the integration test to make sure those types work as expected.

Contributor

@RustedBones RustedBones left a comment

I added the Geography and Json types in the integration test and table creation + insertion are working.

@RustedBones RustedBones merged commit 4e7f918 into spotify:main Jul 4, 2024
9 of 10 checks passed
@RustedBones
Contributor

Thank you!

@turb
Contributor Author

turb commented Jul 4, 2024

It seems something does not work as expected with REPEATED JSON: I am getting two rows for the same entry, with the repeated JSON data split between the two...

@RustedBones
Contributor

I'm not sure I understand the end state. Can you give an example?

@turb
Contributor Author

turb commented Jul 4, 2024

Never mind, I ran some tests and could not reproduce it. I'll open a bug if I run into it again; it may not be related.

@turb turb deleted the bigquery-json branch July 4, 2024 17:37