[App Feature] [JsonSchema] Enabling the overriding of the catalog on the stream [LOYAL-10211] #2

quocnguyendinh · 2024-04-09T05:05:36Z · edited by khoaanguyenn

https://kaligo.atlassian.net/browse/LOYAL-10211

Background

As described in the RFC (linked under Docs), the schema needs to be flattened. However, because of a limitation in the current tap, the stream does not carry the schemas that downstream components expect, so the schema ends up not being flattened. This PR fixes that.

Design

Implement a merge step so that the schema from the catalog overrides the schema produced by database discovery.
Add unit tests that cover the expected behavior.
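A minimal sketch of the merge step, assuming both schemas are JSON Schema dicts with a top-level `properties` map. `merge_schemas` is a hypothetical name, not the function added in this PR: properties from the catalog win, and properties that exist only in the discovered schema (for example, a newly added column) are kept.

```python
import copy


def merge_schemas(discovered: dict, catalog: dict) -> dict:
    """Hypothetical sketch: overlay catalog properties onto the discovered schema.

    - A property present in the catalog replaces the discovered one, so a
      hand-written nested schema survives re-discovery.
    - A property present only in discovery (e.g. a new column) is kept.
    """
    merged = copy.deepcopy(discovered)  # never mutate the caller's dicts
    properties = merged.setdefault("properties", {})
    for name, prop in catalog.get("properties", {}).items():
        properties[name] = copy.deepcopy(prop)
    return merged


discovered = {
    "type": "object",
    "properties": {
        "id": {"type": ["integer"]},
        "payload": {"type": ["null", "object"]},  # assumed: no nested properties from discovery
        "newcol": {"type": ["null", "integer"]},  # added after the catalog was written
    },
}
catalog = {
    "type": "object",
    "properties": {
        "id": {"type": ["integer"]},
        "payload": {
            "type": ["null", "object"],
            "properties": {"tier": {"type": ["null", "string"]}},
        },
    },
}
merged = merge_schemas(discovered, catalog)
print(sorted(merged["properties"]))  # ['id', 'newcol', 'payload']
```

Since catalog entries always take precedence in this sketch, a renamed or retyped column is not picked up once the catalog exists, which is consistent with the Caveats section.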

Impact

This allows the schema-flattening feature of downstream components (such as the mapper), controlled by flattening_enabled and flattening_max_depth, to take effect.
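To show why the stream needs the nested properties, here is a simplified, hypothetical sketch of schema flattening: nested `properties` are expanded into `parent__child` keys up to a maximum depth, and anything without `properties` stays a single column. The `__` separator and the exact depth semantics are assumptions for illustration; the mapper's actual behavior may differ.

```python
def flatten_properties(properties: dict, max_depth: int, sep: str = "__") -> dict:
    """Hypothetical sketch: expand nested JSON Schema properties into flat keys."""
    flat: dict = {}

    def walk(props: dict, prefix: str, depth: int) -> None:
        for name, prop in props.items():
            key = f"{prefix}{sep}{name}" if prefix else name
            nested = prop.get("properties")
            if nested and depth < max_depth:
                # Descend one level; child keys are prefixed with the parent key.
                walk(nested, key, depth + 1)
            else:
                # Leaf, opaque object, or depth limit reached: keep as one column.
                flat[key] = prop

    walk(properties, "", 0)
    return flat


with_catalog = {
    "id": {"type": ["integer"]},
    "payload": {
        "type": ["object"],
        "properties": {
            "tier": {"type": ["string"]},
            "meta": {"type": ["object"], "properties": {"source": {"type": ["string"]}}},
        },
    },
}
without_catalog = {"id": {"type": ["integer"]}, "payload": {"type": ["object"]}}

print(sorted(flatten_properties(with_catalog, max_depth=1)))
# ['id', 'payload__meta', 'payload__tier']
print(sorted(flatten_properties(without_catalog, max_depth=1)))
# ['id', 'payload']
```

Without the nested properties from the catalog, there is nothing to expand, so `payload` passes through as a single opaque column.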

Caveats

This PR does not cover one edge case: if the schema of an existing table changes (e.g. a column is renamed or its data type changes) after the catalog has been created, those changes will not be synchronized onto the stream unless the default catalog file in .run/meltano/tap-postgres is deleted.
(Adding a new column works as expected.)
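The caveat can be made concrete with a small, hypothetical check that compares the stored catalog with a fresh discovery: columns that exist only in the catalog (e.g. the old name of a renamed column) or whose types differ are exactly the changes a catalog-first merge will not propagate. `find_unsynced_columns` is an illustrative name and is not part of this PR.

```python
def find_unsynced_columns(discovered: dict, catalog: dict) -> dict:
    """Hypothetical sketch: report catalog columns that no longer match discovery."""
    disc_props = discovered.get("properties", {})
    cat_props = catalog.get("properties", {})
    # In the catalog but gone from the database (dropped or renamed).
    missing = sorted(name for name in cat_props if name not in disc_props)
    # In both, but the declared JSON Schema type differs.
    retyped = sorted(
        name
        for name, prop in cat_props.items()
        if name in disc_props and prop.get("type") != disc_props[name].get("type")
    )
    return {"missing_from_db": missing, "type_changed": retyped}


discovered = {"properties": {
    "id": {"type": ["integer"]},
    "full_name": {"type": ["null", "string"]},  # renamed from "name" in the database
    "amount": {"type": ["null", "number"]},     # type changed in the database
}}
catalog = {"properties": {
    "id": {"type": ["integer"]},
    "name": {"type": ["null", "string"]},
    "amount": {"type": ["null", "integer"]},
}}
print(find_unsynced_columns(discovered, catalog))
# {'missing_from_db': ['name'], 'type_changed': ['amount']}
```

Intentional catalog overrides (such as a hand-written nested schema) may also show up under `type_changed`, so this is a hint for when to regenerate the catalog rather than a hard rule.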

Testing

Two integration tests have been added. One checks the expected behavior of the stream-merging mechanism; the other adds a new column to check whether the merge adapts to schema evolution.

Docs

RFC: https://www.notion.so/kaligo/JSON-schema-flatenning-improvement-271d7e4842d74da1a5de9ae59d3ae656

khoaanguyenn

💯 Perfect, great job 👏

I've left some comments on the unused methods for us to discuss.

khoaanguyenn · 2024-04-10T05:22:23Z

tests/integration/test_streams_utils.py

+		table_spec = {
+			"columns": [
+				{"name": "newcol", "type": "integer", "is_new_col": True}
+			],
+			"name": self.table_name
+		}
+		alter_schema_test_table(table_spec)


Suggested change

-		table_spec = {
-			"columns": [
-				{"name": "newcol", "type": "integer", "is_new_col": True}
-			],
-			"name": self.table_name
-		}
-		alter_schema_test_table(table_spec)
+		table_spec = {
+			"columns": [
+				{"name": "newcol", "type": "integer"}
+			],
+			"name": self.table_name
+		}
+		add_columns(table_spec)

Would it be simpler to call add_columns explicitly instead of the generic alter_schema_test_table method?

As a reviewer, I had to read through alter_schema_test_table to figure out that in this case it is actually adding new columns 🤣

Okay sir, I will change this in my next commit 😄

khoaanguyenn · 2024-04-10T05:27:40Z

tests/utils.py

+def alter_schema_test_table(table_spec, target_db='postgres'):
+    with get_test_connection(target_db) as conn:
+        with conn.cursor(cursor_factory=psycopg2.extras.DictCursor) as cur:
+            table = table_spec['name']
+            for col_spec in table_spec['columns']:
+                for sql in build_alter_table_sql(quote_ident(table, cur), col_spec):
+                    LOGGER.info("alter table sql: %s", sql)
+                    cur.execute(sql)


Suggested change

-def alter_schema_test_table(table_spec, target_db='postgres'):
-    with get_test_connection(target_db) as conn:
-        with conn.cursor(cursor_factory=psycopg2.extras.DictCursor) as cur:
-            table = table_spec['name']
-            for col_spec in table_spec['columns']:
-                for sql in build_alter_table_sql(quote_ident(table, cur), col_spec):
-                    LOGGER.info("alter table sql: %s", sql)
-                    cur.execute(sql)
+def add_columns(table_spec, target_db='postgres'):
+    with get_test_connection(target_db) as conn:
+        with conn.cursor(cursor_factory=psycopg2.extras.DictCursor) as cur:
+            table = quote_ident(table_spec['name'], cur)
+            for col_spec in table_spec['columns']:
+                # Each spec is a dict, e.g. {"name": "newcol", "type": "integer"}
+                sql = "ALTER TABLE {} ADD {} {}".format(table, col_spec['name'], col_spec['type'])
+                LOGGER.info("alter table sql: %s", sql)
+                cur.execute(sql)

Nits

khoaanguyenn · 2024-04-10T05:30:16Z

tests/utils.py

+def build_alter_table_sql(table, col_spec):
+    sqls = []
+    if altered_name:=col_spec.get('change_name'):
+        sqls.append("ALTER TABLE {} RENAME COLUMN {} TO {}".format(table, col_spec['name'], altered_name))
+    if altered_type:=col_spec.get('is_change_type'):
+        sqls.append("ALTER TABLE {} ALTER COLUMN {} TYPE {}".format(table, col_spec['name'], altered_type))
+    if col_spec.get("is_new_col"):
+        sqls.append("ALTER TABLE {} ADD {} {}".format(table, col_spec['name'], col_spec['type']))
+    return sqls


Nits: I understand that we may want to test more cases, but we don't use all of these now and may not in the future. I'd suggest keeping only what we need for now and adding more utilities in a later PR 👍

I did this for the reason you mentioned. If we only care about adding another column, maybe we can remove this function and merge its logic into the add_columns function 😄
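If the logic is folded into add_columns, the SQL building can be kept separate from the database call, which makes it testable without a Postgres instance. A minimal sketch, assuming column specs are dicts with `name` and `type` keys as in the updated test above; `build_add_column_sqls` and `quote_identifier` are hypothetical helpers. In the real test utility, psycopg2's `quote_ident` (already used in tests/utils.py) or the `psycopg2.sql` module is the safer way to quote identifiers.

```python
def quote_identifier(name: str) -> str:
    """Quote a Postgres identifier: wrap in double quotes, double any embedded quote."""
    return '"' + name.replace('"', '""') + '"'


def build_add_column_sqls(table_spec: dict) -> list:
    """Hypothetical sketch: one ALTER TABLE ... ADD COLUMN statement per column spec.

    The column type is inserted verbatim, so it must come from trusted test code.
    """
    table = quote_identifier(table_spec["name"])
    return [
        "ALTER TABLE {} ADD COLUMN {} {}".format(table, quote_identifier(col["name"]), col["type"])
        for col in table_spec["columns"]
    ]


spec = {"name": "test_table", "columns": [{"name": "newcol", "type": "integer"}]}
for stmt in build_add_column_sqls(spec):
    print(stmt)  # ALTER TABLE "test_table" ADD COLUMN "newcol" integer
```

add_columns would then only open the connection and call `cur.execute` on each returned statement.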

khoaanguyenn · 2024-04-17T02:30:28Z

💯 Thanks @quocnguyendinh for making the code cleaner. @solteszad, would you mind taking a look at this PR as well?

solteszad

The code looks really polished to me! I would like to see it in practice; can we check the demo on staging? 😲


quocnguyendinh added 2 commits April 9, 2024 11:50

adding the merging and overriding mechanism

973aa6a

adding the unit tests for the merging process

da3639d

khoaanguyenn mentioned this pull request Apr 10, 2024

Add 'skip_last_n_seconds' config parameter for incremental replication #1

Merged

13 tasks

khoaanguyenn reviewed Apr 10, 2024


khoaanguyenn changed the title ~~Enabling the overriding of the catalog on the stream~~ [App Feature] [JsonSchema] Enabling the overriding of the catalog on the stream [LOYAL-10211] Apr 10, 2024

khoaanguyenn assigned quocnguyendinh Apr 10, 2024

khoaanguyenn added enhancement New feature or request ready for review labels Apr 10, 2024

quocnguyendinh added 2 commits April 11, 2024 09:50

change the test utils function

e9246f0

fix bugs in add cols

64bb702

solteszad approved these changes Apr 21, 2024


solteszad added code approved and removed ready for review labels Apr 24, 2024

khoaanguyenn and others added 2 commits October 2, 2024 18:23

Merge branch 'master' into feature/orverride_catalog_over_discovery_stream

cda49c5

adding the mechanism to merge the metadata

58a4445
