-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Source versioning: Postgres, MySQL and Load generator #647
base: main
Are you sure you want to change the base?
Conversation
c4a61c8
to
e202851
Compare
docs/resources/source_table.md
Outdated
@@ -29,7 +29,7 @@ description: |- | |||
- `ownership_role` (String) The owernship role of the object. | |||
- `region` (String) The region to use for the resource connection. If not set, the default region is used. | |||
- `schema_name` (String) The identifier for the table schema in Materialize. Defaults to `public`. | |||
- `text_columns` (List of String) Columns to be decoded as text. | |||
- `text_columns` (List of String) Columns to be decoded as text. Not supported for the load generator sources, if the source is a load generator, the attribute will be ignored. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to sources, we might want a source table resource per source type? The existing source-level options will basically shift to source table-level.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes indeed, I was just thinking about this. With the MySQL and Postgres sources, it is probably fine, but as soon as we add Kafka and Webhook sources, the logic will get out of hand.
Will refactor this to have a separate source table resource per source!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice work!
docs/resources/source_kafka.md
Outdated
- `start_offset` (List of Number, Deprecated) Read partitions from the specified offset. Deprecated: Use the new materialize_source_table_kafka resource instead. | ||
- `start_timestamp` (Number, Deprecated) Use the specified value to set `START OFFSET` based on the Kafka timestamp. Deprecated: Use the new materialize_source_table_kafka resource instead. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these two options are still currently only possible on the top-level CREATE SOURCE
statement for kafka sources -- not yet on a per-table basis. It will require a non-trivial amount more refactoring to allow them on a per-table basis so I'm unsure if we will do that work until it's requested by a customer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes! Good catch! Thank you!
@@ -53,12 +53,12 @@ resource "materialize_source_mysql" "test" { | |||
- `comment` (String) **Public Preview** Comment on an object in the database. | |||
- `database_name` (String) The identifier for the source database in Materialize. Defaults to `MZ_DATABASE` environment variable if set or `materialize` if environment variable is not set. | |||
- `expose_progress` (Block List, Max: 1) The name of the progress collection for the source. If this is not specified, the collection will be named `<src_name>_progress`. (see [below for nested schema](#nestedblock--expose_progress)) | |||
- `ignore_columns` (List of String, Deprecated) Ignore specific columns when reading data from MySQL. Can only be updated in place when also updating a corresponding `table` attribute. Deprecated: Use the new materialize_source_table resource instead. | |||
- `ignore_columns` (List of String, Deprecated) Ignore specific columns when reading data from MySQL. Can only be updated in place when also updating a corresponding `table` attribute. Deprecated: Use the new materialize_source_table_mysql resource instead. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fyi this option is also being renamed MaterializeInc/materialize#29438 but the old name will be aliased to the new one, so this shouldn't break
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good! I will go ahead and use the exclude columns for the new table source resource!
docs/resources/source_table_kafka.md
Outdated
- `start_offset` (List of Number) Read partitions from the specified offset. | ||
- `start_timestamp` (Number) Use the specified value to set `START OFFSET` based on the Kafka timestamp. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these aren't currently available on a per-table basis for kafka sources
docs/resources/source_table_kafka.md
Outdated
- `schema_name` (String) The identifier for the source schema in Materialize. Defaults to `public`. | ||
- `start_offset` (List of Number) Read partitions from the specified offset. | ||
- `start_timestamp` (Number) Use the specified value to set `START OFFSET` based on the Kafka timestamp. | ||
- `upstream_schema_name` (String) The schema of the table in the upstream database. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what does this refer to for kafka sources? we might just want to omit it since the upstream reference should just be the kafka topic name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, this was an overlook on my end in the schema for the Kafka source table resource.
startOffset []int | ||
startTimestamp int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these two aren't used below and also aren't possible on the statement
|
||
This guide will walk you through the process of migrating your existing source table definitions to the new `materialize_source_table_{source}` resource. | ||
|
||
For each source type (e.g., MySQL, Postgres, etc.), you will need to create a new `materialize_source_table_{source}` resource for each table that was previously defined within the source resource. This ensures that the tables are preserved during the migration process. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For each source type (e.g., MySQL, Postgres, etc.), you will need to create a new `materialize_source_table_{source}` resource for each table that was previously defined within the source resource. This ensures that the tables are preserved during the migration process. | |
For each source type (e.g., MySQL, Postgres, etc.), you will need to create a new `materialize_source_table_{source}` resource for each table that was previously defined within the source resource. This ensures that the tables are preserved during the migration process. For Kafka sources, you will need to create at least one `materialize_source_table_kafka` table to hold data for the kafka topic. |
@morsapaes might have better wording for this but I think we should be clear that this migration needs to happen for sources that previously didn't have subsources too (e.g. kafka)
|
||
The same approach can be used for other source types such as Postgres, eg. `materialize_source_table_postgres`. | ||
|
||
## Automated Migration Process (TBD) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice - this is great! We will probably want to figure out how to tell them that they will be able to coordinate the 'automated' migration process with their field-engineer representative if they go down this path
0b63a29
to
7870d48
Compare
@morsapaes @bobbyiliev let's discuss this PR at the sources & sinks meeting this week - we should decide when it makes sense to merge this - my thinking is we should do so whenever we move into private preview for the source versioning feature. But if we want to merge sooner and just have a disclaimer that the things mentioned as 'deprecated' here are not actually yet deprecated, that could work too |
6d60efb
to
61cd55c
Compare
61cd55c
to
9b96939
Compare
One thing that we can consider here as per this old tracking issue: #391 is take the chance and decide if we still want to rename some of the attributes in the new source table resources: Lists For attributes that use list, we have more cases of singular than plural.
Blocks We had decided should be singular. There are some blocks that use plural so this could be a good chance to rename those attributes in the new source table load gen resource:
|
Initial implementation for the source versioning refactor as per #646
The main changes to consider:
table
attribute as optional and deprecated for both the MySQL and Postgres sourcesall_tables
bool attribute for the MySQL and the Loadgen sources, as in the past this was defaulting always usingFOR ALL TABLES
in the load gen sources (auction, marketing, tpch) and in the MySQL case whenever notable
blocks were defined, we defaulted toFOR ALL TABLES
. Thisall_tables
bool attribute allows us to create sources without any tables defined as per the source versioning workmaterialize_source_table_{mysql|postgres|load_generator}
resource which allows us to doCREATE TABLE ... FROM SOURCE ...
Things that are still pending: #646