Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add source versioning design doc #645

Merged
merged 3 commits into from
Sep 24, 2024
Merged

Conversation

bobbyiliev
Copy link
Contributor

As per the Source Versioning / Tables from Sources changes: MaterializeInc/materialize#27907

@bobbyiliev bobbyiliev requested a review from a team as a code owner September 2, 2024 12:21
@bobbyiliev bobbyiliev requested review from arusahni, rjobanp, morsapaes and benesch and removed request for a team September 2, 2024 12:21
Copy link
Member

@benesch benesch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This all makes sense to me! Only took a quick skim though. Deferring the full review to the experts (@rjobanp and @morsapaes).

docs/developer/source-versioning-2024-08-30.md Outdated Show resolved Hide resolved
Comment on lines 126 to 134
1. Deprecate the `table` field in existing source resources.
2. Introduce the new `materialize_table_from_source` resource.
3. Allow both old and new configurations to coexist during a transition period.

This approach allows users to migrate their configurations gradually:

- Existing sources can still be created and managed.
- New tables (formerly subsources) will be created as separate `materialize_table_from_source` resources.
- Users can migrate their configurations at their own pace by replacing `table` blocks with `materialize_table_from_source` resources.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really smart! Will you be able to mix-n-match table and materialize_source_table resources for the same source? Or will you always have to migrate an entire source at once?

My hunch is that we should require that you only use one or the other on a per source basis. That seems likely to make the code much simpler and easier to reason about. And it still affords the user quite a bit of flexibility to do the migration incrementally by migrating one source at a time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will you be able to mix-n-match table and materialize_source_table resources for the same source? Or will you always have to migrate an entire source at once?

Based on what I've tested so far using main, it looks like that you could have both running simultaneously. So if we were to take the following example:

  • Existing standard Postgres source:
resource "materialize_source_postgres" "pg_source" {
  name         = "pg_source"
  cluster_name = "quickstart"

  expose_progress {
    name = "expose_postgres_progress"
  }

  postgres_connection {
    name          = materialize_connection_postgres.postgres_connection.name
  }
  publication = "mz_source"
  table {
    upstream_name  = "table1"
    upstream_schema_name = "public"
  }
  table {
    upstream_name  = "table2"
    upstream_schema_name = "public"
  }
  ...
}
  • Then define a new materialize_source_table:
resource "materialize_source_table" "source_table3" {
  name           = "source_table3"
  schema_name    = "public"
  database_name  = "materialize"

  source {
    name          = materialize_source_postgres.pg_source.name
    schema_name   = "public"
    database_name = "materialize"
  }

  upstream_name         = "table2"
  upstream_schema_name  = "public"

  text_columns = [
    "updated_at"
  ]
}

So the user should still be able to slowly do the migration if needed as the table blocks and the materialize_source_table are independent, at least until we do the actual migration on the Materialize side, then in that case, users would just need to define all of their tables as materialize_source_table and as long as we have all of the required information in the catalog tables, we should be able to update the state accordingly.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it looks like that you could have both running simultaneously

Yes at the moment we allow both if you have the new enable_create_table_from_source feature-flag enabled, however we could artificially enforce that you are using either the old model or the new model in the terraform provider on a per-source basis to make things simpler?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah ok, if it's going to be supported to have both simultaneously on the database side, then maybe it's actually simpler to exactly mirror that in Terraform?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, it would be simpler if we could have it as it is at the moment. Allowing us to have both simultaneously on the database side makes things easier for the Terraform implementation as well.

Comment on lines 126 to 134
1. Deprecate the `table` field in existing source resources.
2. Introduce the new `materialize_source_table` resource.
3. Allow both old and new configurations to coexist during a transition period.

This approach allows users to migrate their configurations gradually:

- Existing sources can still be created and managed.
- New tables (formerly subsources) will be created as separate `materialize_source_table` resources.
- Users can migrate their configurations at their own pace by replacing `table` blocks with `materialize_source_table` resources.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this makes sense to me to for allowing users to opt-in to the migration on their own accord!

We've also discussed implementing a catalog migration on the MZ side to automatically migrate all existing sources/subsources over to the new model, and in that case we would alter the CREATE SOURCE statement to remove the deprecated table field options. If the user reruns terraform after that migration it would attempt to re-introduce the table field unless the user was proactive in updating their terraform config in coordination with that migration. Do we think there'd be a way to detect this situation? Perhaps if the table field is already empty and the enable_create_table_from_source feature-flag is enabled we just ignore the terraform config for that option and emit a warning?

Copy link
Member

@benesch benesch Sep 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, in general these migrations are so hairy when they involve stuff happening inside Materialize and stuff happening with Terraform.

I don't know that there will be much value in teaching the Terraform configuration to ignore the table field once the migration occurs. There's no guarantee that users will upgrade to a version of the Terraform provider that has the special handling for table before we do the catalog migration. And the catalog migration will happen behind Terraform's back, which is pretty sketchy—someone who thought they had everything properly tracked in Terraform will suddenly find that is not the case.

I think our best bet here may be to support the two syntaxes indefinitely, but disable the old syntax for new sources as soon as we can pull it off. That will prevent the problem from getting any worse. Then, after a couple months, hopefully there are few enough sources remaining that use the old syntax that we can go customer by customer and source by source and figure out if they're using Terraform to manage the source or not. If they're using Terraform, I think we're better off having field eng work with them to get through the upgrade manually than trying to do anything automatic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think our best bet here may be to support the two syntaxes indefinitely, but disable the old syntax for new sources as soon as we can pull off.

This sounds very good to me!

I think we're better off having field eng work with them to get through the upgrade manually than trying to do anything automatic.

I will try to prepare a step by step migrations guide on how to switch from the old to the new syntax on the Terraform side.

@bobbyiliev
Copy link
Contributor Author

I have a working PoC of this here: #647

Description: "Tables to be ingested from the source. This field is deprecated and will be removed in a future version.",
Type: schema.TypeSet,
Optional: true,
Deprecated: "Use the new `materialize_source_table` resource instead.",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should link to the migration guide in these deprecated notices, once it exists!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants