[Feature] Update the table_type field to table_format #729

amychen1776 · 2024-09-26T10:06:44Z

Is this your first time submitting a feature request?

I have searched the existing issues, and I could not find an existing issue for this feature

Describe the feature

Right now the iceberg table configuration calls the table format field as table_type. Let's standardize to table_format

Describe alternatives you've considered

N/A

Who will this benefit?

No response

Are you interested in contributing this feature?

No response

Anything else?

No response

The text was updated successfully, but these errors were encountered:

nicor88 · 2024-09-26T10:11:12Z

@amychen1776 some historical context on why we used table_type.

There is another paramter called: format

format (default='parquet')
The data format for the table
Supports ORC, PARQUET, AVRO, JSON, TEXTFILE

We dindn't want the final user to be confused about format vs table_format, I do believe that using table_format and having still another parameter called format is really confusing.

Also to be clear, table_type specify the type of table in the context of Athena, that can be Hive (external) or Iceberg(open table format) - the name come from AWS itself actually, and it's an extention of it, see here

TableType – UTF-8 string, not more than 255 bytes long.
The type of this table. AWS Glue will create tables with the EXTERNAL_TABLE type. Other services, such as Athena, may create tables with additional table types.

The config format, on the other hand, specify how the data is going to be written in S3: parquet, avro, json and so on.

IMHO using the naming table_format is not exactly correct, and leads to ambiguity - I just want to give my user prospective on this.

amychen1776 · 2024-10-01T19:30:02Z

Thank you @nicor88 - I appreciate the input! I think the standardization should look like this:

table_format = iceberg, hive, delta
file_format = parquet, avro, json etc

This would align with the broader industry standards on what to call all of these. My goal with this issue is create a shared language across all the adapters so that we have a standard way of referring to this. This also improves adapter maintenance for someone who oversees multiple. We can also look into aliasing which we have done for past things like BQ with projects rather than database so that it's the same on the backend but the user has something different. In this situation - I'm not quite seeing the same exactly validation that we should do this

nicor88 · 2024-10-02T06:14:10Z

Using file_format definitely makes sense, and actually addresses my concern.

After this refactor is in, we should aim to bump the minor version to "signal" the breaking change.

amychen1776 · 2024-10-02T11:39:42Z

Good - that's exactly why we want to make this more properly supported so it's more clear.
Interesting thought there - I'm actually planning to put it behind a behavior flag because this will help users migrate (tell them to migrate in the warnings) and reserve minor versions for new functionality.

amychen1776 added the enhancement New feature or request label Sep 26, 2024

colin-rogers-dbt assigned VersusFacit Oct 23, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Feature] Update the table_type field to table_format #729

[Feature] Update the table_type field to table_format #729

amychen1776 commented Sep 26, 2024

nicor88 commented Sep 26, 2024 •

edited

Loading

amychen1776 commented Oct 1, 2024 •

edited

Loading

nicor88 commented Oct 2, 2024

amychen1776 commented Oct 2, 2024

[Feature] Update the table_type field to table_format #729

[Feature] Update the table_type field to table_format #729

Comments

amychen1776 commented Sep 26, 2024

Is this your first time submitting a feature request?

Describe the feature

Describe alternatives you've considered

Who will this benefit?

Are you interested in contributing this feature?

Anything else?

nicor88 commented Sep 26, 2024 • edited Loading

amychen1776 commented Oct 1, 2024 • edited Loading

nicor88 commented Oct 2, 2024

amychen1776 commented Oct 2, 2024

nicor88 commented Sep 26, 2024 •

edited

Loading

amychen1776 commented Oct 1, 2024 •

edited

Loading