You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I don't think that both syntaxes should be supported, and i think we should drop support for one of them
Suggestion
Drop support for create external table t from <file_type> options (location = <object_store>)
make create external table t from <object_store> options (file_type = <file_type>) more explicit about only allowing object stores and file types. [1]
for example from delta|lance|iceberg options (...) should not use the object store options, but it's own dedicated options, as they are their own storage formats, and don't integrate into object store nicely like file formats do. It also makes the distinction between the two a lot clearer.
Additionally, create external table t from <file_type> should probably never be valid. If you want to do that, you should probably be using the table function, or the inference from './path/to/file.<file_type>' instead.
For brevity, for the remainder of the text, from ... will refer to create external table t from .....
(create external table t from t -> from t)
in the above examples, I suggest they be rewritten to use from local instead of from <file_type | table_type>.
from bson options (location=location...) -> from local options (location=location, file_type = 'bson')
from azure credentials azure_creds options (location='azure://'); -> from delta options(location='azure://') credentials azure_creds
So just to recap, all File types (bson, json, csv, parquet, avro, ...) should use the object store as the from qualifier
from <local | s3 | azure | gcs> options (object_store_opts)
from local options (file_type = 'csv', ...object_store_opts);
from s3 options (file_type = 'bson', ...object_store_opts);
from azure options (file_type = 'parquet', ...object_store_opts);
all storage formats should use the storage format as the from qualifier (delta, lance, iceberg)
from delta options (location = 's3://', ...other delta specific options)
from excel options (location = 's3://', ...other excel specific options)
[1] I define a file format as a format that is used to represent a single table usually in a single file. (there is no notion of a table, and they are generally quite portable (csv, json, parquet, ...)).
A table format (AKA storage format) is any format that supports multiple tables (delta, lance, excel, iceberg).
The text was updated successfully, but these errors were encountered:
I pretty strongly believe that we should hide FROM <storage location/service> from users (because that's all information that we can (and do, in many cases) derive from the path, and using FROM exclusively for "format" rather than "location" simplifies this. The inverse (location and not format) would mean that we would either need to deduce format, or add additional syntax.
tychoish
added
syntax
Issues or features related to our SQL syntax
ux 💎
Convenience label for grouping UX issues
and removed
bug 🐛
Something isn't working
labels
Apr 9, 2024
Description
there are some inconsistencies when using table options, especially in combination with
create external table
.For many of them, they use the same
object_store
options, which allows for some rather inconsistent syntaxAdditionally, the syntax can often be reversed for object stores (which is confusing)
I don't think that both syntaxes should be supported, and i think we should drop support for one of them
Suggestion
create external table t from <file_type> options (location = <object_store>)
create external table t from <object_store> options (file_type = <file_type>)
more explicit about only allowing object stores and file types. [1]for example
from delta|lance|iceberg options (...)
should not use the object store options, but it's own dedicated options, as they are their own storage formats, and don't integrate into object store nicely like file formats do. It also makes the distinction between the two a lot clearer.Additionally,
create external table t from <file_type>
should probably never be valid. If you want to do that, you should probably be using the table function, or the inferencefrom './path/to/file.<file_type>'
instead.For brevity, for the remainder of the text,
from ...
will refer tocreate external table t from ...
..(
create external table t from t
->from t
)in the above examples, I suggest they be rewritten to use
from local
instead offrom <file_type | table_type>
.from bson options (location=location...)
->from local options (location=location, file_type = 'bson')
from azure credentials azure_creds options (location='azure://');
->from delta options(location='azure://') credentials azure_creds
So just to recap, all File types (bson, json, csv, parquet, avro, ...) should use the object store as the
from
qualifierfrom <local | s3 | azure | gcs> options (object_store_opts)
from local options (file_type = 'csv', ...object_store_opts);
from s3 options (file_type = 'bson', ...object_store_opts);
from azure options (file_type = 'parquet', ...object_store_opts);
all storage formats should use the storage format as the
from
qualifier (delta, lance, iceberg)from <delta | lance | iceberg | excel | ...> options (T_opts)
from delta options (location = 's3://', ...other delta specific options)
from excel options (location = 's3://', ...other excel specific options)
[1] I define a file format as a format that is used to represent a single table usually in a single file. (there is no notion of a table, and they are generally quite portable (csv, json, parquet, ...)).
A table format (AKA storage format) is any format that supports multiple tables (delta, lance, excel, iceberg).
The text was updated successfully, but these errors were encountered: