Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

trino-iceberg-minio: unable to USE catalog #12

Closed
GiorgioBaldelli opened this issue Nov 23, 2022 · 5 comments
Closed

trino-iceberg-minio: unable to USE catalog #12

GiorgioBaldelli opened this issue Nov 23, 2022 · 5 comments

Comments

@GiorgioBaldelli
Copy link

GiorgioBaldelli commented Nov 23, 2022

Hi Brian @bitsondatadev, thanks for sharing the trino-iceberg-minio example in the community call and in this repo. Really appreciate your work!

I haven't found a solution for this issue yet:

  1. When attempting to run USE hive; or USE iceberg; to specify the catalog to use when executing a given SQL statement, this error follows:
io.trino.spi.TrinoException: line 1:1: Catalog must be specified when session catalog is not set
        at io.trino.sql.analyzer.SemanticExceptions.semanticException(SemanticExceptions.java:48)
        at io.trino.sql.analyzer.SemanticExceptions.semanticException(SemanticExceptions.java:43)
        at io.trino.execution.UseTask.lambda$execute$1(UseTask.java:65)
        at java.base/java.util.Optional.orElseThrow(Optional.java:403)
        at io.trino.execution.UseTask.lambda$execute$2(UseTask.java:64)
        at java.base/java.util.Optional.orElseGet(Optional.java:364)
        at io.trino.execution.UseTask.execute(UseTask.java:64)
        at io.trino.execution.UseTask.execute(UseTask.java:36)
        at io.trino.execution.DataDefinitionExecution.start(DataDefinitionExecution.java:145)
        at io.trino.execution.SqlQueryManager.createQuery(SqlQueryManager.java:249)
        at io.trino.dispatcher.LocalDispatchQuery.lambda$startExecution$7(LocalDispatchQuery.java:143)
        at io.trino.$gen.Trino_403____20221123_100619_2.run(Unknown Source)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
        at java.base/java.lang.Thread.run(Thread.java:833)

I've attempted to solve this by using the --catalog flag when initializing the connection:

./trino-cli-388-executable.jar --server http://localhost:8080 --catalog hive

Unfortunately, this is not helpful. When I attempt to create a dummy_table table in hive.iris the table gets also created in iceberg.iris. The result: hive.iris.dummy_table & iceberg.iris.dummy_table.

Have you bumped into a similar issue before and know a workaround perhaps?

Thanks in advance,
Giorgio

@bitsondatadev
Copy link
Owner

bitsondatadev commented Nov 23, 2022

Hey @GiorgioBaldelli!

I'm so glad to hear that this is somewhat helping so far. Hope we can get you across the finish line to that first Trino query bliss.

Okay, so one thing I see that trips up a lot of users is that there's a bit of confusion around the catalog/schema. To be clear Trino follows a three-tier containment hierarchy, <catalog>.<schema>.<table>. This is required as Trino can do federated queries across multiple data sources. The catalog tier is the equivalent of choosing a specific data source, while the schema tier would be choosing say, a database in MySQL or also called schema in Postgres. Based on your question, you likely are already aware but wanted to cover this for future users reading this ;).

The USE clause is to be applied to the schema level. In your case, you're trying to run USE <catalog> when you should be using USE <catalog>.<schema>.

So for your case, try USE hive.iris or USE iceberg.iris.

Let me know if that solves your issue and please close this if it does!

Also thanks for bringing up this question. I'm putting together some video tutorials now and this is a common issue we get so I will create a video specifically on this!

@GiorgioBaldelli
Copy link
Author

GiorgioBaldelli commented Nov 23, 2022

Thanks for the clear response @bitsondatadev!

USE iceberg.iris does the trick.

However, I'm noticing a behaviour that I'm not sure is intended:

CREATE TABLE IF NOT EXISTS iceberg.iris.empty_iris_parquet
    (
         sepal_length DOUBLE,
         sepal_width  DOUBLE,
         petal_length DOUBLE,
         petal_width  DOUBLE,
         class        VARCHAR
    )
    WITH (format = 'PARQUET');

When running the previous command, I'm seeing a that the table gets created successfully at the location iceberg.iris.empty_iris_parquet as expected. To verify that it gets created, I use the command SHOW TABLES IN iceberg.iris.

However, I'm also see a table with the same name at the location hive.iris.empty_iris_parquet. I used the command SHOW TABLES IN hive.iris to verify this.

The redundant table that gets created at hive.iris.empty_iris_parquet is not usable. I'm getting an UNSUPPORTED_TABLE_TYPE (133001) error in Trino whenever I try perform any kind of SQL operation with it.

It looks like catalog redirects may be required?

@bitsondatadev
Copy link
Owner

This is odd and something I might expect should work but there are a lot of compatibility issues between Hive and Iceberg connectors. This can be confusing as they both share the same metastore for catalog data (hence when you create a table or a view from one you'll likely see it in the other) but the way the data is stored on disk is entirely different.

The general case in production is that people are currently living with hive and need to move to iceberg. So we've tested reading hive created tables extensively from iceberg. The inverse (iceberg to Hive) is much less tested and more likely to break. This is rarely an issue as nobody really wants to move back to Hive only from Hive to Iceberg.

@findinpath
Copy link
Collaborator

findinpath commented Nov 24, 2022

It looks like catalog redirects may be required?

Adding in hive.properties :

hive.iceberg-catalog-name=iceberg

and in iceberg.properties:

iceberg.hive-catalog-name=hive

should indeed address the issue you are pointing out.

@GiorgioBaldelli if you want you can contribute a PR to address this aspect.

@bitsondatadev
Copy link
Owner

Looks like the issue was addressed, regarding catalog redirects, please reopen and add a PR if this is a feature anyone is interested.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants