Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

'Cluster by' issue #192

Open
rasmushemph opened this issue Jan 27, 2025 · 2 comments
Open

'Cluster by' issue #192

rasmushemph opened this issue Jan 27, 2025 · 2 comments
Labels
bug Something isn't working

Comments

@rasmushemph
Copy link

rasmushemph commented Jan 27, 2025

I am trying to cluster-by a key in Titan. The clustering is live using SF UI, now want to keep it in code.

Plan is to use Titan command line. Here is my tables.yml file.

name: accountXXXXG-ACME-table
run_mode: CREATE-OR-UPDATE
allowlist: 
  - table

tables:
  - name: INT_ORDER_PAYMENTS
    database: ANALYTICS
    schema: ACME
    columns:
      - name: order_id
        data_type: integer
    transient: False
    owner: TRANSFORMER
    cluster_by: 
      - date_trunc(month, order_date_created::date) 

This is the command I have previously run to cluster by month in SF.
ALTER TABLE ANALYTICS.ACME.INT_ORDER_PAYMENTS CLUSTER BY (date_trunc(month, order_date_created::date))

No issues and the SELECT SYSTEM$CLUSTERING_INFORMATION('acme.int_order_payments');
shows expected clustering.

Using Titan, the plan indicates that there is a difference between plan and current (although I feel that there shouldn't be).

python -m titan plan --config=config/tables.yml
[TITAN_ADMIN:ACCOUNTADMIN] > 
        SELECT
            CURRENT_ACCOUNT_NAME() as account,
            CURRENT_ACCOUNT() as account_locator,
            CURRENT_USER() as user,
            CURRENT_ROLE() as role,
            CURRENT_AVAILABLE_ROLES() as available_roles,
            CURRENT_SECONDARY_ROLES() as secondary_roles,
            CURRENT_DATABASE() as database,
            CURRENT_SCHEMAS() as schemas,
            CURRENT_WAREHOUSE() as warehouse,
            CURRENT_VERSION() as version,
            CURRENT_REGION() as region,
            SYSTEM$BOOTSTRAP_DATA_REQUEST('ACCOUNT') as account_data
            (1 rows, 0.16s)
Duplicate resource urn::SA63696:table/ANALYTICS.ACME.INT_ORDER_PAYMENTS with conflicting data, discarding Table(INT_ORDER_PAYMENTS)
[TITAN_ADMIN:ACCOUNTADMIN] > USE SECONDARY ROLES ALL    (1 rows, 0.10s)
[TITAN_ADMIN:ACCOUNTADMIN] > SHOW DATABASES IN ACCOUNT    (6 rows, 0.10s)
[TITAN_ADMIN:ACCOUNTADMIN] > SHOW PARAMETERS IN DATABASE ANALYTICS    (22 rows, 0.09s)
[TITAN_ADMIN:ACCOUNTADMIN] > SHOW SCHEMAS IN ACCOUNT    (63 rows, 0.25s)
[TITAN_ADMIN:ACCOUNTADMIN] > SHOW PARAMETERS IN SCHEMA ANALYTICS.PUBLIC    (22 rows, 0.08s)
[TITAN_ADMIN:ACCOUNTADMIN] > SHOW PARAMETERS IN SCHEMA ANALYTICS.ACME    (22 rows, 0.09s)
[TITAN_ADMIN:ACCOUNTADMIN] > SHOW TABLES IN ACCOUNT    (1631 rows, 1.40s)
[TITAN_ADMIN:ACCOUNTADMIN] > DESC TABLE ANALYTICS.ACME.INT_ORDER_PAYMENTS    (41 rows, 0.08s)
[TITAN_ADMIN:ACCOUNTADMIN] > SHOW PARAMETERS FOR TABLE ANALYTICS.ACME.INT_ORDER_PAYMENTS    (11 rows, 0.10s)
[TITAN_ADMIN:ACCOUNTADMIN] > SHOW ROLES IN ACCOUNT    (12 rows, 0.08s)

» titan core
» Plan: 0 to create, 1 to update, 0 to transfer, 0 to drop.

~ urn::SA63696:table/ACME.INT_ORDER_PAYMENTS {
  ~ cluster_by = ['date_trunc(month', 'order_date_created::date'] -> ['date_trunc(month, order_date_created::date)']
}

When I still try to add clustering, I get this error:

TITAN_ADMIN:TRANSFORMER] > ALTER TABLE ANALYTICS.ACME.INT_ORDER_PAYMENTS SET CLUSTER BY (date_trunc(month, order_date_created::date))    (err 1003, 0.18s)
Traceback (most recent call last):
  File "/Users/rasmus/git/snowflake/.venv/lib/python3.11/site-packages/titan/client.py", line 78, in execute
    cur.execute(sql_text)
  File "/Users/rasmus/git/snowflake/.venv/lib/python3.11/site-packages/snowflake/connector/cursor.py", line 1087, in execute
    Error.errorhandler_wrapper(self.connection, self, error_class, errvalue)
  File "/Users/rasmus/git/snowflake/.venv/lib/python3.11/site-packages/snowflake/connector/errors.py", line 290, in errorhandler_wrapper
    handed_over = Error.hand_to_other_handler(
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/rasmus/git/snowflake/.venv/lib/python3.11/site-packages/snowflake/connector/errors.py", line 345, in hand_to_other_handler
    cursor.errorhandler(connection, cursor, error_class, error_value)
  File "/Users/rasmus/git/snowflake/.venv/lib/python3.11/site-packages/snowflake/connector/errors.py", line 221, in default_errorhandler
    raise error_class(
snowflake.connector.errors.ProgrammingError: 001003 (42000): SQL compilation error:
syntax error line 1 at position 82 unexpected 'BY'.
syntax error line 1 at position 96 unexpected '('.
syntax error line 1 at position 122 unexpected '::'.

In particular, this error message look suspect to me:
ALTER TABLE ANALYTICS.ACME.INT_ORDER_PAYMENTS SET CLUSTER BY
This is not the command I used above to set clustering, and I can not run it in the UI. Is it possible that SF changed syntax?

@teej teej added the bug Something isn't working label Jan 28, 2025
@teej
Copy link
Collaborator

teej commented Jan 28, 2025

Looks like titan is incorrectly dropping the lagging paren when it fetches state for CLUSTER BY. There's also some quoting normalization that needs to happen. I'll queue this up for the next release.

@rasmushemph
Copy link
Author

rasmushemph commented Jan 29, 2025

Sounds good.
About the normalization. Mentioning that I am having additional trouble with this "ORDER" table.. sounds like that may be part of the fix.

  # the order table gives trouble because of SQL name clash. Quoting helps only to a certain degree...
     - name: webapp."ORDER"
       database: RAW
       columns:
         - name: id
           data_type: integer
       transient: False
      owner: LOADER
    cluster_by: 
     - date_trunc(month, date_created)::date

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants