
[Bug] Complex types are truncated during describe extended #1107

Open
mikealfare opened this issue Sep 17, 2024 · 8 comments
Labels
bug Something isn't working

Comments

@mikealfare
Contributor

Is this a new bug in dbt-spark?

  • I believe this is a new bug in dbt-spark
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

Complex types are truncated when running this macro:

{% macro spark__get_columns_in_relation_raw(relation) -%}
{% call statement('get_columns_in_relation_raw', fetch_result=True) %}
describe extended {{ relation }}
{% endcall %}
{% do return(load_result('get_columns_in_relation_raw').table) %}
{% endmacro %}

This happens because DESCRIBE EXTENDED truncates complex type definitions before returning results.
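For illustration, here is a minimal sketch (not part of dbt-spark; the function name and regex are my own) of how an adapter could detect the truncation artifact in the type strings that DESCRIBE EXTENDED returns, e.g. `struct<,... 1 more fields>`:

```python
import re

# Spark abbreviates long nested types with a trailing
# ",... N more fields" marker inside the type string.
TRUNCATION_PATTERN = re.compile(r",?\.\.\.\s*\d+\s+more\s+fields?")

def is_truncated_type(data_type: str) -> bool:
    """Return True if a DESCRIBE EXTENDED type string was abbreviated by Spark."""
    return bool(TRUNCATION_PATTERN.search(data_type))
```

A check like this could be used to decide when the reported type is unusable and a fallback metadata source is needed.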

Expected Behavior

The types should be complete.

Steps To Reproduce

  • create a model (e.g. my_model) with a sufficiently complex type
  • run DESCRIBE EXTENDED my_model
  • look at the resulting type for the complex column

Relevant log output

No response

Environment

- OS:
- Python:
- dbt-core:
- dbt-spark:

Additional Context

No response

@mikealfare mikealfare added bug Something isn't working triage labels Sep 17, 2024
@amychen1776
Contributor

@benc-db This is the issue we were talking about yesterday regarding the Databricks Metadata API. Is this just a Databricks-specific issue?

@benc-db

benc-db commented Sep 19, 2024

It is Databricks specific, but may affect dbt-spark as well.

@benc-db

benc-db commented Sep 19, 2024

lol, I didn't see where I was commenting. So, I do not know the extent to which describe extended is standard Spark vs Databricks, which is probably what you're asking here.

@amychen1776
Contributor

@benc-db yup :)

@mikealfare did you find this bug running on Databricks then?

@mikealfare
Contributor Author

@amychen1776 Apologies for the late reply; my GH notifications have been out of control. I believe this was reported by a Cloud customer that was running dbt-spark with Databricks.

@benc-db

benc-db commented Oct 11, 2024

I'll summarize here what I'm doing in dbt-databricks: in 1.9 I'm introducing a behavior flag to use information schema to get column types for UC tables. The reason I'm guarding it with a flag is that I learned in testing that information schema is not always synced up with reality, so to ensure that it is, I run a repair table operation before gathering columns, which adds overhead. I'm hopeful that I can remove the flag once sync gets better for information schema, because in my testing I hit columns missing between successive dbt runs with sync lag on the order of minutes...too long for me to feel comfortable trusting it for this.
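The two-step approach described above (repair the table so information schema is synced, then read the full column types from it) can be sketched roughly as follows. The SQL syntax and the `full_data_type` column are assumptions based on Databricks Unity Catalog's information_schema, not the actual dbt-databricks implementation:

```python
def column_metadata_statements(catalog: str, schema: str, table: str) -> list[str]:
    """Build the SQL for the flagged workaround: sync metadata first,
    then read untruncated column types from information_schema."""
    fqn = f"{catalog}.{schema}.{table}"
    return [
        # Force a metadata sync; information_schema can lag behind reality.
        f"REPAIR TABLE {fqn} SYNC METADATA",
        # full_data_type is assumed to hold the complete nested type definition.
        "SELECT column_name, full_data_type "
        f"FROM {catalog}.information_schema.columns "
        f"WHERE table_schema = '{schema}' AND table_name = '{table}' "
        "ORDER BY ordinal_position",
    ]
```

The repair step is what adds the overhead mentioned above, which is why the behavior is opt-in behind a flag.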

@tinolyuu

tinolyuu commented Oct 24, 2024

Hi, not sure if I encountered the same issue. I got a runtime error when adding a struct column to an incremental model on dbt-spark. Here's the error.

Runtime Error

 [PARSE_SYNTAX_ERROR] Syntax error at or near ','.(line 7, pos 34)

 == SQL ==
 /* {"app": "dbt", "dbt_version": "1.8.6", "profile_name": "main_spark", "target_name": "dev", "node_id": "model.main.evens_only"} */

     alter table test_db.evens_only_spark

         add columns

                struct_test struct<,... 1 more fields>
 ----------------------------------^^^

It seems the data type read in the parse_describe_extended func is [<agate.Row: ('id', 'int', None)>, <agate.Row: ('struct_test', 'struct<,... 1 more fields>', None)>]. I don't know why the struct type doesn't show the internal fields.

@hongtron

hongtron commented Nov 19, 2024

This impacts unit testing as well. I can't provide test values for my complex type because the ,... $N more fields> artifact gets compiled into the generated cast statement.

I'm not using Databricks.
