I have searched the existing issues, and I could not find an existing issue for this bug
Current Behavior
I'm trying to run a freshness check on some of my tables using a timestamp column. However, I'm getting an error saying the check found a str column instead of a timestamp one. I believe this is a bug, because Athena shows my column as a timestamp, as in the following image.
The error I'm getting is also shown in the following image:
My dbt_athena package is at version 1.7.2. The query generated by the freshness check is below, along with the results it produced:
select
max(loaded_at) as max_loaded_at,
cast(now() as timestamp) as snapshotted_at
from "awsdatacatalog"."bronze_dev"."callbacks"
Expected Behavior
The dbt source freshness command should run successfully on my source tables. Failing that, it should report an error related to the configured SLA, not an error about type casting.
Steps To Reproduce
Have a table in Athena with a timestamp column (in my case, the column was generated by a PySpark script executed on AWS Glue).
Configure dbt source freshness on the same table:
version: 2
From my side, I can add that max(<ts_column>) as max_loaded_at doesn't work well for the dbt source freshness query. On large tables it scans the full data, because there is no partition predicate. It seems that the max and min functions on Iceberg tables don't use Iceberg metadata. On the other hand, querying the metadata directly scans almost nothing (a few KB).
I propose overriding the collect_freshness macro so that, depending on table_type and the partition column, it uses different queries for the dbt source freshness command:
For non-partitioned Iceberg tables and all Hive tables use default query:
select max(<ts_column>) as max_loaded_at,
{{ current_timestamp() }} as snapshotted_at
from <source_table>
For partitioned Iceberg tables use the following query:
select max(data.<partitioned_by_ts_column>.max) as max_loaded_at
from "<source_schema>"."<source_table>$partitions"
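As a concrete illustration of the partitioned Iceberg case, here is a minimal sketch using the table and column names from the original report. It assumes, hypothetically, that callbacks is an Iceberg table partitioned on loaded_at, which the report does not state. Since dbt's default freshness query returns both max_loaded_at and snapshotted_at, the sketch also selects snapshotted_at, reusing the expression from the generated query in the report:

```sql
-- Sketch only: assumes "callbacks" is an Iceberg table partitioned on
-- loaded_at (an assumption; the original report does not say so).
-- The "$partitions" metadata table exposes per-partition column statistics
-- in the "data" row, so the newest timestamp can be read from metadata
-- instead of scanning the table's data files.
select
    max(data.loaded_at.max) as max_loaded_at,
    -- same snapshot expression as in the query dbt-athena generated above
    cast(now() as timestamp) as snapshotted_at
from "awsdatacatalog"."bronze_dev"."callbacks$partitions"
```

Because this reads only the metadata table, it should scan far less data than max() over the table itself, in line with the observation that metadata queries scan only a few KB.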
Steps To Reproduce
version: 2
Run the command:
dbt source freshness
Environment
Additional Context
No response