Skip to content
This repository has been archived by the owner on May 17, 2024. It is now read-only.

Add option to define level of precision for a numeric column #649

Closed
daronjp opened this issue Jul 19, 2023 · 2 comments
Closed

Add option to define level of precision for a numeric column #649

daronjp opened this issue Jul 19, 2023 · 2 comments
Labels
enhancement New feature or request stale_immune Immunity to stale bot

Comments

@daronjp
Copy link

daronjp commented Jul 19, 2023

When migrating a table or making a minor update to a dbt model, we like to confirm the the dev and prod tables are identical. Oftentimes, especially when a table contains floats, two numbers that appear to be exact matches to the eye are evaluated by the DB (Snowflake) as not matching. This can be frustrating, as it requires additional work to prove they do indeed match.

Describe the solution you'd like
I would like the ability to specify a level of precision for a given column. Using the example above, I'd like to specify that data-diff should consider two numbers that are within 0.001 (or any other value the user would like) of each other to be a match.

Describe alternatives you've considered
My current workaround outside of data-diff looks like this:

-- all records in table_a not in table_b
WITH diffs AS (
    SELECT
        pk, numeric_col
    FROM table_a

    EXCEPT

    SELECT
        pk, numeric_col
    FROM table_b
)

-- ideally returns 0 rows, showing that all records in table_a are in table_b
SELECT
    b.pk, b.numeric_col, d.numeric_col
FROM table_b AS b
JOIN diffs AS d ON d.pk = b.pk
WHERE
    ABS(b.numeric_col - d.numeric_col) > 0.001

Additional context
I would like to be able to specify this either in the dbt models.yml meta values, or when running a joindiff/hashdiff.

@daronjp daronjp added the enhancement New feature or request label Jul 19, 2023
@dlawin dlawin added stale_immune Immunity to stale bot and removed triage labels Aug 9, 2023
@ivan-toriya-precis
Copy link

Agree, this option is very needed. Also it can solve #379

I can see it's already available in data-diff Cloud: https://docs.datafold.com/data_diff/in-database_diffing#tolerance-for-floats

@glebmezh
Copy link
Contributor

Hi @daronjp and @ivan-toriya-precis!

I'm sorry for the delay in following up on this. Thank you for taking the time to raise this issue!

We made a hard decision to sunset the data-diff package and won't provide further development or support.

As @ivan-toriya-precis pointed out, custom numeric precision is supported in Datafold Cloud.

-Gleb

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
enhancement New feature or request stale_immune Immunity to stale bot
Projects
None yet
Development

No branches or pull requests

4 participants