Hi, first of all thanks for this amazing library!
I came across an edge case I wanted to highlight. The example below inconsistently fails or passes because the column lineages are ordered differently between runs.
To Reproduce
import pytest

from sqllineage.core.metadata_provider import MetaDataProvider
from sqllineage.utils.entities import ColumnQualifierTuple

from ...helpers import assert_column_lineage_equal, generate_metadata_providers

providers = generate_metadata_providers(
    {
        "database_a.table_a": ["col_a", "col_b", "col_c"],
    }
)


@pytest.mark.parametrize("provider", providers)
def test_ouput_consistency(provider: MetaDataProvider):
    sql = """CREATE TABLE database_b.table_c AS (
        SELECT *, 1 AS event_time
        FROM (
            SELECT table_b.col_b AS col_a
            FROM database_b.table_b AS table_b
            JOIN database_a.table_a AS table_d
        ) AS base
    )
    """
    assert_column_lineage_equal(
        sql,
        [
            (
                ColumnQualifierTuple("col_b", "database_a.table_a"),
                ColumnQualifierTuple("col_b", "database_b.table_c"),
            ),
            (
                ColumnQualifierTuple("col_c", "database_a.table_a"),
                ColumnQualifierTuple("col_c", "database_b.table_c"),
            ),
            (
                ColumnQualifierTuple("col_b", "database_b.table_b"),
                ColumnQualifierTuple("col_a", "database_b.table_c"),
            ),
        ],
        dialect="athena",
        test_sqlparse=False,
        test_sqlfluff=True,
        metadata_provider=provider,
    )
Sometimes pytest fails with the output below:
E Expected Lineage: {(Column: database_b.table_b.col_b, Column: database_b.table_c.col_a), (Column: database_a.table_a.col_b, Column: database_b.table_c.col_b), (Column: database_a.table_a.col_c, Column: database_b.table_c.col_c)}
E Actual Lineage: {(Column: database_a.table_a.col_a, Column: database_b.table_c.col_a), (Column: database_a.table_a.col_b, Column: database_b.table_c.col_b), (Column: database_a.table_a.col_c, Column: database_b.table_c.col_c)}
Sometimes with this:
E Expected Lineage: {(Column: database_a.table_a.col_c, Column: database_b.table_c.col_c), (Column: database_b.table_b.col_b, Column: database_b.table_c.col_a), (Column: database_a.table_a.col_b, Column: database_b.table_c.col_b)}
E Actual Lineage: {(Column: database_a.table_a.col_a, Column: database_b.table_c.col_a), (Column: database_a.table_a.col_c, Column: database_b.table_c.col_c), (Column: database_a.table_a.col_b, Column: database_b.table_c.col_b)}
And sometimes it actually succeeds.
Expected behavior
I would expect the column lineages to be consistent in the results.
If I understand the codebase correctly, this is because the results are ordered only by the first and last elements of each lineage path, not by the whole path:
sqllineage/sqllineage/runner.py
Lines 154 to 167 in 6189d31
Something like the below would take the whole lineage into account for ordering:
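As a rough illustration of the ordering idea (not the code from runner.py), here is a minimal, self-contained sketch. It uses plain tuples of strings as hypothetical stand-ins for sqllineage's column objects, and the function names `sort_by_endpoints` and `sort_by_whole_path` are made up for this example. It assumes the paths come from an unordered collection, so their input order can vary between runs.

```python
from itertools import permutations

# Hypothetical lineage paths: (source column, intermediate column, target column).
# The first two paths share the same first and last element.
paths = [
    ("a.t.col_a", "sub1.col_x", "b.t.col_a"),
    ("a.t.col_a", "sub2.col_y", "b.t.col_a"),
    ("a.t.col_b", "sub1.col_z", "b.t.col_b"),
]


def sort_by_endpoints(lineage):
    """Order only by (last, first) element; tied paths keep their input order."""
    return sorted(lineage, key=lambda p: (str(p[-1]), str(p[0])))


def sort_by_whole_path(lineage):
    """Order by the target first, then by every element of the path."""
    return sorted(lineage, key=lambda p: (str(p[-1]), tuple(str(c) for c in p)))


# Feed every possible input order and collect the distinct sorted results.
endpoint_orders = {tuple(sort_by_endpoints(list(perm))) for perm in permutations(paths)}
whole_orders = {tuple(sort_by_whole_path(list(perm))) for perm in permutations(paths)}

print(len(endpoint_orders))  # more than one result: output depends on input order
print(len(whole_orders))     # a single result regardless of input order
```

Because Python's sort is stable, paths that tie on the (last, first) key come out in whatever order they went in, while a key built from the whole path breaks the tie deterministically.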