-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Dry run fails on models whose fields explicitly reference dependency models #11
Comments
Following on from the discussions in PR #13 There are several cases we need to be able to cover:
SELECT
first_name
FROM `at-data-platform-preprod`.`training`.`employee`;
-- Converts to
SELECT
first_name
FROM (SELECT 'some-string' as first_name)
SELECT
emp.first_name
FROM `at-data-platform-preprod`.`training`.`employee` as emp;
-- Converts to
SELECT
emp.first_name
FROM (SELECT 'some-string' as first_name) as emp;
SELECT
emp.first_name
FROM `at-data-platform-preprod`.`training`.`employee` emp;
-- Converts to
SELECT
emp.first_name
FROM (SELECT 'some-string' as first_name) emp;
SELECT
employee.first_name
FROM `at-data-platform-preprod`.`training`.`employee`;
-- Converts to
SELECT
employee.first_name
FROM (SELECT 'some-string' as first_name) employee; -- Problem: Currently we don't include this 'employee' alias so the dry run fails The current implementation works for cases 1 - 3 but fails for case 4. We need to detect for each model if after the table reference there is an alias. If there is it should be preserved if there is not it should put the name of the table as the alias. So for case 1 we now get: SELECT
first_name
FROM `at-data-platform-preprod`.`training`.`employee`;
-- Converts to
SELECT
first_name
FROM (SELECT 'some-string' as first_name) employee -- This is added but doesn't change anything I think modifying the regex that the select literal matches could help: https://regexr.com/6qllk
There are 3 examples here. If capturing group 3 is empty then we need to add The first part
The second part
But it would take a lot of convincing/testing to show that this works in all cases within valid SQL syntax. Maybe moving away from a pure regex based implementation and trying to parse the SQL a bit more will help solve this issue |
Hey @connor-charles, I think you are right regarding the cases and best solution - we need to detect when the upstream model is not aliased and alias it. Regarding the regex, thanks for the detailed and clear explanation but as you mentioned in my case it would need more convincing / testing than in yours (as you have already made it). If you can pick this up would be great, but a comment regarding your previous suggestion |
I have pushed a commit to improve the test coverage for the literal insertion as there are lot of cases to cover in the SQL parsing logic. There is a library called sqlparse which I have tried to use before with the dry runner but I found it a bit overkill for the small amount of parsing the dry runner needs to do. I think a decent algorithm for this is to iterate backwards from the table reference you are trying to replace to determine if it is a correct candidate for replacing it with a select literal. If you find a
If we split the query by whitespace characters the array should look like this:
Then if we search for To handle comments its a bit more complicated this test we previously had to xfail because the regex couldn't match it properly:
The split SQL query is:
We find the table reference in the last element. Working backwards I think it's possible to detect that the elements between then and the |
Take the model
Since the reference to
table1
is being replaced by(SELECT "some string" as field2, 42 as field2 and True as field3)
, if we don't alias that select literal with the name of the reference (ie.AS table1
), the query will fail to execute. This could be solved by simple adding that bit to the select literalThe text was updated successfully, but these errors were encountered: