Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

compatlib_function - ConnectBy supports VARCHAR inputs #70

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

h-serizawa
Copy link
Contributor

Currently, ConnectBy supports INTEGER inputs as parent ID and child ID. Now, it supports VARCHAR inputs as well. Close #69.

@h-serizawa
Copy link
Contributor Author

[Test Data]

=> CREATE TABLE company (id INT, supervisor_id INT, name VARCHAR(20));
=> INSERT INTO company VALUES (1, null, 'Patrick');
=> INSERT INTO company VALUES (2, 1, 'Jim');
=> INSERT INTO company VALUES (3, 1, 'Sandy');
=> INSERT INTO company VALUES (4, 3, 'Brian');
=> INSERT INTO company VALUES (4, 3, 'Otto');
=> COMMIT;

[Test Result with INTEGER inputs]

=> SELECT connect_by_path(supervisor_id, id, name, ' >> ') OVER () FROM company;

 identifier | depth |           path
------------+-------+--------------------------
          1 |     0 | Patrick
          2 |     1 | Patrick >> Jim
          3 |     1 | Patrick >> Sandy
          4 |     2 | Patrick >> Sandy >> Otto
(4 rows)

[Test Result with VARCHAR inputs]

=> SELECT connect_by_path(supervisor_id::VARCHAR, id::VARCHAR, name, ' >> ') OVER () FROM company;

 identifier | depth |           path
------------+-------+--------------------------
 1          |     0 | Patrick
 2          |     1 | Patrick >> Jim
 3          |     1 | Patrick >> Sandy
 4          |     2 | Patrick >> Sandy >> Otto
(4 rows)

@dmankins
Copy link
Collaborator

dmankins commented Apr 1, 2022

Thank you!

I have one reservation.

You give two factories: ConnectByIntFactory and ConnectByVarcharFactory to provide two interfaces to the same function (by reading ints as strings). This means that someone using ints will pay a conversion price to strings. In addition, all the map operations will be done on string keys and not int keys, which I suspect is less efficient.

If one's tables are small, this is probably not such a big concern, but Vertica is used with tables that have billions of rows.

Given that many (possibly most) uses of this function will be using integer keys from a dimension table, I think it might be better to have separate ConnectByInt and ConnectByVarchar implementations which only do conversions when needed.

I wonder if you could implement a template that does the algorithmic work in processPartition to avoid duplication?

Does that seem reasonable to you?

@h-serizawa
Copy link
Contributor Author

@dmankins Thank you for your valuable feedback! Yes, the data conversion from int to string will be a performance concern. I removed the conversion using the template. Can you review this request again?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants