compatlib_function - ConnectBy supports VARCHAR inputs #70

h-serizawa · 2022-03-24T07:55:39Z

Currently, ConnectBy supports INTEGER inputs as parent ID and child ID. Now, it supports VARCHAR inputs as well. Close #69.

…ion-Packages

h-serizawa · 2022-03-25T00:41:13Z

[Test Data]

=> CREATE TABLE company (id INT, supervisor_id INT, name VARCHAR(20));
=> INSERT INTO company VALUES (1, null, 'Patrick');
=> INSERT INTO company VALUES (2, 1, 'Jim');
=> INSERT INTO company VALUES (3, 1, 'Sandy');
=> INSERT INTO company VALUES (4, 3, 'Brian');
=> INSERT INTO company VALUES (4, 3, 'Otto');
=> COMMIT;

[Test Result with INTEGER inputs]

=> SELECT connect_by_path(supervisor_id, id, name, ' >> ') OVER () FROM company;

 identifier | depth |           path
------------+-------+--------------------------
          1 |     0 | Patrick
          2 |     1 | Patrick >> Jim
          3 |     1 | Patrick >> Sandy
          4 |     2 | Patrick >> Sandy >> Otto
(4 rows)

[Test Result with VARCHAR inputs]

=> SELECT connect_by_path(supervisor_id::VARCHAR, id::VARCHAR, name, ' >> ') OVER () FROM company;

 identifier | depth |           path
------------+-------+--------------------------
 1          |     0 | Patrick
 2          |     1 | Patrick >> Jim
 3          |     1 | Patrick >> Sandy
 4          |     2 | Patrick >> Sandy >> Otto
(4 rows)

dmankins · 2022-04-01T16:39:23Z

Thank you!

I have one reservation.

You give two factories: ConnectByIntFactory and ConnectByVarcharFactory to provide two interfaces to the same function (by reading ints as strings). This means that someone using ints will pay a conversion price to strings. In addition, all the map operations will be done on string keys and not int keys, which I suspect is less efficient.

If one's tables are small, this is probably not such a big concern, but Vertica is used with tables that have billions of rows.

Given that many (possibly most) uses of this function will be using integer keys from a dimension table, I think it might be better to have separate ConnectByInt and ConnectByVarchar implementations which only do conversions when needed.

I wonder if you could implement a template that does the algorithmic work in processPartition to avoid duplication?

Does that seem reasonable to you?

h-serizawa · 2022-04-07T07:30:38Z

@dmankins Thank you for your valuable feedback! Yes, the data conversion from int to string will be a performance concern. I removed the conversion using the template. Can you review this request again?

h-serizawa added 3 commits March 24, 2022 16:42

ConnectBy supports VARCHAR inputs.

7e9396f

ConnectBy supports VARCHAR inputs.

0c16f38

Merge branch 'master' of https://github.com/h-serizawa/Vertica-Extens…

bb4620a

…ion-Packages

Solved perf concern by converting int to string

429bca0

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

compatlib_function - ConnectBy supports VARCHAR inputs #70

compatlib_function - ConnectBy supports VARCHAR inputs #70

h-serizawa commented Mar 24, 2022

h-serizawa commented Mar 25, 2022

dmankins commented Apr 1, 2022

h-serizawa commented Apr 7, 2022

compatlib_function - ConnectBy supports VARCHAR inputs #70

Are you sure you want to change the base?

compatlib_function - ConnectBy supports VARCHAR inputs #70

Conversation

h-serizawa commented Mar 24, 2022

h-serizawa commented Mar 25, 2022

dmankins commented Apr 1, 2022

h-serizawa commented Apr 7, 2022