I thought hard about it, and it is not possible for the mapping tasks to be
multi-processed with constant memory, because every slice must know the entire right-table dictionary,
and every slice other than the first must also have the indices of all previous slices.
The best we can do is reduce RAM usage by hashing the strings, or reduce memory usage via a native implementation.
"""
Please correct me if I misunderstand, but as far as I see it, it is not necessary to hold the entire right table dictionary in memory.
In the process illustrated below, there are 5 tasks:
Task 1: create a sparse index for a slice of LEFT and a slice of RIGHT. The example shows 8 such tasks, which can be executed concurrently (if memory permits).
Task 2: concatenate the outputs of the 8 Task 1 runs. This is a singleton step and cannot run in parallel.
Task 3: [optional] sort the result from Task 2.
Task 4: construct the reindex table based on join type:
Inner = sparse table.
Left = left table range + duplicates in the sparse table.
Outer = left table range repeated by right table range.
This step runs on a single core.
Task 5: for each column in the new table, a task is created for each page-sized slice of either the left or the right column in the reindex table.
To construct a new page, each process must first gather the required rows by reading multiple slices. In the example below, REINDEX RIGHT illustrates this, with blue and orange backgrounds marking the 1st and 2nd pages of table RIGHT, respectively.
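The five tasks can be sketched in plain Python to show why no task needs the whole right-table dictionary: each Task 1 call only builds a lookup for its own RIGHT slice. This is a minimal sketch under stated assumptions, not the project's implementation: keys and columns are plain lists, the helpers `slices`, `sparse_index`, `build_reindex` and `materialize` are hypothetical names, and only the inner and left join types are covered.

```python
from itertools import product

# Hypothetical helpers sketching Tasks 1-5; keys and columns are plain Python lists.


def slices(values, size):
    """Yield (offset, slice) pairs so each slice remembers its global row position."""
    for start in range(0, len(values), size):
        yield start, values[start:start + size]


def sparse_index(left_keys, left_offset, right_keys, right_offset):
    """Task 1: matching (left_row, right_row) pairs for ONE left slice x ONE right slice.

    Only the right slice is put in a dictionary, so memory is bounded by the slice size.
    """
    lookup = {}
    for j, key in enumerate(right_keys):
        lookup.setdefault(key, []).append(right_offset + j)
    return [(left_offset + i, r)
            for i, key in enumerate(left_keys)
            for r in lookup.get(key, ())]


def build_reindex(left, right, how, slice_size):
    # Task 1: independent tasks, one per (left slice, right slice); a process pool could run them.
    parts = [sparse_index(lk, lo, rk, ro)
             for (lo, lk), (ro, rk) in product(slices(left, slice_size),
                                               slices(right, slice_size))]
    # Task 2: concatenate all partial results (singleton step).
    sparse = [pair for part in parts for pair in part]
    # Task 3 (optional): sort so rows come out in left-table order.
    sparse.sort()
    # Task 4: build the reindex table for the join type (single core).
    if how == "inner":
        return sparse
    if how == "left":
        matched = {left_row for left_row, _ in sparse}
        missing = [(row, None) for row in range(len(left)) if row not in matched]
        return sorted(sparse + missing, key=lambda pair: pair[0])
    raise ValueError(f"join type not covered by this sketch: {how}")


def materialize(column, row_ids, page_size):
    """Task 5: one task per page-sized slice of a reindex column; None marks 'no match'."""
    for start in range(0, len(row_ids), page_size):
        yield [None if r is None else column[r]
               for r in row_ids[start:start + page_size]]


left = ["a", "b", "c", "a"]
right = ["a", "c", "a", "d"]
reindex = build_reindex(left, right, "left", slice_size=2)
# reindex == [(0, 0), (0, 2), (1, None), (2, 1), (3, 0), (3, 2)]
pages = list(materialize(right, [r for _, r in reindex], page_size=4))
# pages == [["a", "a", None, "c"], ["a", "a"]]
```

Because Task 3 sorts the concatenated pairs, the result does not depend on the slice size or on the order in which the Task 1 runs finish, which is what allows them to run concurrently.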
PS> If the key is a table crossing key or any column contains strings, it makes sense to represent the key as a cryptographic hash.
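Representing a key as a cryptographic hash could look like the sketch below, assuming the key is made of several column values (one reading of "table crossing key") and that values of different types should never match. `composite_key_hash` is a hypothetical name; it uses SHA-256 from Python's standard `hashlib` and relies on `repr` being stable for built-in types such as `str`, `int` and `float`.

```python
import hashlib


def composite_key_hash(values: tuple) -> bytes:
    """Hypothetical helper: hash a multi-column key into a fixed 32-byte SHA-256 digest."""
    h = hashlib.sha256()
    for value in values:
        tag = type(value).__name__.encode("utf-8")
        data = repr(value).encode("utf-8")
        # Length-prefix the type tag and the value so ("ab", "c") and ("a", "bc")
        # cannot collide by concatenation, and so 1 and "1" hash differently.
        for part in (tag, data):
            h.update(len(part).to_bytes(8, "little"))
            h.update(part)
    return h.digest()


key = composite_key_hash(("2023-01-01", 42, "north"))
print(len(key))  # 32
```

Every key, however many columns or however long its strings, then occupies the same 32 bytes in the sparse index.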
@realratchet: In commit aee872b, line 452:
"""
Ratchet:
Please correct me if I misunderstand, but as far as I see it, it is not necessary to hold the entire right table dictionary in memory.
In the process illustrated below, there are 5 tasks:
Task 1: create a sparse index for a slice of LEFT and a slice of RIGHT. The example shows 8 such tasks, which can be executed concurrently (if memory permits).
Task 2: concatenate the outputs of the 8 Task 1 runs. This is a singleton step and cannot run in parallel.
Task 3: [optional] sort the result from Task 2.
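Task 4: construct the reindex table based on join type:
Inner = sparse table.
Left = left table range + duplicates in the sparse table.
Outer = left table range repeated by right table range.
This step runs on a single core.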
Task 5: for each column in the new table, a task is created for each page-sized slice of either the left or the right column in the reindex table.
PS> If the key is a table crossing key or any column contains strings, it makes sense to represent the key as a cryptographic hash.