-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix data movement task dependencies with clang-formatting #110
Conversation
I still have to review the bulk of the code but I'm concerned about this comment: Tasks can be spawned in any order (its bugged in old Parla: see ut-parla/Parla.py#146) and its one of the reason's that explicit dependencies are more flexible than inferred from dataflow. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question) This means each PArray task_list will only ever have 1 element. Is this enough to handle all parent/child relations/slicing behavior and dependencies? My gut says no, but I still have a hard time understanding the very few cases that our slices can be used concurrently and can't think of an example that breaks. I think it looks/probably is okay! but just bringing it up in case we do find some edge case.
Comment) If this is the solution, it really can't be done at spawn time. It is not thread-safe and things may be spawned out of order. It needs to be after the first implicit topological sort when things enter the mapping phase / become mappable. For example, consider spawning two subtrees with nested tasks where there are some sibling dependencies between trees.
Ok. this is the reason why I was waiting for your review :-) I forgot that case. Let me think about this. In the worst case, we may need to enforce users to specify those dependenceis, which I really don't want. |
All of the phase triggers/enqueues work as a topo sort, bc the current preconditions are that dependencies must go through the phase first. Within a phase there could potentially be a reordering (there is not currently), but when they enter and exit it is guaranteed to be a valid execution ordering. If I remember correctly, that is checked here for mappable:
|
I think the quick fix is just to delay adding tasks to parrays (and the reverse) until tasks are ordered and then use the same logic as in this original PR. There might still be a better way? but that's what jumps out to me. |
... There might have problems with policy (and mess up the triggers) depending on where/when we are adding these extra dependencies. |
I don't see any problem from your suggestion. I will rethink this problem from your suggestion. Thank you! |
One edge case is going to be continuation tasks. They are only spawned once, but hit all of the other phases more than once. There should be a counter on the task meta-data already that tracks whether its the first time or not so you can just filter them out. |
I forget the status of this? Does it need to be revisted? |
We need to fix this for NUWEST, will try my best to think of a solution to change data movement dependency semantics. |
working on it from today |
To sum up, here is the possible solution: Mapping policy actually gets more accurate information (it was your concern, wasn't it? Otherwise, I am missing something) So when we create a data move task for task 8, task 5 should be on the list. |
31b64e2
to
e17395a
Compare
de8075e
to
22d8066
Compare
I changed data move task creation and its dependency creation. Basically, instead of finding task list of a parray and compute dependency tasks, now each task sets dependency lists for each parray for dependent tasks and propagates it if a dependent task accesses the parray with read-only permission. I am not 100% for sure if this is not wrong but the bug case, cholesky, and sorting seem working correctly. |
896fe79
to
22d8066
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I understand this after our call. It seems like a good solution while we resolve other data semantics.
@yinengy found and reported this bug:
In this case,
ts[8]
's datamovement task runs beforets[5]
.This is because the current data move task dependency is intersection between compute task and parray's referrers.
However, in this case,
ts[8]
's dependency ists[7]
anda
's dependency ists[5]
, and so, there is no intersection.In order to fix this case, I make the parray's referrers the last task accessing it on OUT or INOUT permission.
Whenever this task is being spawned, the old referrers are cleared.
For correctness, tasks should be spawned in program execution order, but this is not new and so I think it is ok.