
Affinity: Update resource lifetime #70

Open · wants to merge 1 commit into master

Conversation

@mhoemmen (Collaborator) commented Aug 6, 2018

Update resource lifetime discussion in the Affinity proposal, based on #67 discussion.

NOTE: I have not changed the rest of the proposal to make it consistent with these changes. If people like this pull request, I will then update the rest of the proposal accordingly.

@jeffhammond commented:

> We might expect this to be fast, but since valarray containers are initialized automatically and automatically allocated on the master thread's memory, we find that it is actually quite slow even when we have more than one thread.

This is only true if threads touch pages from more than one NUMA domain. Such code will perform just fine on IBM Blue Gene/Q or Intel Core i7 processors.
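
For illustration, here is a minimal sketch of the first-touch effect under discussion (not from the proposal; it assumes a NUMA system with first-touch page placement and an OS that keeps threads near the memory they first touched):

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <valarray>
#include <vector>

int main() {
  constexpr std::size_t n = std::size_t(1) << 24;

  // std::valarray zero-initializes its elements at construction, so the
  // constructing (master) thread first-touches every page. Under
  // first-touch placement, all pages land in the master thread's NUMA
  // domain, and other threads pay remote-access costs.
  std::valarray<double> a(n);

  // With a raw buffer, each worker first-touches its own contiguous
  // chunk, so pages are placed near the thread that will later use them.
  double* b = new double[n];
  const unsigned nt = std::max(1u, std::thread::hardware_concurrency());
  std::vector<std::thread> workers;
  for (unsigned t = 0; t < nt; ++t) {
    workers.emplace_back([=] {
      const std::size_t lo = t * (n / nt);
      const std::size_t hi = (t + 1 == nt) ? n : lo + n / nt;
      for (std::size_t i = lo; i < hi; ++i) b[i] = 0.0;
    });
  }
  for (auto& w : workers) w.join();
  delete[] b;
  return 0;
}
```

On a single-memory-domain machine (the Blue Gene/Q and Core i7 cases above), both variants touch the same domain, so the valarray version is not penalized.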

@mhoemmen (Collaborator, Author) commented Aug 6, 2018

@jeffhammond I forget who wrote that section -- it would help to have some more details on how they ran the experiment. We can always rerun it at some point. My main concern for now is deciding on the lifetime of these resource thingies :D

@AerialMantis (Contributor) left a review


This looks great; it will provide a good background for the later wording on the lifetimes, and it has revealed some very interesting questions. Thanks.

Review comments on affinity/cpp-20/d0796r2.md:
> The initial solution should target systems with a single addressable memory region. It should thus exclude devices like discrete GPUs. In order to maintain a unified interface going forward, the initial solution should consider these devices and be able to scale to support them in the future. In particular, in order to support heterogeneous systems, the abstraction must let the interface query the *resource topology* of the *system* in order to perform device discovery.
> Creation of a *context* expresses intent to use the *resource*, not just to view it as part of the *resource topology*. Thus, if a *resource* could ever cease to point to a valid underlying resource, then users must not be allowed to create a *context* from the resource instance, or launch parallel executions with that context. *Context* construction, and use of an *executor* with that *context* to launch a parallel execution, both assert validity of the *context*'s *resource*.
>
> If a *resource* is valid, then it must always point to the same underlying thing. For example, a *resource* cannot first point to one CPU core, and then suddenly point to a different CPU core. *Contexts* can thus rely on properties like binding of operating system threads to CPU cores. However, the "thing" to which a *resource* points may be a dynamic, possibly software-managed pool of hardware. For instance, the operating system may maintain a pool of a varying number of CPU cores, or a device driver may switch between a "discrete" GPU and an "integrated" GPU, depending on utilization and power constraints.
Contributor:

I'm not sure if I understand the part about the thing that is pointed to being dynamic. Does this mean that the thing that an execution_resource points to may become available/unavailable dynamically, not that it can become something different?

@mhoemmen (Collaborator, Author):

Thanks so much for your feedback! I'll answer a few questions at a time since I'm in meetings all day.

@mhoemmen (Collaborator, Author) commented Aug 21, 2018:

What I mean by "the thing that is pointed to" is that an execution resource may point either to a hardware resource (a CPU core, etc.) or to a software resource. Tom Rodgers from Red Hat pointed out today the use case of running in a virtual machine, in which "hardware" doesn't necessarily map to real hardware.

@mhoemmen (Collaborator, Author):

In the case of running in a virtual machine, the "CPU core" to which an execution resource points might run on one physical CPU core at one moment, and on a different core (possibly even on a different node somewhere else) at the next. Nevertheless, we want the execution resource to keep pointing to that same "virtualized CPU core"; the resource shouldn't suddenly point to something else, like a GPU.

Contributor:

Thanks for your explanation, I think I understand now. So effectively we want to say that a C++ library implementation cannot change what an execution resource points to dynamically; however, what that execution resource points to can be dynamically managed by the environment which is executing the process, such as the OS or a virtual machine?

@mhoemmen (Collaborator, Author):

That's right; thanks! :D

@mhoemmen (Collaborator, Author):

(Note to self: Added virtual machine example locally)
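
To restate that conclusion as code, here is a hedged toy model (all names are hypothetical stand-ins, not the proposal's interface) of the distinction between a resource's fixed identity and its environment-managed backing:

```cpp
#include <iostream>
#include <string>

// Toy model: the environment (OS, hypervisor) may remap what *backs* a
// resource, but the C++ implementation must keep the execution resource
// denoting the same logical thing.
struct execution_resource {
  const std::string identity;  // fixed: "virtualized CPU core 3"
  std::string backing;         // dynamic: which physical core runs it
};

int main() {
  execution_resource r{"virtualized CPU core 3", "physical core 17 (node 0)"};
  // The hypervisor migrates the virtual core to another node...
  r.backing = "physical core 42 (node 1)";
  // ...but r still denotes the same virtualized core. It must never
  // silently become something else, such as a GPU.
  std::cout << r.identity << " is now backed by " << r.backing << "\n";
  return 0;
}
```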


> Questions:
>
> 1. What should happen if users are traversing the *resource topology*, but never use the accelerator's *resource* (other than to iterate past it), and something else concurrently calls `Aleph_finalize()`?
Contributor:

You've raised some interesting questions here.

I think the best way to solve this would be to require that this_system::get_resource be thread-safe, and have it initialise and finalize any third-party or OS resources when constructing the topology, before returning. This would mean that there's no way to concurrently interfere with how the implementation uses those resources without using them directly through their respective API, though that would have to be undefined behaviour. This also means that the topology would be a snapshot of the resources at that point, and wouldn't actually require any of the underlying resources to be alive until they are actually used, i.e. when passed to the construction of a context.

The disadvantage of this approach is that the initial call to this_system::get_resource could be quite costly, depending on the implementation, though I'd expect that you would only do this once on startup or at specific intervals between computations.

Alternatively, we could go down the route that was suggested at Rapperswil of having a visitor approach to topology discovery, where the user provides some kind of visitor that traverses the topology and returns the resources that it wants to use. With this kind of approach, it might be worth having the execution_resource have active resources. Perhaps this pattern could construct the context directly?

It may also be worth having a thread-safe valid member function on the execution_resource to check whether the underlying resource is still available. I think this would have to be named differently, though, to reflect that it is not simply a getter: it must do some querying or initialising of resources within the topology to identify whether the resource is still valid. However, even with this, you could still have a situation where a resource becomes unavailable between calling valid and constructing an execution context, or after the context is constructed.

Perhaps instead we should say that you can always construct a context from an execution_resource, even if it's unavailable, but have the context handle the case where the resource is unavailable: cancelling enqueued work, disallowing new submissions, and throwing an exception or calling a callback function.
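
The check-then-construct race described above can be made concrete with a toy model (every name here is hypothetical; Aleph_finalize comes from the quoted question):

```cpp
#include <atomic>
#include <stdexcept>

std::atomic<bool> runtime_alive{true};  // stands in for a third-party runtime

struct execution_resource {};

// The proposed 'valid'-like query: re-checks the underlying system.
bool still_available(const execution_resource&) { return runtime_alive.load(); }

struct execution_context {
  // Construction asserts validity of the resource.
  explicit execution_context(const execution_resource&) {
    if (!runtime_alive.load()) throw std::runtime_error("resource gone");
  }
};

void Aleph_finalize() { runtime_alive.store(false); }  // third-party finalizer

void launch_on(const execution_resource& r) {
  if (still_available(r)) {
    // Race window: another thread may call Aleph_finalize() right here,
    // between the check and the construction below. A 'valid()' query
    // can therefore only ever be advisory.
    execution_context ctx(r);  // may still throw despite the check
    // ... submit work through an executor bound to ctx ...
  }
}

int main() {
  launch_on(execution_resource{});
  return 0;
}
```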

@mhoemmen (Collaborator, Author):

> This also means that the topology would be a snapshot of the resources at that point, and wouldn't actually require any of the underlying resources to be alive until they are actually used, i.e. when passed to the construction of a context.

I like this. This could even solve the problem of some resources only being available inside a parallel region (like a GPU). Topology is a snapshot of everything reachable from the root.

@mhoemmen (Collaborator, Author):

> This would mean that there's no way to concurrently interfere with how the implementation uses those resources without using them directly through their respective API, though that would have to be undefined behaviour.

I'm OK with that being UB in general, possibly with specific APIs giving stronger guarantees.

@mhoemmen (Collaborator, Author):

> though I'd expect that you would only do this once on startup or at specific intervals between computations.

That's right -- this is really about constructing a thread pool. Crazy high-frequency dynamic hardware load balancing feels like it wants a different interface. (Users might just see that as a special kind of resource -- it doesn't feel like something that naturally has an easily walkable topology.)

@mhoemmen (Collaborator, Author):

> Alternatively, we could go down the route that was suggested at Rapperswil of having a visitor approach to topology discovery, where the user provides some kind of visitor that traverses the topology and returns the resources that it wants to use.

Was that in the notes? I'd like to learn more about that. (Sorry I missed that discussion at Rapperswil.) I'm not sure, without knowing more at least, whether that would solve the problems we've discussed here.

@mhoemmen (Collaborator, Author):

> Perhaps instead we should say that you can always construct a context from an execution_resource, even if it's unavailable, but have the context handle the case where the resource is unavailable: cancelling enqueued work, disallowing new submissions, and throwing an exception or calling a callback function.

Why not just attempt to create the context? If the resource is no longer available, the context creation fails. Otherwise, the context assumes the responsibility for keeping the resource alive or otherwise handling the case where the resource ceases to be alive at some point.
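
As a sketch (hypothetical types, not the proposal's interface), this attempt-and-fail pattern might look like the following:

```cpp
#include <optional>
#include <stdexcept>

struct execution_resource { bool available; };

struct execution_context {
  // Construction is the single validity check: it fails if the resource
  // is gone; on success, the context takes responsibility for keeping
  // the resource alive (or reporting if it later goes away).
  explicit execution_context(const execution_resource& r) {
    if (!r.available) throw std::runtime_error("resource unavailable");
  }
};

std::optional<execution_context> try_make_context(const execution_resource& r) {
  try {
    return execution_context(r);
  } catch (const std::runtime_error&) {
    return std::nullopt;  // no separate 'valid()' pre-check, so no race
  }
}

int main() {
  execution_resource accel{false};  // an accelerator that has gone away
  if (auto ctx = try_make_context(accel)) {
    // ... bind an executor to *ctx and submit work ...
  }
  return 0;
}
```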

@mhoemmen (Collaborator, Author):

@AerialMantis I reviewed the Rapperswil notes, and noticed that the affinity paper was marked "DND." It looks like it wasn't covered there. Were you referring to the P0443 discussion in SG1 (and/or joint with LEWG)? Here is the link (requires usual login). For example: "JB: My takeaway is that (P0443) executors are copyable things with reference semantics."

@mhoemmen (Collaborator, Author):

Note: Discussion a week or two ago concluded that the affinity paper was actually discussed in SG1 in Rapperswil, but didn't make it into the minutes.

@mhoemmen (Collaborator, Author):

(Note to self: I think I've addressed this in my local changes.)

@AerialMantis (Contributor) commented:

> I like this. This could even solve the problem of some resources only being available inside a parallel region (like a GPU). Topology is a snapshot of everything reachable from the root.

Yes, I think that could be a nice way to represent it. We could say that when you perform topology discovery, you get back a topology structure of everything that can be executed on from the current thread of execution. So something like GPU device-side enqueue could be represented by an entirely different execution resource/execution context which can be discovered and constructed within a GPU kernel.

We continued discussing this during the last call and we decided that we could potentially support the idea of execution resources only being available within a particular parallel region, by having this_thread::resource() also be permitted to perform some topology discovery and return resources that were not previously available.

@AerialMantis (Contributor) commented:

> Was that in the notes? I'd like to learn more about that. (Sorry I missed that discussion at Rapperswil.) I'm not sure, without knowing more at least, whether that would solve the problems we've discussed here.

This was mostly just a brain dump; I've not put a great deal of thought into that approach yet, so I'm not sure it would really help us here much. Thinking about it further, I'm tempted to agree that it likely won't. No problem; I thought there were notes taken at the Rapperswil meeting, but perhaps not. I'll have a look for them.

This is what I had in the notes that I took:

  • It was suggested that we could observe the topology through some kind of visitor pattern (in order of execution or dispatch).

Essentially, the suggestion was that rather than having users traverse the topology manually, we could provide some kind of visitor interface which allows users to specify what they want to find and how they want to represent it, and have the implementation do everything for them. We have a similar concept in SYCL, though for a smaller domain than we are aiming to support here, and I can see some potential difficulties in making this generic enough to be useful. Still, it could be a useful higher-level interface for those who don't want to do things manually.
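
For concreteness, here is a rough sketch of what such a visitor-style discovery interface might look like (purely illustrative; none of these names exist in the proposal):

```cpp
#include <functional>
#include <string>
#include <vector>

// Illustrative only: a topology node, plus a discovery function that
// walks the tree and hands every resource to a user-supplied visitor.
struct execution_resource {
  std::string kind;  // e.g. "machine", "numa-node", "core"
  std::vector<execution_resource> children;
};

void visit_topology(const execution_resource& node,
                    const std::function<void(const execution_resource&)>& visitor) {
  visitor(node);  // the implementation traverses; the user selects
  for (const auto& child : node.children) visit_topology(child, visitor);
}

int main() {
  const execution_resource root{
      "machine",
      {{"numa-node", {{"core", {}}, {"core", {}}}},
       {"numa-node", {{"core", {}}, {"core", {}}}}}};

  std::vector<const execution_resource*> selected;
  visit_topology(root, [&](const execution_resource& r) {
    if (r.kind == "core") selected.push_back(&r);  // express what we want
  });
  // 'selected' could then feed the construction of an execution context.
  return 0;
}
```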

@AerialMantis (Contributor) commented:

> Why not just attempt to create the context? If the resource is no longer available, the context creation fails. Otherwise, the context assumes the responsibility for keeping the resource alive or otherwise handling the case where the resource ceases to be alive at some point.

Yeah, that makes sense to me.

@mhoemmen (Collaborator, Author) commented Sep 3, 2018

TODO: make topology discovery race-free with respect to different (CPU) threads.

mhoemmen pushed commits to mhoemmen/standards-proposals that referenced this pull request on Sep 10, 2018.
AerialMantis added a commit that referenced this pull request Oct 2, 2018
CP013: Address #67 & discussion in #70