
Affinity: Update resource lifetime #70

Open · wants to merge 1 commit into master

Conversation

@mhoemmen (Collaborator) commented Aug 6, 2018

Update resource lifetime discussion in the Affinity proposal, based on #67 discussion.

NOTE: I have not changed the rest of the proposal to make it consistent with these changes. If people like this pull request, I will then update the rest of the proposal accordingly.

@jeffhammond commented:

> We might expect this to be fast, but since valarray containers are initialized automatically and automatically allocated on the master thread's memory, we find that it is actually quite slow even when we have more than one thread.

This is only true if threads touch pages from more than one NUMA domain. Such code will perform just fine on IBM Blue Gene/Q or Intel Core i7 processors.
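
For illustration, here is a minimal sketch of the first-touch effect under discussion (not from the proposal; it assumes a NUMA system with first-touch page placement and an OS that keeps threads near the memory they first touched):

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <valarray>
#include <vector>

int main() {
  constexpr std::size_t n = std::size_t(1) << 24;

  // std::valarray zero-initializes its elements at construction, so the
  // constructing (master) thread first-touches every page. Under
  // first-touch placement, all pages land in the master thread's NUMA
  // domain, and other threads pay remote-access costs.
  std::valarray<double> a(n);

  // With a raw buffer, each worker first-touches its own contiguous
  // chunk, so pages are placed near the thread that will later use them.
  double* b = new double[n];
  const unsigned nt = std::max(1u, std::thread::hardware_concurrency());
  std::vector<std::thread> workers;
  for (unsigned t = 0; t < nt; ++t) {
    workers.emplace_back([=] {
      const std::size_t lo = t * (n / nt);
      const std::size_t hi = (t + 1 == nt) ? n : lo + n / nt;
      for (std::size_t i = lo; i < hi; ++i) b[i] = 0.0;
    });
  }
  for (auto& w : workers) w.join();
  delete[] b;
  return 0;
}
```

On a single-memory-domain machine (the Blue Gene/Q and Core i7 cases above), both variants touch the same domain, so the valarray version is not penalized.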

@mhoemmen (Collaborator, Author) commented Aug 6, 2018

@jeffhammond I forget who wrote that section -- it would help to have some more details on how they ran the experiment. We can always rerun it at some point. My main concern for now is deciding on the lifetime of these resource thingies :D

@AerialMantis (Contributor) left a review


This looks great; it will provide a good background for the later wording on the lifetimes, and it has revealed some very interesting questions. Thanks.

Review comments on affinity/cpp-20/d0796r2.md:
> The initial solution should target systems with a single addressable memory region. It should thus exclude devices like discrete GPUs. In order to maintain a unified interface going forward, the initial solution should consider these devices and be able to scale to support them in the future. In particular, in order to support heterogeneous systems, the abstraction must let the interface query the *resource topology* of the *system* in order to perform device discovery.
> Creation of a *context* expresses intent to use the *resource*, not just to view it as part of the *resource topology*. Thus, if a *resource* could ever cease to point to a valid underlying resource, then users must not be allowed to create a *context* from the resource instance, or launch parallel executions with that context. *Context* construction, and use of an *executor* with that *context* to launch a parallel execution, both assert validity of the *context*'s *resource*.
>
> If a *resource* is valid, then it must always point to the same underlying thing. For example, a *resource* cannot first point to one CPU core, and then suddenly point to a different CPU core. *Contexts* can thus rely on properties like binding of operating system threads to CPU cores. However, the "thing" to which a *resource* points may be a dynamic, possibly software-managed pool of hardware. For instance, the operating system may maintain a pool of a varying number of CPU cores, or a device driver may switch between a "discrete" GPU and an "integrated" GPU, depending on utilization and power constraints.
Contributor:

I'm not sure if I understand the part about the thing that is pointed to being dynamic. Does this mean that the thing that an execution_resource points to may become available/unavailable dynamically, not that it can become something different?

@mhoemmen (Collaborator, Author):

Thanks so much for your feedback! I'll answer a few questions at a time since I'm in meetings all day.

@mhoemmen (Collaborator, Author) commented Aug 21, 2018:

What I mean by "the thing that is pointed to" is that an execution resource may point either to a hardware resource (a CPU core, etc.) or to a software resource. Tom Rodgers from Red Hat pointed out today the use case of running in a virtual machine, in which "hardware" doesn't necessarily map to real hardware.

@mhoemmen (Collaborator, Author):

In the case of running in a virtual machine, the "CPU core" to which an execution resource points might run on one physical CPU core at one moment, and on a different core (possibly even on a different node somewhere else) at the next. Nevertheless, we want the execution resource to keep pointing to that same "virtualized CPU core"; the resource shouldn't suddenly point to something else, like a GPU.

Contributor:

Thanks for your explanation, I think I understand now. So effectively we want to say that a C++ library implementation cannot change what an execution resource points to dynamically; however, what that execution resource points to can be dynamically managed by the environment which is executing the process, such as the OS or a virtual machine?

@mhoemmen (Collaborator, Author):

That's right; thanks! :D

@mhoemmen (Collaborator, Author):

(Note to self: Added virtual machine example locally)
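
To restate that conclusion as code, here is a hedged toy model (all names are hypothetical stand-ins, not the proposal's interface) of the distinction between a resource's fixed identity and its environment-managed backing:

```cpp
#include <iostream>
#include <string>

// Toy model: the environment (OS, hypervisor) may remap what *backs* a
// resource, but the C++ implementation must keep the execution resource
// denoting the same logical thing.
struct execution_resource {
  const std::string identity;  // fixed: "virtualized CPU core 3"
  std::string backing;         // dynamic: which physical core runs it
};

int main() {
  execution_resource r{"virtualized CPU core 3", "physical core 17 (node 0)"};
  // The hypervisor migrates the virtual core to another node...
  r.backing = "physical core 42 (node 1)";
  // ...but r still denotes the same virtualized core. It must never
  // silently become something else, such as a GPU.
  std::cout << r.identity << " is now backed by " << r.backing << "\n";
  return 0;
}
```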


> Questions:
>
> 1. What should happen if users are traversing the *resource topology*, but never use the accelerator's *resource* (other than to iterate past it), and something else concurrently calls `Aleph_finalize()`?
Contributor:

You've raised some interesting questions here.

I think the best way to solve this would be to require that this_system::get_resource be thread-safe, and have it initialise and finalize any third-party or OS resources when constructing the topology, before returning. This would mean that there's no way to concurrently interfere with how the implementation uses those resources without using them directly through their respective API, though that would have to be undefined behaviour. This also means that the topology would be a snapshot of the resources at that point, and wouldn't actually require any of the underlying resources to be alive until they are actually used, i.e. when passed to the construction of a context.

The disadvantage of this approach is that the initial call to this_system::get_resource could be quite costly, depending on the implementation, though I'd expect that you would only do this once on startup or at specific intervals between computations.

Alternatively, we could go down the route that was suggested at Rapperswil of having a visitor approach to topology discovery, where the user provides some kind of visitor that traverses the topology and returns the resources that it wants to use. With this kind of approach, it might be worth having the execution_resource have active resources. Perhaps this pattern could construct the context directly?

It may also be worth having a thread-safe valid member function on the execution_resource to check whether the underlying resource is still available. I think this would have to be named differently, though, to reflect that it is not simply a getter: it must do some querying or initialising of resources within the topology to identify whether the resource is still valid. However, even with this, you could still have a situation where a resource becomes unavailable between calling valid and constructing an execution context, or after the context is constructed.

Perhaps instead we should say that you can always construct a context from an execution_resource, even if it's unavailable, but have the context handle the case where the resource is unavailable: cancelling enqueued work, disallowing new submissions, and throwing an exception or calling a callback function.
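
The check-then-construct race described above can be made concrete with a toy model (every name here is hypothetical; Aleph_finalize comes from the quoted question):

```cpp
#include <atomic>
#include <stdexcept>

std::atomic<bool> runtime_alive{true};  // stands in for a third-party runtime

struct execution_resource {};

// The proposed 'valid'-like query: re-checks the underlying system.
bool still_available(const execution_resource&) { return runtime_alive.load(); }

struct execution_context {
  // Construction asserts validity of the resource.
  explicit execution_context(const execution_resource&) {
    if (!runtime_alive.load()) throw std::runtime_error("resource gone");
  }
};

void Aleph_finalize() { runtime_alive.store(false); }  // third-party finalizer

void launch_on(const execution_resource& r) {
  if (still_available(r)) {
    // Race window: another thread may call Aleph_finalize() right here,
    // between the check and the construction below. A 'valid()' query
    // can therefore only ever be advisory.
    execution_context ctx(r);  // may still throw despite the check
    // ... submit work through an executor bound to ctx ...
  }
}

int main() {
  launch_on(execution_resource{});
  return 0;
}
```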

@mhoemmen (Collaborator, Author):

> This also means that the topology would be a snapshot of the resources at that point, and wouldn't actually require any of the underlying resources to be alive until they are actually used, i.e. when passed to the construction of a context.

I like this. This could even solve the problem of some resources only being available inside a parallel region (like a GPU). Topology is a snapshot of everything reachable from the root.

@mhoemmen (Collaborator, Author):

> This would mean that there's no way to concurrently interfere with how the implementation uses those resources without using them directly through their respective API, though that would have to be undefined behaviour.

I'm OK with that being UB in general, possibly with specific APIs giving stronger guarantees.

@mhoemmen (Collaborator, Author):

> though I'd expect that you would only do this once on startup or at specific intervals between computations.

That's right -- this is really about constructing a thread pool. Crazy high-frequency dynamic hardware load balancing feels like it wants a different interface. (Users might just see that as a special kind of resource -- it doesn't feel like something that naturally has an easily walkable topology.)

@mhoemmen (Collaborator, Author):

> Alternatively, we could go down the route that was suggested at Rapperswil of having a visitor approach to topology discovery, where the user provides some kind of visitor that traverses the topology and returns the resources that it wants to use.

Was that in the notes? I'd like to learn more about that. (Sorry I missed that discussion at Rapperswil.) I'm not sure, without knowing more at least, whether that would solve the problems we've discussed here.

@mhoemmen (Collaborator, Author):

> Perhaps instead we should say that you can always construct a context from an execution_resource, even if it's unavailable, but have the context handle the case where the resource is unavailable: cancelling enqueued work, disallowing new submissions, and throwing an exception or calling a callback function.

Why not just attempt to create the context? If the resource is no longer available, the context creation fails. Otherwise, the context assumes the responsibility for keeping the resource alive or otherwise handling the case where the resource ceases to be alive at some point.
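
As a sketch (hypothetical types, not the proposal's interface), this attempt-and-fail pattern might look like the following:

```cpp
#include <optional>
#include <stdexcept>

struct execution_resource { bool available; };

struct execution_context {
  // Construction is the single validity check: it fails if the resource
  // is gone; on success, the context takes responsibility for keeping
  // the resource alive (or reporting if it later goes away).
  explicit execution_context(const execution_resource& r) {
    if (!r.available) throw std::runtime_error("resource unavailable");
  }
};

std::optional<execution_context> try_make_context(const execution_resource& r) {
  try {
    return execution_context(r);
  } catch (const std::runtime_error&) {
    return std::nullopt;  // no separate 'valid()' pre-check, so no race
  }
}

int main() {
  execution_resource accel{false};  // an accelerator that has gone away
  if (auto ctx = try_make_context(accel)) {
    // ... bind an executor to *ctx and submit work ...
  }
  return 0;
}
```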

@mhoemmen (Collaborator, Author):

@AerialMantis I reviewed the Rapperswil notes, and noticed that the affinity paper was marked "DND." It looks like it wasn't covered there. Were you referring to the P0443 discussion in SG1 (and/or joint with LEWG)? Here is the link (requires usual login). For example: "JB: My takeaway is that (P0443) executors are copyable things with reference semantics."

@mhoemmen (Collaborator, Author):

Note: Discussion a week or two ago concluded that the affinity paper was actually discussed in SG1 in Rapperswil, but didn't make it into the minutes.

@mhoemmen (Collaborator, Author):

(Note to self: I think I've addressed this in my local changes.)

@AerialMantis (Contributor) commented:

> I like this. This could even solve the problem of some resources only being available inside a parallel region (like a GPU). Topology is a snapshot of everything reachable from the root.

Yes, I think that could be a nice way to represent it. We could say that when you perform topology discovery, you get back a topology structure of everything that can be executed on from the current thread of execution. So something like GPU device-side enqueue could be represented by an entirely different execution resource/execution context which can be discovered and constructed within a GPU kernel.

We continued discussing this during the last call and we decided that we could potentially support the idea of execution resources only being available within a particular parallel region, by having this_thread::resource() also be permitted to perform some topology discovery and return resources that were not previously available.

@AerialMantis (Contributor) commented:

> Was that in the notes? I'd like to learn more about that. (Sorry I missed that discussion at Rapperswil.) I'm not sure, without knowing more at least, whether that would solve the problems we've discussed here.

This was mostly just a brain dump; I've not put a great deal of thought into that approach yet, so I'm not sure it would really help us here much. Thinking about it further, I'm tempted to agree that it likely won't. No problem; I thought there were notes taken at the Rapperswil meeting, but perhaps not. I'll have a look for them.

This is what I had in the notes that I took:

  • It was suggested that we could observe the topology through some kind of visitor pattern (in order of execution or dispatch).

Essentially, the suggestion was that rather than having users traverse the topology manually, we could provide some kind of visitor interface which allows users to specify what they want to find and how they want to represent it, and have the implementation do everything for them. We have a similar concept in SYCL, though for a smaller domain than we are aiming to support here, and I can see some potential difficulties in making this generic enough to be useful. Still, it could be a useful higher-level interface for those who don't want to do things manually.
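
For concreteness, here is a rough sketch of what such a visitor-style discovery interface might look like (purely illustrative; none of these names exist in the proposal):

```cpp
#include <functional>
#include <string>
#include <vector>

// Illustrative only: a topology node, plus a discovery function that
// walks the tree and hands every resource to a user-supplied visitor.
struct execution_resource {
  std::string kind;  // e.g. "machine", "numa-node", "core"
  std::vector<execution_resource> children;
};

void visit_topology(const execution_resource& node,
                    const std::function<void(const execution_resource&)>& visitor) {
  visitor(node);  // the implementation traverses; the user selects
  for (const auto& child : node.children) visit_topology(child, visitor);
}

int main() {
  const execution_resource root{
      "machine",
      {{"numa-node", {{"core", {}}, {"core", {}}}},
       {"numa-node", {{"core", {}}, {"core", {}}}}}};

  std::vector<const execution_resource*> selected;
  visit_topology(root, [&](const execution_resource& r) {
    if (r.kind == "core") selected.push_back(&r);  // express what we want
  });
  // 'selected' could then feed the construction of an execution context.
  return 0;
}
```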

@AerialMantis (Contributor) commented:

> Why not just attempt to create the context? If the resource is no longer available, the context creation fails. Otherwise, the context assumes the responsibility for keeping the resource alive or otherwise handling the case where the resource ceases to be alive at some point.

Yeah, that makes sense to me.

@mhoemmen (Collaborator, Author) commented Sep 3, 2018

TODO: make topology discovery race-free with respect to different (CPU) threads.

mhoemmen pushed commits to mhoemmen/standards-proposals that referenced this pull request on Sep 10, 2018.
AerialMantis added a commit that referenced this pull request Oct 2, 2018
CP013: Address #67 & discussion in #70