
feat: support indexing parray by ndarray #89

Open · wants to merge 3 commits into main
Conversation

@yinengy (Contributor) commented Apr 14, 2023

Also fixed a minor bug in parray.update()

@yinengy yinengy requested review from wlruys and bozhiyou April 14, 2023 05:50
@yinengy yinengy linked an issue Apr 14, 2023 that may be closed by this pull request
@yinengy yinengy self-assigned this Apr 14, 2023
@yinengy yinengy added the bug (Something isn't working) and enhancement (New feature or request) labels Apr 14, 2023
@wlruys (Contributor) left a comment


Other than efficiency, LGTM.

"""
if isinstance(value, PArray):
value = value.array

if isinstance(slices, numpy.ndarray) or isinstance(slices, cupy.ndarray):
slices = slices.tolist()
Contributor

Isn't this inefficient? Given that:

  • numpy supports indexing by both list and np.ndarray of int32/int64/uint32/uint64 type
  • cupy supports indexing by list, and by both cp.ndarray/np.ndarray of int32/int64/uint32/uint64 type

This could lead to a lot of device -> host copies from the crosspy calls?
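A minimal standalone sketch of what the two bullets above claim (not code from this PR): both backends accept integer ndarray indices directly, so the `.tolist()` round trip is avoidable.

```python
import numpy as np
import cupy as cp

x_np = np.arange(10)
idx = np.array([2, 5, 7], dtype=np.int64)
print(x_np[idx])              # numpy accepts integer ndarray indices directly

x_cp = cp.arange(10)
print(x_cp[cp.asarray(idx)])  # cupy accepts cp.ndarray indices...
print(x_cp[idx])              # ...and, per the bullet above, np.ndarray indices too
```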

@yinengy (Contributor, Author) commented Apr 14, 2023

This can only be avoided by the caller, since this information has to be kept on the CPU side: it will later be converted and hashed so it can be recorded for further use within parray. (So indexing by a cupy array is not efficient in any scenario, since parray's internal data structures still run on the CPU.)

Also, I think crosspy should always use a numpy array as the index. What is the benefit of using a cupy array as an index? @bozhiyou
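A hedged sketch of the constraint described above (the helper `record_key` is hypothetical, not parray's actual API): to record and later compare an index, it has to end up in a hashable, host-resident form, so GPU-resident indices get copied back to the host anyway.

```python
import numpy as np
import cupy as cp

def record_key(slices):
    # Hypothetical helper: parray-style bookkeeping needs a hashable,
    # CPU-side representation of the index to record and compare later.
    if isinstance(slices, cp.ndarray):
        slices = cp.asnumpy(slices)  # forced device -> host copy
    arr = np.asarray(slices)
    return hash(arr.tobytes())       # host-side hashable key
```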

@bozhiyou (Contributor) commented Apr 14, 2023

> And I think crosspy should always use a numpy array as the index

No, the indices can be as large as the array itself - think of the permutation example arr[shuffle(arange(len(arr)))] - and even larger. If you are indexing a cupy array, the indices array has to be copied to the GPU as a cupy array.
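A small standalone illustration of this point (not code from the PR): the permutation index is as large as the data, and indexing a cupy array with host-built indices implies copying that whole index array to the GPU.

```python
import numpy as np
import cupy as cp

arr = cp.arange(1 << 20)                # data lives on the GPU
perm = np.random.permutation(len(arr))  # host-built index array, as large as the data
perm_gpu = cp.asarray(perm)             # host -> device copy of the indices
shuffled = arr[perm_gpu]                # the permutation example from above
```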

Contributor

I agree with Will that a list will be inefficient. Storing all the information on the CPU seems inefficient as well.

@yinengy (Contributor, Author) commented Apr 14, 2023

That makes sense, but currently there is no efficient way to handle this, such as simply storing the info on the GPU or as an ndarray. (In fact, all lists are converted to hash maps in a later step, since there is no guarantee the slice mapping is sequential once local indices are involved.) And every access to the array needs to query this information (e.g. comparing the stored slices with the given slices). Consider the following scenarios:

  1. Data is initially stored on the CPU and then moved to the GPU, but the user doesn't know which device it is on. Should the user index with a cp.ndarray or an np.ndarray?
  2. Same as 1, but now inside a multi-device task (like the ones crosspy runs in). Which slices should the user use to index into it? (And on which device should they live, if a cupy.ndarray is used?)
  3. On which device should the cupy ndarray be stored inside parray?
    (a) If we put it on the same device as the data, then when there are multiple copies on different devices, each device holds only part of the slicing information. Every access, regardless of where the data is, has to reach all devices and compare the slice information saved locally, which will be really slow.
    (b) If we put it on the first GPU, operations on copies held by other devices have to query that GPU, which is no better than querying data on the CPU: NVLink is not guaranteed to exist between all devices, so the request will be routed via the CPU, which is slower than keeping it on the CPU in the first place. This also raises memory issues when the slices are large (you mentioned they might be as large as the array): GPU memory is more limited than CPU memory, the runtime would have to track the slice size or new data moved there will hit OOM, and it causes imbalanced workload and memory usage across devices.
  4. When data is moved, should the slice mapping move with it? Will that increase the data movement overhead?

Until we find a good solution to the above issues, the CPU is still the best place to store the slices. (But I could use a numpy array instead of a Python list; a sketch of that bookkeeping follows.)
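For concreteness, a minimal sketch of the CPU-side bookkeeping described above (names like `to_host` and `global_to_local` are hypothetical, not parray's internals): indices are normalized to the host, then turned into a hash map because the locally held global indices are not guaranteed to be sequential.

```python
import numpy as np
import cupy as cp

def to_host(slices):
    # Normalize any index to a host-resident numpy array up front.
    if isinstance(slices, cp.ndarray):
        slices = cp.asnumpy(slices)
    return np.asarray(slices)

# Hypothetical slice mapping: this copy holds global indices 7, 3, 11,
# which are non-contiguous, hence a hash map rather than a plain offset.
indices = to_host(cp.asarray([7, 3, 11]))
global_to_local = {int(g): l for l, g in enumerate(indices)}

# Later accesses resolve global -> local positions entirely on the CPU.
local_positions = [global_to_local[g] for g in (3, 11)]  # [1, 2]
```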

Labels
bug (Something isn't working) · enhancement (New feature or request)

Development
Successfully merging this pull request may close these issues:
PArray does not support array type indices

3 participants