
_foreach_norm: Align with PyTorch operator semantics on allocation scheme of return tensors #709

Merged · 18 commits · Aug 19, 2024

Conversation

chunhuanMeng
Contributor

@chunhuanMeng chunhuanMeng commented Aug 7, 2024

PyTorch requires `_foreach_norm` to return separately allocated tensors. The existing XPU implementation follows an out-of-date allocation scheme that shares storage among the returned tensors; the latest PyTorch unit tests no longer allow this behavior.
Related test cases:

  • test_dispatch_meta_outplace__foreach_norm_xpu_bfloat16
  • test_dispatch_meta_outplace__foreach_norm_xpu_float
  • test_dispatch_symbolic_meta_outplace__foreach_norm_xpu_bfloat16
  • test_dispatch_symbolic_meta_outplace__foreach_norm_xpu_float
  • test_dispatch_symbolic_meta_outplace_all_strides__foreach_norm_xpu_float32
  • test_meta_outplace__foreach_norm_xpu_bfloat16
  • test_meta_outplace__foreach_norm_xpu_float
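
The required allocation semantics can be demonstrated from Python. This is a minimal sketch using the public `torch._foreach_norm` API on CPU; the PR makes the XPU implementation obey the same rule:

```python
import torch

# _foreach_norm computes a per-tensor norm and must return one
# independently allocated scalar tensor per input. Sharing one
# backing storage among the results (the old XPU scheme) fails
# the meta/outplace dispatch tests listed above.
tensors = [torch.randn(4), torch.randn(8), torch.randn(16)]
norms = torch._foreach_norm(tensors, ord=2)

# Each result tensor owns its own storage; none of them alias.
ptrs = {t.untyped_storage().data_ptr() for t in norms}
assert len(ptrs) == len(norms)

# The values match the per-tensor 2-norms.
for n, t in zip(norms, tensors):
    assert torch.allclose(n, t.norm())
```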

@fengyuan14 changed the title from "Aten::_foreach_norm: fix storage_offset not match with meta tensor" to "_foreach_norm: Align with PyTorch operator semantics on allocation scheme of return tensors" on Aug 7, 2024
@fengyuan14
Contributor

The error is exposed by -Werror in pre-CI. Please fix it (screenshot attached).

@fengyuan14 fengyuan14 mentioned this pull request Aug 13, 2024

@daisyden daisyden added this to the PT2.5 milestone Aug 15, 2024
    ret_per_tensor.push_back(at::empty({}, res_option));
    }
    sycl::queue q{sycl::property::queue::in_order()};
    void** tensor_list_addresses = sycl::malloc_shared<void*>(ntensors, q);
A reviewer (Contributor) commented on this snippet:

Do not use raw runtime malloc; use the helpers provided by the PyTorch XPU backend. The usage here should be:

  1. Allocating pinned host memory
  2. Initializing the metadata (the pointer table) in the pinned memory
  3. Allocating device memory
  4. Copying (memcpy) from the pinned host memory to the device
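
The staging pattern the reviewer describes can be sketched from Python. This is a hedged illustration of the same four steps; the actual fix is C++ and would go through the XPU backend's allocators, whose helper names are not reproduced here. The device transfer is left commented out so the sketch runs without an XPU device:

```python
import torch

# Sketch of the pinned-staging pattern:
#   1. allocate a host-side buffer (pinned when an accelerator is present)
#   2. initialize it with the per-tensor data pointers
#   3. allocate device memory
#   4. copy host -> device in one transfer
tensors = [torch.randn(4), torch.randn(8)]

# Steps 1-2: host buffer holding each tensor's address. Plain CPU
# memory is used here so the sketch runs anywhere; in the real code
# this buffer would be pinned.
addresses = torch.tensor([t.data_ptr() for t in tensors],
                         dtype=torch.int64)

# Steps 3-4: a single copy would move the address table to the
# device, replacing the raw sycl::malloc_shared usage.
# device_addresses = addresses.to("xpu", non_blocking=True)

assert addresses.numel() == len(tensors)
```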

@fengyuan14 fengyuan14 added this pull request to the merge queue Aug 19, 2024
Merged via the queue into main with commit 7eb5219 Aug 19, 2024
3 checks passed
@fengyuan14 fengyuan14 deleted the meng_foreach_norm branch August 19, 2024 01:44