Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gpu: misc fixes #5176

Merged
merged 4 commits into from
Mar 29, 2021
Merged

gpu: misc fixes #5176

merged 4 commits into from
Mar 29, 2021

Conversation

hzhou
Copy link
Contributor

@hzhou hzhou commented Mar 26, 2021

Pull Request Description

Some fixes that breaks the GPU path.

It is tested in #5000.

Author Checklist

  • Provide Description
    Particularly focus on why, not what. Reference background, issues, test failures, xfail entries, etc.
  • Commits Follow Good Practice
    Commits are self-contained and do not do two things at once.
    Commit message is of the form: module: short description
    Commit message explains what's in the commit.
  • Passes All Tests
    Whitespace checker. Warnings test. Additional tests via comments.
  • Contribution Agreement
    For non-Argonne authors, check contribution agreement.
    If necessary, request an explicit comment from your companies PR approval manager.

@hzhou hzhou requested a review from raffenet March 26, 2021 21:49
@hzhou
Copy link
Contributor Author

hzhou commented Mar 26, 2021

test:mpich/ch4/most

@hzhou
Copy link
Contributor Author

hzhou commented Mar 27, 2021

test:mpich/ch4/ofi

MPI_Aint extent, true_lb, true_extent;
MPIR_Datatype_get_extent_macro(datatype, extent);
MPIR_Type_get_true_extent_impl(datatype, &true_lb, &true_extent);
// extent = MPL_MAX(extent, true_extent);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm somewhat surprised this works without this line. In any case, if it's not needed, we should remove it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I meant to remove it but have forgotten. The data is packed by extent not true_extent, so I believe take true_extent * count is wrong or at least imprecise. true_extent should only be used to adjust the final boundary.

Comment on lines 25 to 36
MPI_Aint size = extent * count - extent + true_extent;
if (true_lb > 0) {
size += true_lb;
} else if (true_lb < 0) {
size += (-true_lb);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see. I didn't look closely at these manipulations down here. Now I see where the size is adjusted for true_extent.

Copy link
Contributor Author

@hzhou hzhou Mar 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, the adjustment based on true_lb is not necessary -- I meant to delete them. The new code returns a pointer shifted by the true_lb so the exact size should work now.

@hzhou hzhou force-pushed the 2103_gpu_fix branch 2 times, most recently from ddc82bb to ab9af0c Compare March 29, 2021 16:05
MPIR_Assert(host_buf);

host_buf = (char *) host_buf - shift;
return host_buf;
Copy link
Contributor Author

@hzhou hzhou Mar 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@raffenet Thought about it a bit more. The previous fix didn't account for the case when extent is negative, which will put the extra count data before the first data, thus we may need to allocate extra size to account for.

Could you review this commit again? Meanwhile, I'll rebase and test it in PR #5000 -- although I don't think DTPool will generate any datatype with negative extent.

@hzhou
Copy link
Contributor Author

hzhou commented Mar 29, 2021

test:mpich/ch4/ofi
✔️

hzhou added 4 commits March 29, 2021 13:26
We can't directly rdma into a gpu buffer yet, force it to go through
unpacking.
Provide higher level common utilities. For one, the code is more
readable with higher-level names focusing on the semantics rather than
details. For two, it allows easier modification for alternate mechanism.
The old allocates host buffer without consider potential non-zero lower bound,
resulting out-of-bound memory overwriting. Use the higher level wrappers
provided in the last commit to fix the issue and provider cleaner
semantics.
@hzhou hzhou requested a review from raffenet March 29, 2021 18:27
void *host_buf = MPL_malloc(size, MPL_MEM_OTHER);
MPIR_Assert(host_buf);

host_buf = (char *) host_buf - shift;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So true_lb is always <= 0 when count == 1 or extent >= 0? Am I reading this right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. true_lb works whether it's positive or negative, it is an adjustment from the buffer pointer and the arithmetic will work for either positive or negative. However, the true_lb returned from MPIR_Type_get_true_extent_impl is the true_lb for a single count datatype. When extent is positive, the true_lb remain the same for multiple count. so no adjustment is needed. However, when extent is negative, the true_lb with multiple count will be further extended into the negative direction, thus requires further adjustment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I'm trying to understand is we are returning an allocated buffer, but with the address potentially shifted outside of the malloc'd region in the case of positive true_lb. Wouldn't writing to or dereferencing that address lead to a potential crash? How do we guard against these scenarios? Is it because MPIR_Localcopy will only be used with a datatype that would not do this?

Copy link
Contributor Author

@hzhou hzhou Mar 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't writing to or dereferencing that address lead to a potential crash?

The dataype packing/unpacking routine should only access the actual data location specified by the MPI datatype. Only a bug will result in a wrong access.

PS: think about MPI_BOTTOM, it is okay as long as the datatype is correctly constructed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, only accessing this with a datatype-aware copy routine makes sense. I think it should be OK. I'd still like to see it go thru Jenkins again.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code has already passed Jenkins both here #5176 (comment) and in the gpu testing #5000 (comment) . The last push is only a rebase with PR #5005

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per my previous comments, I am going to go ahead merge this, then get #5000 rebased. Any issues should pop up in that PR anyway.

@hzhou hzhou merged commit 4c1b448 into pmodels:main Mar 29, 2021
@hzhou hzhou deleted the 2103_gpu_fix branch March 29, 2021 19:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants