Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add .bcast() #192

Merged
merged 70 commits into from
Jun 24, 2022
Merged

Add .bcast() #192

merged 70 commits into from
Jun 24, 2022

Conversation

lukashuebner
Copy link
Member

@lukashuebner lukashuebner commented Mar 24, 2022

Closes #173

@lukashuebner lukashuebner self-assigned this Mar 24, 2022
@lukashuebner lukashuebner added feature New feature or request wip Work in progress labels Mar 24, 2022
@Hespian

This comment was marked as resolved.

@kurpicz

This comment was marked as outdated.

@lukashuebner lukashuebner force-pushed the feature-bcast branch 3 times, most recently from 0f6cd27 to 5ec8bb6 Compare March 25, 2022 11:31
Lukas Hübner added 2 commits March 25, 2022 13:17
…ing .get_ptr()

* Rework the Span<> class to be more similar with C++20's std::span.
* Add a SingleElementModifyableBuffer using the new Span<>'s interface.
* Update the existing Buffers' interface to use the new Span<>
* Split the existing Buffers' .get_ptr() into .resize() and .data()
* Adjust allreduce and alltoall to use the new Span<> and Buffers
@lukashuebner lukashuebner added this to the Proof of Concept v0.1.0 milestone Mar 25, 2022
@lukashuebner lukashuebner linked an issue Mar 25, 2022 that may be closed by this pull request
@lukashuebner
Copy link
Member Author

lukashuebner commented Mar 25, 2022

Okay, I've split up this PR into #199 and this, which is forked from #199. How can I mark this PR as dependent on #199 and stop github from displaying the whole change set here?

@Hespian
Copy link
Member

Hespian commented Mar 25, 2022

I don't think github has a feature for that. You just leave the dependent-tag on this PR until #199 is merged. Then you merge main into feature-bcast. That should result in the parts from #199 being removed from the diff of this PR. Then you remove the dependent-tag and add a review-tag.

@lukashuebner lukashuebner added review Ready for review and removed wip Work in progress labels Mar 25, 2022
Comment on lines 77 to 79
KASSERT(
recv_buf_large_enough_on_all_processes(send_recv_buf, root.rank()),
"The receive buffer is too small on at least one rank.", assert::light_communication);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we want an execution path where kamping resizes the buffer so that it is large enough?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What should the behavior be in the following cases:

  • send_recv_buf.size() == 1 on root and == 0 on all other ranks, send_counts not specified: Assume a default size of 1? Broadcast the size? -> doubling the number of broadcasts might be unexpected here; depend on if the ominous SingleElementBuffer is specified?
  • send_counts explicitly specified: resize the buffer on all but the root process? On all including the root process? What, if the send_recv_buf() on root is const?
  • send_counts not specified and not case 1: Broadcast the size of the send buffer, resize the receive buffers, broadcast the data?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think @DanielSeemaier does something like this in #170. Bcast and Scatter should probably use the same mechanisms.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Postponed, see #170 .

We could do that in a future PR when we see what's best (in particular this might change with our ongoing discussion on reducing syntax overhead anyways)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm I still don't really like the way it's done here. We never expected the size of the receive buffers to be set by the user and I would like to keep it that way.

Copy link
Member

@Hespian Hespian May 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would force the user to write different code for root and non-root ranks. Probably not ideal

What would you think about send_recv_buf being optional?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Year, I don't like splitting up the send_recv_buf into a send_buf and recv_buf, too. But iirc someone argued sternly for it, as it'll unify the interface.

Copy link
Member Author

@lukashuebner lukashuebner May 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about this then? :-)

if I'm the root:
    assert, that I have a send_recv_buf
    if I have no recv_count specified:
       // assume, that the other ranks don't know how much data to receive
       broadcast the amount of data to receive
    else:
        // assume that all other ranks also know about their recv_count
        Check, that the recv_counts are the same on all ranks
    broadcast the data
else: // if I'm not the root:
    if I do not have a send_recv_buf, allocate one
    if I have a recv_count:
        resize my send_recv_buf
        assert, that all ranks have the same recv_count (eq. to the one provided to the root)
    else:
        receive the broadcast of the amount of data to receive
        resize my send_recv_buf
    receive the broadcast of the data

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than that resizing the send_recv_buf should happen after receiving the recv_count, I think that should work fine. Not sure I remember the part of the discussion about a unified interface. @kurpicz ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I looked at the old protocols and we already voted in favor of "(Bcast) SendRecv-Buffer auf allen ranks oder wenn SendRecv-Buffer nicht gegeben ist, ist man kein Root und man bekommt das Recv Result zurück."

tests/collectives/mpi_bcast_test.cpp Outdated Show resolved Hide resolved
tests/collectives/mpi_bcast_test.cpp Outdated Show resolved Hide resolved
@lukashuebner lukashuebner changed the title Add .bcast() and single element modifyable buffers. Add .bcast() Mar 31, 2022
@lukashuebner
Copy link
Member Author

On hold until we decided if we want all ranks to know about failed assertions or only some.

@lukashuebner lukashuebner requested a review from Hespian June 21, 2022 13:33
Copy link
Member

@Hespian Hespian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quite a few comments but I think they're all pretty minor.

include/kamping/collectives/bcast.hpp Show resolved Hide resolved
include/kamping/collectives/bcast.hpp Show resolved Hide resolved
include/kamping/collectives/bcast.hpp Outdated Show resolved Hide resolved
include/kamping/collectives/bcast.hpp Show resolved Hide resolved
include/kamping/collectives/bcast.hpp Outdated Show resolved Hide resolved
tests/collectives/mpi_bcast_test.cpp Show resolved Hide resolved
tests/collectives/mpi_bcast_test.cpp Outdated Show resolved Hide resolved
tests/collectives/mpi_bcast_test.cpp Outdated Show resolved Hide resolved
tests/collectives/mpi_bcast_test.cpp Show resolved Hide resolved
tests/collectives/mpi_bcast_test.cpp Outdated Show resolved Hide resolved
include/kamping/collectives/bcast.hpp Outdated Show resolved Hide resolved
tests/collectives/mpi_bcast_test.cpp Outdated Show resolved Hide resolved
tests/collectives/mpi_bcast_test.cpp Outdated Show resolved Hide resolved
tests/collectives/mpi_bcast_test.cpp Outdated Show resolved Hide resolved
tests/collectives/mpi_bcast_test.cpp Outdated Show resolved Hide resolved
tests/collectives/mpi_bcast_test.cpp Outdated Show resolved Hide resolved
std::iota(values.begin(), values.end(), comm.rank() * 10);
kamping::Span<int> transfer_view(values.data(), asserting_cast<size_t>(num_transferred_values));

/// @todo Once we can check assertions, check that providing an recv_count != transfer_view.size() fails.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You added tests for this elsewhere, so this comment can be removed

tests/collectives/mpi_bcast_test.cpp Show resolved Hide resolved
include/kamping/collectives/bcast.hpp Outdated Show resolved Hide resolved
@lukashuebner lukashuebner requested a review from Hespian June 23, 2022 08:51
Copy link
Member

@Hespian Hespian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably uncomment the assertion tests in #329

Copy link
Member

@Hespian Hespian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DanielSeemaier was quicker merging #329 ;) Hopefully just uncommenting your tests makes them pass ;)

tests/collectives/mpi_bcast_test.cpp Outdated Show resolved Hide resolved
tests/collectives/mpi_bcast_test.cpp Outdated Show resolved Hide resolved
tests/collectives/mpi_bcast_test.cpp Outdated Show resolved Hide resolved
tests/collectives/mpi_bcast_test.cpp Outdated Show resolved Hide resolved
@lukashuebner
Copy link
Member Author

@DanielSeemaier was quicker merging #329 ;) Hopefully just uncommenting your tests makes them pass ;)

Nope ... my guess is, that his approach does not work when only some ranks fail the KASSERT.

@lukashuebner lukashuebner requested a review from Hespian June 24, 2022 06:33
Copy link
Member

@Hespian Hespian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's one KASSERT test that I don't understand why it should fail.

Other than that, saying EXPECT_KASSERT_FAILS is broken (needs a fix) seems a bit aggressive. You want an added feature ;)

tests/collectives/mpi_bcast_test.cpp Show resolved Hide resolved
tests/collectives/mpi_bcast_test.cpp Outdated Show resolved Hide resolved
@lukashuebner lukashuebner merged commit 4aace80 into main Jun 24, 2022
@lukashuebner lukashuebner deleted the feature-bcast branch June 24, 2022 08:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature New feature or request review Ready for review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

MPI_Bcast
4 participants