
Confusion about dimensions and update #3

Open
carstenbauer opened this issue Aug 9, 2024 · 4 comments

Comments

carstenbauer (Member) commented Aug 9, 2024

@luraess, @JBlaschke It would be great if you guys could take a look and enlighten me so that we can fix this :)

(Disclaimer: It's early in the morning here and it might be super obvious.)

Confusion about dimensions

We call `neighbors.x` the neighbors in the x dimension (the horizontal dimension), yet we copy rows (horizontal slices) of the data grid `A` into the send buffers:

```julia
# dim-1 (x)
(neighbors.x[1] != MPI.PROC_NULL) && copyto!(bufs.send_1_1, A[2    , :])
(neighbors.x[2] != MPI.PROC_NULL) && copyto!(bufs.send_1_2, A[end-1, :])
```

This seems wrong: if an MPI rank needs to send something to its left/right neighbor, it should send a column (a vertical slice), no?

See

[Screenshot 2024-08-09 at 08:28:39 (image not captured in this export)]
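For reference, here is how slicing behaves in plain Julia (independent of MPI): the first index walks down the rows, so `A[2, :]` is a horizontal slice and `A[:, 2]` a vertical one:

```julia
# A 3x4 matrix: the first index is the row (vertical position),
# the second index is the column (horizontal position).
A = [10 11 12 13;
     20 21 22 23;
     30 31 32 33]

row = A[2, :]   # second row, a horizontal slice: [20, 21, 22, 23]
col = A[:, 2]   # second column, a vertical slice: [11, 21, 31]
```

So if "x" is meant to be the horizontal dimension, the send buffer for a left/right exchange would have to be filled from `A[:, j]`, not `A[i, :]`.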

At least we're consistently wrong:

```julia
# dim-2 (y)
(neighbors.y[1] != MPI.PROC_NULL) && copyto!(bufs.send_2_1, A[:, 2    ])
(neighbors.y[2] != MPI.PROC_NULL) && copyto!(bufs.send_2_2, A[:, end-1])
```

I guess this doesn't matter in practice (who cares about labeling dimensions in an x <-> y symmetric system?), but it definitely isn't great, because we actually refer to those dimensions as x and y, and we even have comments that say "left neighbor" etc.

Confusion about update

(I think I resolved this, see my new comment further down)

At the end of the dim-1 (x) communication, we copy the received data into, say, A[1, :]. However, to prepare for the dim-2 (y) communication, we copy, say, A[:, 2] into a send buffer. What confuses me: this copy only happens after the dim-1 (x) part, so one element (A[1, 2]) has already been modified by the dim-1 (x) receive (on ranks that have both an x[1] and a y[1] neighbor).

```julia
(neighbors.x[1] != MPI.PROC_NULL) && copyto!(A[1  , :], bufs.recv_1_1)
(neighbors.x[2] != MPI.PROC_NULL) && copyto!(A[end, :], bufs.recv_1_2)
# dim-2 (y)
(neighbors.y[1] != MPI.PROC_NULL) && copyto!(bufs.send_2_1, A[:, 2    ])
(neighbors.y[2] != MPI.PROC_NULL) && copyto!(bufs.send_2_2, A[:, end-1])
```

That's also wrong, no? Shouldn't we move all the "copy data of A into send buffers" lines (for both the x and y dimensions) to the top of the function, to ensure that we're really sending the "old" A?
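As a toy illustration of the ordering (plain Julia, no MPI; the names and sizes here are made up for the sketch), filling the dim-2 send buffer after the dim-1 receive picks up the freshly received halo value rather than the old one:

```julia
# Hypothetical 4x4 local grid with a 1-cell halo; interior is 2:3.
A = zeros(4, 4)
A[2:3, 2:3] .= 1.0        # "old" interior values

recv_1_1 = fill(7.0, 4)   # pretend this just arrived from the x[1] neighbor
A[1, :] .= recv_1_1       # dim-1 receive updates the first row ...

send_2_1 = A[:, 2]        # ... before the dim-2 send buffer is filled:
                          # send_2_1[1] is 7.0 (received), not the old 0.0
```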

carstenbauer (Member, Author) commented Aug 9, 2024

Confusion about update (resolved): Ah, I think it doesn't matter, because the affected element is the "corner" element, which doesn't matter (it's not even shown in the image above; it should probably be drawn in grey).

carstenbauer (Member, Author) commented Aug 9, 2024

Confusion about dimensions (proposed fix):

```
nprocs = 9, dims = (3, 3)
(me, coords) = (0, (0, 0))
(me, neighbors) = (0, (x = (-1, 3), y = (-1, 1)))
(me, coords) = (1, (0, 1))
(me, neighbors) = (1, (x = (-1, 4), y = (0, 2)))
(me, coords) = (2, (0, 2))
(me, neighbors) = (2, (x = (-1, 5), y = (1, -1)))
(me, coords) = (3, (1, 0))
(me, neighbors) = (3, (x = (0, 6), y = (-1, 4)))
(me, coords) = (4, (1, 1))
(me, neighbors) = (4, (x = (1, 7), y = (3, 5)))
(me, coords) = (5, (1, 2))
(me, neighbors) = (5, (x = (2, 8), y = (4, -1)))
(me, coords) = (6, (2, 0))
(me, neighbors) = (6, (x = (3, -1), y = (-1, 7)))
(me, coords) = (7, (2, 1))
(me, neighbors) = (7, (x = (4, -1), y = (6, 8)))
(me, coords) = (8, (2, 2))
(me, neighbors) = (8, (x = (5, -1), y = (7, -1)))
```

gives this order

```
0 1 2
3 4 5
6 7 8
```

Hence, we should probably just relabel x <-> y in the neighbors named tuple and call it a day.
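The printed neighbor layout can be reproduced without MPI. A minimal sketch (assuming MPI's row-major rank ordering in the Cartesian communicator; `null` stands in for `MPI.PROC_NULL`, and all function names are made up for illustration):

```julia
dims = (3, 3)
null = -1   # stands in for MPI.PROC_NULL

# Rank <-> Cartesian coordinates. MPI orders ranks row-major:
# the first coordinate varies slowest.
coords(me) = (me ÷ dims[2], me % dims[2])
rank(c)    = c[1] * dims[2] + c[2]

# Neighbor of rank `me` in dimension `d` (1 or 2), shifted by `by` (-1 or +1).
function neighbor(me, d, by)
    c = collect(coords(me))
    c[d] += by
    return 0 <= c[d] < dims[d] ? rank(Tuple(c)) : null
end

me  = 4   # the center rank
nbs = (x = (neighbor(me, 1, -1), neighbor(me, 1, +1)),
       y = (neighbor(me, 2, -1), neighbor(me, 2, +1)))
# nbs == (x = (1, 7), y = (3, 5)), matching the output above
```

With this ordering, the first dimension (labeled `x` in the output) moves between rows of the rank grid, which is why swapping the labels would make `x` the horizontal dimension again.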

luraess (Member) commented Aug 12, 2024

In a Cartesian view of the world (which I implicitly adopted in these examples), the x-dimension refers to columns, while the y-dimension refers to rows. In 2D, selecting a value of x therefore means copying an entire vertical slice, while selecting a value of y means copying an entire horizontal slice.

Now, the confusion comes from the mapping of (x, y) in the Cartesian world to (x, y) in the array world, where the first index (here x) selects a horizontal slice (a row), and the second index (here y) selects a vertical slice (a column).

The Cartesian -> array "mapping" may be confusing, but it is ubiquitous. If you plot a heatmap of a 2D array, it will actually appear transposed (you expect the Cartesian layout, but you may get the array layout). Some plotting routines correct for this, others don't.

From my point of view, there is no issue and no confusion; it is simply a fact that the Cartesian and array (or linear-algebra) layouts are "transposed": (columns, rows) vs. (rows, columns).

luraess (Member) commented Aug 12, 2024

The corner case (literally, here): the corner cell will be correctly updated if and only if the MPI communication is done sequentially across the spatial dimensions. This resolves the corner case and correctly propagates the update there.
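A toy single-process sketch of this (plain Julia, no MPI; the subdomain layout and names are made up for illustration): because the dim-2 send buffers are filled after the dim-1 receives, a value in the interior corner of one subdomain reaches the diagonal neighbor's halo corner in two sequential hops:

```julia
# Four 4x4 subdomains with a 1-cell halo (interior = 2:3) on a 2x2 grid.
B = Dict((p, q) => zeros(4, 4) for p in 1:2, q in 1:2)
B[(1, 1)][3, 3] = 42.0   # interior corner value, diagonal to subdomain (2,2)

# Phase 1: exchange along dim 1 (interior row -> neighbor's halo row).
# Only the downward half of the exchange is shown.
for q in 1:2
    B[(2, q)][1, :] .= B[(1, q)][3, :]
end

# Phase 2: exchange along dim 2. Because this send happens *after* phase 1,
# the column B[(2,1)][:, 3] already contains the received halo value at [1, 3].
for p in 1:2
    B[(p, 2)][:, 1] .= B[(p, 1)][:, 3]
end

B[(2, 2)][1, 1]   # == 42.0: the corner propagated via two sequential hops
```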
