
Confusion about dimensions and update #3

Open
carstenbauer opened this issue Aug 9, 2024 · 4 comments

Comments

carstenbauer (Member) commented Aug 9, 2024

@luraess, @JBlaschke It would be great if you guys could take a look and enlighten me so that we can fix this :)

(Disclaimer: It's early in the morning here and it might be super obvious.)

Confusion about dimensions

We call `neighbors.x` the neighbors in the x dimension (the horizontal dimension), yet we copy rows (horizontal slices) of the data grid `A` into the send buffers:

```julia
# dim-1 (x)
(neighbors.x[1] != MPI.PROC_NULL) && copyto!(bufs.send_1_1, A[2    , :])
(neighbors.x[2] != MPI.PROC_NULL) && copyto!(bufs.send_1_2, A[end-1, :])
```

This seems wrong: if an MPI rank needs to send something to its left/right neighbor, it should send a column (a vertical slice), no?

See

[Screenshot 2024-08-09 at 08:28:39 (image not captured in this export)]
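For reference, here is how slicing behaves in plain Julia (independent of MPI): the first index walks down the rows, so `A[2, :]` is a horizontal slice and `A[:, 2]` a vertical one:

```julia
# A 3x4 matrix: the first index is the row (vertical position),
# the second index is the column (horizontal position).
A = [10 11 12 13;
     20 21 22 23;
     30 31 32 33]

row = A[2, :]   # second row, a horizontal slice: [20, 21, 22, 23]
col = A[:, 2]   # second column, a vertical slice: [11, 21, 31]
```

So if "x" is meant to be the horizontal dimension, the send buffer for a left/right exchange would have to be filled from `A[:, j]`, not `A[i, :]`.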

At least we're consistently wrong:

```julia
# dim-2 (y)
(neighbors.y[1] != MPI.PROC_NULL) && copyto!(bufs.send_2_1, A[:, 2    ])
(neighbors.y[2] != MPI.PROC_NULL) && copyto!(bufs.send_2_2, A[:, end-1])
```

I guess this doesn't matter in practice (who cares about labeling dimensions in an x <-> y symmetric system?), but it definitely isn't great, because we actually refer to those dimensions as x and y, and we even have comments that say "left neighbor" etc.

Confusion about update

(I think I resolved this, see my new comment further down)

At the end of the dim-1 (x) communication, we copy the received data into, say, A[1, :]. However, to prepare for the dim-2 (y) communication, we copy, say, A[:, 2] into a send buffer. What confuses me: this copy only happens after the dim-1 (x) part, so one element (A[1, 2]) has already been modified by the dim-1 (x) receive (on ranks that have both an x[1] and a y[1] neighbor).

```julia
(neighbors.x[1] != MPI.PROC_NULL) && copyto!(A[1  , :], bufs.recv_1_1)
(neighbors.x[2] != MPI.PROC_NULL) && copyto!(A[end, :], bufs.recv_1_2)
# dim-2 (y)
(neighbors.y[1] != MPI.PROC_NULL) && copyto!(bufs.send_2_1, A[:, 2    ])
(neighbors.y[2] != MPI.PROC_NULL) && copyto!(bufs.send_2_2, A[:, end-1])
```

That's also wrong, no? Shouldn't we move all the "copy data of A into send buffers" lines (for both the x and y dimensions) to the top of the function, to ensure that we're really sending the "old" A?
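As a toy illustration of the ordering (plain Julia, no MPI; the names and sizes here are made up for the sketch), filling the dim-2 send buffer after the dim-1 receive picks up the freshly received halo value rather than the old one:

```julia
# Hypothetical 4x4 local grid with a 1-cell halo; interior is 2:3.
A = zeros(4, 4)
A[2:3, 2:3] .= 1.0        # "old" interior values

recv_1_1 = fill(7.0, 4)   # pretend this just arrived from the x[1] neighbor
A[1, :] .= recv_1_1       # dim-1 receive updates the first row ...

send_2_1 = A[:, 2]        # ... before the dim-2 send buffer is filled:
                          # send_2_1[1] is 7.0 (received), not the old 0.0
```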

carstenbauer (Member, Author) commented Aug 9, 2024

Confusion about update (resolved): Ah, I think it doesn't matter, because the affected element is the "corner" element, which doesn't matter (it's not even shown in the image above; it should probably be drawn in grey).

carstenbauer (Member, Author) commented Aug 9, 2024

Confusion about dimensions (proposed fix):

```
nprocs = 9, dims = (3, 3)
(me, coords) = (0, (0, 0))
(me, neighbors) = (0, (x = (-1, 3), y = (-1, 1)))
(me, coords) = (1, (0, 1))
(me, neighbors) = (1, (x = (-1, 4), y = (0, 2)))
(me, coords) = (2, (0, 2))
(me, neighbors) = (2, (x = (-1, 5), y = (1, -1)))
(me, coords) = (3, (1, 0))
(me, neighbors) = (3, (x = (0, 6), y = (-1, 4)))
(me, coords) = (4, (1, 1))
(me, neighbors) = (4, (x = (1, 7), y = (3, 5)))
(me, coords) = (5, (1, 2))
(me, neighbors) = (5, (x = (2, 8), y = (4, -1)))
(me, coords) = (6, (2, 0))
(me, neighbors) = (6, (x = (3, -1), y = (-1, 7)))
(me, coords) = (7, (2, 1))
(me, neighbors) = (7, (x = (4, -1), y = (6, 8)))
(me, coords) = (8, (2, 2))
(me, neighbors) = (8, (x = (5, -1), y = (7, -1)))
```

gives this order

```
0 1 2
3 4 5
6 7 8
```

Hence, we should probably just relabel x <-> y in the neighbors named tuple and call it a day.
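The printed neighbor layout can be reproduced without MPI. A minimal sketch (assuming MPI's row-major rank ordering in the Cartesian communicator; `null` stands in for `MPI.PROC_NULL`, and all function names are made up for illustration):

```julia
dims = (3, 3)
null = -1   # stands in for MPI.PROC_NULL

# Rank <-> Cartesian coordinates. MPI orders ranks row-major:
# the first coordinate varies slowest.
coords(me) = (me ÷ dims[2], me % dims[2])
rank(c)    = c[1] * dims[2] + c[2]

# Neighbor of rank `me` in dimension `d` (1 or 2), shifted by `by` (-1 or +1).
function neighbor(me, d, by)
    c = collect(coords(me))
    c[d] += by
    return 0 <= c[d] < dims[d] ? rank(Tuple(c)) : null
end

me  = 4   # the center rank
nbs = (x = (neighbor(me, 1, -1), neighbor(me, 1, +1)),
       y = (neighbor(me, 2, -1), neighbor(me, 2, +1)))
# nbs == (x = (1, 7), y = (3, 5)), matching the output above
```

With this ordering, the first dimension (labeled `x` in the output) moves between rows of the rank grid, which is why swapping the labels would make `x` the horizontal dimension again.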

luraess (Member) commented Aug 12, 2024

In a Cartesian view of the world (which I implicitly adopted in these examples), the x-dimension refers to columns, while the y-dimension refers to rows. In 2D, selecting a value of x therefore means copying an entire vertical slice, while selecting a value of y means copying an entire horizontal slice.

Now, the confusion comes from the mapping of (x, y) in the Cartesian world to (x, y) in the array world, where the first index (here x) selects a horizontal slice (a row), and the second index (here y) selects a vertical slice (a column).

The Cartesian -> array "mapping" may be confusing, but it is ubiquitous. If you plot a heatmap of a 2D array, it will actually appear transposed (you expect the Cartesian layout, but you may get the array layout). Some plotting routines correct for this, others don't.

From my point of view, there is no issue and no confusion; it is simply a fact that the Cartesian and array (or linear-algebra) layouts are "transposed": (columns, rows) vs. (rows, columns).

luraess (Member) commented Aug 12, 2024

The corner case (literally, here): the corner cell will be correctly updated if and only if the MPI communication is done sequentially across the spatial dimensions. This resolves the corner case and correctly propagates the update there.
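A toy single-process sketch of this (plain Julia, no MPI; the subdomain layout and names are made up for illustration): because the dim-2 send buffers are filled after the dim-1 receives, a value in the interior corner of one subdomain reaches the diagonal neighbor's halo corner in two sequential hops:

```julia
# Four 4x4 subdomains with a 1-cell halo (interior = 2:3) on a 2x2 grid.
B = Dict((p, q) => zeros(4, 4) for p in 1:2, q in 1:2)
B[(1, 1)][3, 3] = 42.0   # interior corner value, diagonal to subdomain (2,2)

# Phase 1: exchange along dim 1 (interior row -> neighbor's halo row).
# Only the downward half of the exchange is shown.
for q in 1:2
    B[(2, q)][1, :] .= B[(1, q)][3, :]
end

# Phase 2: exchange along dim 2. Because this send happens *after* phase 1,
# the column B[(2,1)][:, 3] already contains the received halo value at [1, 3].
for p in 1:2
    B[(p, 2)][:, 1] .= B[(p, 1)][:, 3]
end

B[(2, 2)][1, 1]   # == 42.0: the corner propagated via two sequential hops
```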
