Move dynamic stack allocations outside do concurrent #227

Closed
6 changes: 3 additions & 3 deletions src/fiats/neural_network_s.F90
@@ -926,6 +926,9 @@
block
real reduce_dcdb(size(dcdb,1),size(dcdb,2),mini_batch_size)
real reduce_dcdw(size(dcdw,1),size(dcdw,2),size(dcdw,3),mini_batch_size)
+ real a(maxval(self%nodes_), input_layer:output_layer) ! Activations
+ real z(size(b,1),size(b,2)), delta(size(b,1),size(b,2))

reduce_dcdb = 0.
reduce_dcdw = 0.

@@ -934,9 +937,6 @@

iteration: &
block

- real a(maxval(self%nodes_), input_layer:output_layer) ! Activations
- real z(size(b,1),size(b,2)), delta(size(b,1),size(b,2))

I'm concerned the proposed transformation might lead to incorrect behavior.

All three of these variables need to be semantically "local" to each concurrent iteration, because each concurrent iteration independently references and defines (reads and writes) these temporary variables as part of its execution.

This version of the code is the one for compilers lacking the F18 `local` locality specifier we'd prefer to use. We worked around that missing feature by deliberately declaring/allocating these local variables inside the body of the do concurrent iteration. Declaring them within the scope of the DO CONCURRENT iteration block unambiguously ensures that the storage for these three variables is private to each iteration (the semantics this code requires for correctness).

F23 11.1.7.5: (emphasis added)

A construct or statement entity of a construct or statement within the DO CONCURRENT construct has SHARED locality if it has the SAVE attribute. If it does not have the SAVE attribute, it is a different entity in each iteration, similar to LOCAL locality.
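
To illustrate the workaround described above, here is a minimal standalone sketch (not the fiats code). The names `x`, `total`, `tmp`, `n`, and `m` are hypothetical and chosen only for this example. The scratch array is declared in a `block` inside the loop body, so it is a construct entity without SAVE and therefore a different entity in each iteration. The commented lines show roughly what the same loop might look like with the F18 `local` specifier, assuming a compiler that supports it.

```fortran
program block_local_sketch
  implicit none
  integer, parameter :: n = 8, m = 4   ! hypothetical sizes
  real :: x(m, n), total(n)            ! hypothetical input and result arrays
  integer :: i

  call random_number(x)

  do concurrent (i = 1:n)
    block
      ! Declared inside the iteration: a construct entity without SAVE,
      ! so each iteration gets its own tmp.
      real :: tmp(m)
      tmp = 2.0 * x(:, i)
      total(i) = sum(tmp)
    end block
  end do

  ! Rough equivalent with an F2018 locality specifier (assumes compiler support):
  !   real :: tmp(m)                      ! declared outside the loop
  !   do concurrent (i = 1:n) local(tmp)
  !     tmp = 2.0 * x(:, i)
  !     total(i) = sum(tmp)
  !   end do

  print *, total
end program block_local_sketch
```

In both forms the privacy of `tmp` is stated explicitly in the source rather than left for the compiler to infer.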

By moving the declaration outside do concurrent as proposed here, these variables effectively gain "unspecified locality", and instead become subject to the following semantic rules in F23 11.1.7.5 (inherited from F08 8.1.6.7):

If a variable has unspecified locality,
• if it is referenced in an iteration it shall either be previously defined during that iteration, or shall not be defined or become undefined during any other iteration; if it is defined or becomes undefined by more than one iteration it becomes undefined when the loop terminates;
[...]
NOTE 3
The restrictions on the statements in a DO CONCURRENT construct are designed to ensure there are no data dependencies between iterations of the loop.

This wording is admittedly less definitive, but I read this to imply that these variables still need to behave as if each concurrent iteration has its own private copy, nearly identical to the LOCAL specifier (ignoring differences in variable contents at loop entry/exit that are irrelevant in this code).
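
As a concrete reading of the unspecified-locality rule quoted above, consider this minimal sketch (again with hypothetical names `x`, `y`, and `tmp`, unrelated to the fiats code). The scratch variable is declared outside the loop and has no locality specifier. Each iteration defines it before referencing it, which appears to satisfy the first alternative of the rule, and it then becomes undefined when the loop terminates.

```fortran
program unspecified_locality_sketch
  implicit none
  integer, parameter :: n = 8   ! hypothetical size
  real :: x(n), y(n)            ! hypothetical input and result arrays
  real :: tmp                   ! scratch variable declared OUTSIDE the loop
  integer :: i

  x = [(real(i), i = 1, n)]

  do concurrent (i = 1:n)
    tmp = 2.0 * x(i)   ! defined during this iteration ...
    y(i) = tmp + 1.0   ! ... before it is referenced in this iteration
  end do
  ! tmp was defined by more than one iteration, so it is undefined here
  ! and should not be referenced until it is redefined.

  print *, y
end program unspecified_locality_sketch
```

Under my reading, a compiler that runs these iterations concurrently while sharing a single `tmp` would not preserve the meaning of this loop, which is the same concern raised for `a`, `z`, and `delta` in this PR.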

TL;DR: I don't understand why this transformation "helps" the compiler in question. In particular, if this transformation causes the compiler to emit code where concurrent iterations all share the same copy of these three variables, that would break the correctness of this code. This procedure semantically requires each concurrent iteration to have a private/local copy of these three variables.

#endif

a(1:self%num_inputs(), input_layer) = inputs(pair)%values()