System Information:
uname -a: Linux manjaro 5.13.5-1-MANJARO #1 SMP PREEMPT Mon Jul 26 07:43:29 UTC 2021 x86_64 GNU/Linux
Machine architecture and number of physical cores: 8th/9th Gen Core 8-core Desktop Processor [Coffee Lake S] (16 threads)
cmake version 3.21.1
Note: running mpicc -show gives /opt/nvidia/hpc_sdk/Linux_x86_64/21.1/comm_libs/openmpi/openmpi-3.1.5/bin/.bin/mpicc: error while loading shared libraries: libnvcpumath.so: cannot open shared object file: No such file or directory
The issue
What I was trying to do
I was trying to run four concurrent images of my compiled code, found at https://github.com/Oiubrab/byinheritance, by executing sudo chmod u+x i_am_in_command.zsh && ./i_am_in_command.zsh clean 2 test print. The why is described in the GitHub readme at that link, but in short: I have created a neural network, written in Fortran, that computes a trading action. Ultimately, the pertinent execution is the line cafrun -n 4 --use-hwthread-cpus ./lack_of_comprehension $3 in the aforementioned zsh script.
What Happened
When this line runs, an MPI error is generated. To localise the failure, I put in two print statements, where Place gives the order in which each statement sits in the code.
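The two statements bracket the coarray exchange, roughly as in the simplified sketch below (the names and the particular communication are illustrative only; the real code is in the linked repository):

program place_sketch
   implicit none
   real, codimension[*] :: buffer
   ! Place 1: reached before the coarray communication
   print *, 'image number:', this_image(), 'Place:', 1
   ! a remote put of the kind that maps onto MPI one-sided window operations
   if (this_image() == 1) buffer[num_images()] = 1.0
   sync all
   ! Place 2: reached only after the communication completes
   print *, 'image number:', this_image(), 'Place:', 2
end program place_sketch

With those statements in place, I get: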
Invalid Trades:
[1, 0, 0, 0, 0]
0
Network Choice:
[0, 0, 0, 0, 0, 0, 0] [0, 0, 0, 0, 0, 0, 0] [0, 0, 0, 0, 0, 0, 0]
[0, 0, 0]
Market Prices and Info:
{'stock_identifier': 'SE1', 'stock_number': 1, 'stock_price': 0.33, 'units_owned': 0}
{'stock_identifier': 'ADV', 'stock_number': 2, 'stock_price': 0.001, 'units_owned': 0}
{'stock_identifier': 'SBR', 'stock_number': 3, 'stock_price': 0.115, 'units_owned': 0}
Account Position:
{'account': 'test', 'account_value': 3000.0, 'time': 1628241351.6863492}
run: 1
image number: 1 Place: 1
image number: 2 Place: 1
image number: 3 Place: 1
image number: 4 Place: 1
image number: 2 Place: 2
image number: 3 Place: 2
image number: 1 Place: 2
[manjaro:25102] *** An error occurred in MPI_Win_detach
[manjaro:25102] *** reported by process [3482124289,0]
[manjaro:25102] *** on win rdma window 5
[manjaro:25102] *** MPI_ERR_UNKNOWN: unknown error
[manjaro:25102] *** MPI_ERRORS_ARE_FATAL (processes in this win will now abort,
[manjaro:25102] *** and potentially your MPI job)
[manjaro:25098] 3 more processes have sent help message help-mpi-errors.txt / mpi_errors_are_fatal
[manjaro:25098] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
Error: Command:
`/usr/bin/mpiexec -n 4 --use-hwthread-cpus ./lack_of_comprehension test`
failed to run.
Invalid Trades:
[0, 0, 1, 0, 0]
2
Network Choice:
[0, 0, 0, 0, 0, 0, 0] [1, 0, 0, 0, 0, 0, 0] [0, 1, 0, 1, 0, 0, 0]
[0, -1, -10]
Market Prices and Info:
{'stock_identifier': 'SE1', 'stock_number': 1, 'stock_price': 0.33, 'units_owned': 0}
{'stock_identifier': 'ADV', 'stock_number': 2, 'stock_price': 0.001, 'units_owned': 0}
{'stock_identifier': 'SBR', 'stock_number': 3, 'stock_price': 0.115, 'units_owned': 0}
Account Position:
{'account': 'test', 'account_value': 3000.0, 'time': 1628241360.06828}
run: 2
image number: 1 Place: 1
image number: 2 Place: 1
image number: 3 Place: 1
image number: 4 Place: 1
image number: 1 Place: 2
image number: 2 Place: 2
image number: 3 Place: 2
[manjaro:25174] *** An error occurred in MPI_Win_detach
[manjaro:25174] *** reported by process [3486973953,2]
[manjaro:25174] *** on win rdma window 5
[manjaro:25174] *** MPI_ERR_UNKNOWN: unknown error
[manjaro:25174] *** MPI_ERRORS_ARE_FATAL (processes in this win will now abort,
[manjaro:25174] *** and potentially your MPI job)
[manjaro:25168] 3 more processes have sent help message help-mpi-errors.txt / mpi_errors_are_fatal
[manjaro:25168] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
Error: Command:
`/usr/bin/mpiexec -n 4 --use-hwthread-cpus ./lack_of_comprehension test`
failed to run.
Invalid Trades:
[0, 0, 1, 0, 0]
2
Network Choice:
[1, 0, 0, 0, 0, 0, 0] [0, 0, 0, 0, 0, 0, 0] [0, 1, 0, 0, 0, 0, 0]
[-1, 0, -2]
Market Prices and Info:
{'stock_identifier': 'SE1', 'stock_number': 1, 'stock_price': 0.39, 'units_owned': 0}
{'stock_identifier': 'ADV', 'stock_number': 2, 'stock_price': 0.001, 'units_owned': 0}
{'stock_identifier': 'SBR', 'stock_number': 3, 'stock_price': 0.12, 'units_owned': 0}
Account Position:
{'account': 'test', 'account_value': 3000.0, 'time': 1628241366.8761365}
What I expected to happen
Markets and network choices vary; this is expected. What is not expected is the error, and the fact that the fourth image does not run to the second place. I should see the output above, but without the error and with an image number: 4 Place: 2 line. This exact code (minus the print statements) ran without a hitch under the previous OpenMPI version (openmpi-4.0.5-3-x86_64). I have since tried other OpenCoarrays programs I have written and hit various errors when running with fewer than six images.
Step by step reproduction
This error can be reproduced by following the execution above. As the error seems to be code-agnostic, you can also run the process below to reproduce a similar error (again, this code ran fine prior to the update):
step 1
Take the following code and save it as an .f95 file (e.g. test_arraymove.f95):
program test_arraycom
   implicit none
   real, dimension(10), codimension[*] :: x, y
   integer :: num_img, me
   num_img = num_images()
   me = this_image()
   print *, me, num_img
   ! Some code here
   x(2) = x(3)[6]       ! get value from image 6
   x(6)[4] = x(1)       ! put value on image 4
   x(:)[2] = y(:)       ! put array on image 2
   sync all
   ! Remote-to-remote array transfer
   if (me == 1) then
      y(:)[num_img] = x(:)[4]
      sync images (num_img)
   else if (me == num_img) then
      sync images ([1])
   end if
   x(1:10:2) = y(1:10:2)[4]  ! strided get from image 4
end program test_arraycom
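If it helps to narrow things down, here is a further-reduced sketch of my own (not from the repository, and untested against this exact failure) that exercises only a single remote-to-remote transfer, with all coindices kept within num_images(); it can be compiled and run the same way as the program above:

program min_remote_transfer
   implicit none
   real, dimension(10), codimension[*] :: x, y
   integer :: last
   last = num_images()
   x = real(this_image())
   y = 0.0
   sync all
   if (last > 1) then
      if (this_image() == 1) then
         ! remote-to-remote: pull x from image 2, push into y on the last image
         y(:)[last] = x(:)[2]
         sync images (last)
      else if (this_image() == last) then
         sync images ([1])
         print *, 'image', this_image(), 'received y(1) =', y(1)
      end if
   end if
   sync all
end program min_remote_transfer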
step 2
compile the code with caf test_arraymove.f95 -o programname
step 3
run the code with the number of images, $2, below 6, i.e.
cafrun -n $2 --use-hwthread-cpus ./programname
step 4
get an error of the form:
1 4
2 4
3 4
4 4
[manjaro:28814] *** An error occurred in MPI_Win_lock
[manjaro:28814] *** reported by process [3708682241,1]
[manjaro:28814] *** on win rdma window 6
[manjaro:28814] *** MPI_ERR_RANK: invalid rank
[manjaro:28814] *** MPI_ERRORS_ARE_FATAL (processes in this win will now abort,
[manjaro:28814] *** and potentially your MPI job)
[manjaro:28809] 3 more processes have sent help message help-mpi-errors.txt / mpi_errors_are_fatal
[manjaro:28809] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
Error: Command:
`/usr/bin/mpiexec -n 4 --use-hwthread-cpus ./testarraymove`
failed to run.