Replies: 18 comments 5 replies
-
You can run both PEST and PEST++ in the cloud, but I'm not sure about the "sized by cores based on the job demand" bit. Workers can come and go, if that is what you mean...
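Because workers open the connection to the master themselves, a node that the cluster adds on demand could start its own workers when it boots. The sketch below rests on a few assumptions. It starts one worker per core and uses PEST++'s `/h host:port` worker syntax. It also copies a hypothetical template directory once per worker, so that concurrent model runs do not overwrite each other's files. The directory name, executable, control file and address are all placeholders.

```shell
#!/usr/bin/env bash
# Sketch: start PEST++ workers on a node that has just joined the cluster.
# Assumptions: one worker per core, a template directory holding the model
# and control file, and PEST++'s "/h host:port" worker syntax.

# Number of workers for this node: cores minus a reserve, but at least 1.
# If no core count is given, nproc is used.
workers_for_node() {
  local cores="${1:-$(nproc)}" reserve="${2:-0}"
  local n=$((cores - reserve))
  if (( n < 1 )); then
    n=1
  fi
  echo "$n"
}

# Copy the template once per worker (assumes worker_N does not exist yet)
# and start each worker in the background with its output in worker.log.
start_workers() {
  local template="$1" exe="$2" pst="$3" master="$4" n="$5" i
  for ((i = 1; i <= n; i++)); do
    cp -r "$template" "worker_$i"
    (cd "worker_$i" && nohup "$exe" "$pst" /h "$master" > worker.log 2>&1 &)
  done
}
```

For example, a node bootstrap script might run `start_workers /path/to/template pestpp-ies case.pst 10.0.0.4:4004 "$(workers_for_node)"`; when the node is removed, its workers simply disappear from the master's pool.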
-
Thanks J Dub, I think that is the answer I needed.
Best regards,
Stan McKenzie
IT/Cyber Engineer
Navarro, under contract with the
U.S. Department of Energy
Environmental Management Nevada Program
702-724-0897 - Work Telephone
From: J Dub ***@***.***>
Sent: Friday, August 16, 2024 8:34 AM
To: usgs/pestpp ***@***.***>
Cc: McKenzie, Stan (CONTR) ***@***.***>; Author ***@***.***>
Subject: Re: [usgs/pestpp] Using PEST with CycleCloud (Azure) Cluster (Discussion #306)
You can run both PEST and PEST++ in the cloud, but I'm not sure about the "sized by cores based on the job demand" bit. Workers can come and go, if that is what you mean...
-
We currently have a physical cluster whose work nodes are accessed by the head node via IP address. I'm told that PEST is configured to use specific work nodes by IP address on that cluster. On our Azure cluster, the cluster is built when the job is submitted, and the IP addresses aren't known until after the fact. In that case, how would PEST be configured?
Thanks,
Stan McKenzie
-
The master is a TCP/IP "listener" (or server), and the workers need to know either the UNC name of the master's physical/virtual host or the master's IPv4 address, but the master doesn't need to know the details of the worker machines. So you'll need to know (or get) the IPv4 address of the master virtual machine once it spins up, and then pass that to the workers when they are instantiated. I've seen this done programmatically, where you grep the master IP address and then pass it to the worker command line... @mnfienen might know better about all this, though...
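A minimal sketch of this pattern. It assumes PEST++'s `/h` switch: the master listens with `/h :port`, and each worker connects with `/h host:port`. It also assumes the Linux `hostname -I` command for discovering the master's address. The executable name, control file and port 4004 are placeholders, and classic PEST's parallel syntax may differ.

```shell
#!/usr/bin/env bash
# Sketch only: build PEST++ master and worker command lines for a cluster
# whose master IP is discovered at runtime. Executable name, control file
# and port are placeholders.

# Run on the master VM: print its first IPv4 address.
# Assumes the Linux `hostname -I` option is available.
master_ip() {
  hostname -I | awk '{print $1}'
}

# The master only listens on a port; it needs no worker addresses.
master_cmd() {
  local exe="$1" pst="$2" port="$3"
  printf '%s %s /h :%s\n' "$exe" "$pst" "$port"
}

# A worker only needs the master's address and port.
worker_cmd() {
  local exe="$1" pst="$2" master="$3" port="$4"
  printf '%s %s /h %s:%s\n' "$exe" "$pst" "$master" "$port"
}
```

On the master, the output of `master_ip` could be written to a location the workers can read (for example a hypothetical shared file `/shared/master_ip`). Each worker would then run the line printed by `worker_cmd pestpp-ies case.pst "$(cat /shared/master_ip)" 4004`.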
-
It seems like it will work for us, then, as we do have a static IP assigned to the scheduler/master in Azure. So all that is needed is to assign that IP in PEST, correct?
From: mnfienen ***@***.***>
Sent: Friday, August 16, 2024 10:19 AM
To: usgs/pestpp ***@***.***>
Cc: McKenzie, Stan (CONTR) ***@***.***>; Author ***@***.***>
Subject: Re: [usgs/pestpp] Using PEST with CycleCloud (Azure) Cluster (Discussion #306)
We have been running PEST++ in AWS. At the moment, we have to know the IPv4 address of the master before launching workers. You'd really only be able to get around that if you pass it along programmatically, as @jtwhite79 suggests. We felt it was more straightforward to make sure the master IP is static, at least for the duration of the cluster being active.
-
I appreciate your insights, it will be very helpful.
From: mnfienen ***@***.***>
Sent: Friday, August 16, 2024 10:35 AM
To: usgs/pestpp ***@***.***>
Cc: McKenzie, Stan (CONTR) ***@***.***>; Author ***@***.***>
Subject: Re: [usgs/pestpp] Using PEST with CycleCloud (Azure) Cluster (Discussion #306)
Yep, that's correct! The workers can have IP addresses assigned at launch, since they initiate the communication with the scheduler/master.
-
Good day -
I have built an on-demand Azure cluster based on AlmaLinux and have installed PEST, but I am having difficulties getting the compile to run. I wonder if you can provide any insights related to the following output from the make run:
```text
***@***.*** pest_source_mpi]# make -f pest.mak all
gfortran -c -O3 -static pestdata.for
gfortran -c -O3 -static pest.for
gfortran -c -O3 -static pestsub2.for
gfortran -c -O3 -static writall.for
gfortran -c -O3 -static pardef.for
gfortran -c -O3 -static readpest.for
gfortran -c -O3 -static runpest.for
gfortran -static -o pest \
pest.o pestsub1.o pestsub2.o dercalc.o modrun.o writall.o \
linpos.o lapack1.o writsig.o common.o \
pgetcl.o pestwait.o writint.o pardef.o\
drealrd.o space.o optwt.o cgsolve.o compress.o \
readpest.o runpest.o lsqr.o orthog.o ms_stubs.o pestdata.o
/bin/ld: cannot find -lgfortran
/bin/ld: cannot find -lm
/bin/ld: cannot find -lquadmath
/bin/ld: cannot find -lm
/bin/ld: cannot find -lc
collect2: error: ld returned 1 exit status
make: *** [pest.mak:307: pest] Error 1
```
Note that all the libraries that cannot be found are installed, and in the case of "lgfortran" I created a link at the location where it's looking for the library: "ln -s /usr/lib64/libgfortran.so.5.0.0 /usr/lib/libgfortran.so".
Again, your insights are appreciated.
Stan McKenzie
From: McKenzie, Stan (CONTR)
Sent: Friday, August 16, 2024 10:37 AM
To: usgs/pestpp ***@***.***>
Subject: RE: [usgs/pestpp] Using PEST with CycleCloud (Azure) Cluster (Discussion #306)
I appreciate your insights, it will be very helpful.
-
Hi –
I am trying to run a PEST compile, but it fails as shown below. I cannot get any answers from the PEST developers; can you please offer any insights? Note that I did create a symlink for lgfortran, and "ld -lgfortran --verbose" finds it.
Thanks, and best regards for a great weekend!
```text
***@***.*** pest_source_mpi]$ sudo make -f pest.mak all
gfortran -c -O3 -static pestdata.for
gfortran -c -O3 -static pest.for
gfortran -c -O3 -static pestsub2.for
gfortran -c -O3 -static writall.for
gfortran -c -O3 -static pardef.for
gfortran -c -O3 -static readpest.for
gfortran -c -O3 -static runpest.for
gfortran -static -o pest \
pest.o pestsub1.o pestsub2.o dercalc.o modrun.o writall.o \
linpos.o lapack1.o writsig.o common.o \
pgetcl.o pestwait.o writint.o pardef.o\
drealrd.o space.o optwt.o cgsolve.o compress.o \
readpest.o runpest.o lsqr.o orthog.o ms_stubs.o pestdata.o
/bin/ld: cannot find -lgfortran
/bin/ld: cannot find -lm
/bin/ld: cannot find -lquadmath
/bin/ld: cannot find -lm
/bin/ld: cannot find -lc
collect2: error: ld returned 1 exit status
make: *** [pest.mak:307: pest] Error 1
```
Stan McKenzie
-
This looks like a compiler install/lib path thing - what's in the LD_LIBRARY_PATH env var? @mwtoews would be the best person to answer this question...
-
```text
***@***.*** pest_source_mpi]$ echo $LD_LIBRARY_PATH
/opt/hpcx-v2.18-gcc-mlnx_ofed-redhat8-cuda12-x86_64/ompi/lib:/opt/hpcx-v2.18-gcc-mlnx_ofed-redhat8-cuda12-x86_64/nccl_rdma_sharp_plugin/lib:/opt/hpcx-v2.18-gcc-mlnx_ofed-redhat8-cuda12-x86_64/sharp/lib:/opt/hpcx-v2.18-gcc-mlnx_ofed-redhat8-cuda12-x86_64/hcoll/lib:/opt/hpcx-v2.18-gcc-mlnx_ofed-redhat8-cuda12-x86_64/ucc/lib/ucc:/opt/hpcx-v2.18-gcc-mlnx_ofed-redhat8-cuda12-x86_64/ucc/lib:/opt/hpcx-v2.18-gcc-mlnx_ofed-redhat8-cuda12-x86_64/ucx/lib/ucx:/opt/hpcx-v2.18-gcc-mlnx_ofed-redhat8-cuda12-x86_64/ucx/lib
```
It doesn't look like the directory containing libgfortran is listed there; does it need to be?
Thanks for the insights!
From: J Dub ***@***.***>
Sent: Friday, November 8, 2024 9:18 AM
To: usgs/pestpp ***@***.***>
Cc: McKenzie, Stan (CONTR) ***@***.***>; Author ***@***.***>
Subject: Re: [usgs/pestpp] Using PEST with CycleCloud (Azure) Cluster (Discussion #306)
This looks like a compiler install/lib path thing - what's in the LD_LIBRARY_PATH env var? @mwtoews would be the best person to answer this question...
-
There are several missing: gfortran, m, quadmath (which is always a fun one to find!) and c.
-
Thanks J Dub!
Can I add those directories (when I locate them) to LD_LIBRARY_PATH the way I would add them to $PATH?
From: J Dub ***@***.***>
Sent: Friday, November 8, 2024 9:57 AM
To: usgs/pestpp ***@***.***>
Cc: McKenzie, Stan (CONTR) ***@***.***>; Author ***@***.***>
Subject: Re: [usgs/pestpp] Using PEST with CycleCloud (Azure) Cluster (Discussion #306)
There are several missing: gfortran, m, quadmath (which is always a fun one to find!) and c
-
I think that's the idea, but I'm no expert here. When this has happened to me before, I just blast away the compiler and reinstall, haha.
-
Hi all, I've had a chance to get to the bottom of this using the almalinux:9.4 Docker image. […] The last two add the necessary static libraries to the system. Variables like LD_LIBRARY_PATH are only for shared libraries, so they don't have any influence on the compilation of static programs.
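To see which of the archives behind the `-lgfortran`, `-lm`, `-lquadmath` and `-lc` errors is actually missing, the compiler driver can be asked where it would find each one. This is a minimal sketch, assuming a GCC-family `gfortran`. Its `-print-file-name` option prints the full path when the file is found and echoes the bare name otherwise. Because the build links with `-static`, the linker needs the `.a` archives. That is also why a symlink to the shared `libgfortran.so` does not resolve `-lgfortran` here.

```shell
#!/usr/bin/env bash
# Sketch: check whether the static archives needed by `gfortran -static`
# can be found. FC may be set to another GCC-family Fortran driver.
FC="${FC:-gfortran}"

# Print the full path of a static archive as resolved by the compiler driver.
# GCC's -print-file-name echoes the bare name back when the file is not
# found, so that case (or a path that does not exist) counts as missing.
static_lib_path() {
  local lib="$1" path
  path="$("$FC" -print-file-name="$lib" 2>/dev/null)" || return 1
  if [ "$path" = "$lib" ] || [ ! -f "$path" ]; then
    return 1
  fi
  printf '%s\n' "$path"
}

# Check the archives behind the -lgfortran, -lm, -lquadmath and -lc errors.
# Returns non-zero if any of them is missing.
check_static_libs() {
  local lib path missing=0
  for lib in libgfortran.a libm.a libquadmath.a libc.a; do
    if path="$(static_lib_path "$lib")"; then
      echo "found:   $lib -> $path"
    else
      echo "missing: $lib"
      missing=1
    fi
  done
  return "$missing"
}
```

Running `check_static_libs` after installing the static packages is a quick way to confirm that each archive can now be found before re-running `make -f pest.mak all`.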
-
Hi Mike –
Thanks for the insights; they've been very helpful. We do have an issue here with the AlmaLinux version: we are using 8.7, as that is what Microsoft said works with their Azure CycleCloud cluster, and I do see some differences in libraries/commands between the versions. Those highlighted below would not work in 8.7, and I haven't found a workaround. I'm tempted to upgrade to 9.4, but I think it may affect CycleCloud functionality such as spinning up on-demand work nodes. So, before I upgrade, I wanted to know: was 9.4 the only version you used?
Again, I am very grateful for your insightful knowledge.
--Stan McKenzie
From: Mike Taves ***@***.***>
Sent: Sunday, November 10, 2024 1:13 PM
To: usgs/pestpp ***@***.***>
Cc: McKenzie, Stan (CONTR) ***@***.***>; Author ***@***.***>
Subject: Re: [usgs/pestpp] Using PEST with CycleCloud (Azure) Cluster (Discussion #306)
Hi all, I've had a chance to get to the bottom of this using the almalinux:9.4 Docker image. I've replicated your errors, and it seems that AlmaLinux needs a few extra "tricks" to enable the repos that provide static libraries (e.g. libgfortran.a), as PEST was developed to be mostly static. (Side point: requiring static everywhere is overboard; shared libraries are sufficient.) I've found that to install the necessary tools and packages for classic PEST, you'll need these commands:
```shell
dnf install -y 'dnf-command(config-manager)'
dnf config-manager --set-enabled crb
dnf install -y almalinux-release-devel
dnf install -y gfortran make
dnf install -y libgfortran-static glibc-static
```
The last two add the necessary static libraries to the system.
Variables like LD_LIBRARY_PATH are only for shared libraries, so they don't have any influence on the compilation.
-
Hi Mike, I am very grateful for the info! All commands up to the fifth one worked, but that one returns the following error. Is that because a repo is missing, or do you have other insights?
Thanks,
--Stan McKenzie
```text
***@***.*** ~]$ sudo dnf install -y almalinux-release-devel
AlmaLinux 8 - PowerTools
No match for argument: almalinux-release-devel
Error: Unable to find a match: almalinux-release-devel
```
From: Mike Taves ***@***.***>
Sent: Tuesday, November 12, 2024 1:55 PM
To: usgs/pestpp ***@***.***>
Cc: McKenzie, Stan (CONTR) ***@***.***>; Author ***@***.***>
Subject: Re: [usgs/pestpp] Using PEST with CycleCloud (Azure) Cluster (Discussion #306)
Yes, I was looking at a 9.4 Docker image. But looking at an 8.7 Docker image, the recipe is slightly different:
```shell
rpm --import https://repo.almalinux.org/almalinux/RPM-GPG-KEY-AlmaLinux
dnf upgrade -y almalinux-release
dnf install -y 'dnf-command(config-manager)'
dnf config-manager --set-enabled powertools
dnf install -y almalinux-release-devel
dnf install -y gcc-gfortran make
dnf install -y libgfortran-static glibc-static
```
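The 8.x and 9.x recipes in this thread differ in three ways. They enable a different repository (`powertools` on 8.x, `crb` on 9.x), they use a different compiler package name, and 8.x has extra first steps. A minimal sketch that picks between them by reading `VERSION_ID` from `/etc/os-release`. It only prints the commands copied from those two recipes, for review. It assumes the file describes an AlmaLinux host and covers no release other than 8 and 9.

```shell
#!/usr/bin/env bash
# Sketch: print (not run) the AlmaLinux package recipe from this thread that
# matches the host's major version. Only releases 8 and 9 are covered.

# Major version from an os-release style file, e.g. VERSION_ID="8.7" -> 8.
# The file is sourced inside a subshell so it does not leak variables.
os_major_version() {
  local file="${1:-/etc/os-release}" ver
  ver="$(. "$file" && printf '%s' "$VERSION_ID")" || return 1
  [ -n "$ver" ] || return 1
  printf '%s\n' "${ver%%.*}"
}

# Recipe per major version; the last line installs the static libraries.
static_toolchain_recipe() {
  case "$1" in
    8) printf '%s\n' \
         'rpm --import https://repo.almalinux.org/almalinux/RPM-GPG-KEY-AlmaLinux' \
         'dnf upgrade -y almalinux-release' \
         "dnf install -y 'dnf-command(config-manager)'" \
         'dnf config-manager --set-enabled powertools' \
         'dnf install -y almalinux-release-devel' \
         'dnf install -y gcc-gfortran make' \
         'dnf install -y libgfortran-static glibc-static' ;;
    9) printf '%s\n' \
         "dnf install -y 'dnf-command(config-manager)'" \
         'dnf config-manager --set-enabled crb' \
         'dnf install -y almalinux-release-devel' \
         'dnf install -y gfortran make' \
         'dnf install -y libgfortran-static glibc-static' ;;
    *) return 1 ;;
  esac
}
```

For example, `static_toolchain_recipe "$(os_major_version)" > recipe.sh` writes the matching commands to a file that can be reviewed and then run with `sudo bash recipe.sh`.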
-
Good day Mike –
I've been trying to configure our Azure cluster without success, as I cannot find/install "almalinux-release-devel". So I am considering changing the OS to a different Linux variant, such as Ubuntu, but wonder whether PEST will run on it. Can you offer any insights?
Thanks,
Stan McKenzie
-
Hi Mike, et al –
I changed our Azure cluster to Ubuntu 22.04 LTS and was able to get PEST jobs to compile, but when the compiled jobs are run they fail, apparently because of this reported error:
“Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.”
Neither my modeler/scientist nor I know for sure how to fix this, so I was wondering whether you or any reader of this can offer any insights.
Thanks, and best regards,
Stan McKenzie
From: Mike Taves ***@***.***>
Sent: Wednesday, December 4, 2024 1:14 PM
To: usgs/pestpp ***@***.***>
Cc: McKenzie, Stan (CONTR) ***@***.***>; Author ***@***.***>
Subject: Re: [usgs/pestpp] Using PEST with CycleCloud (Azure) Cluster (Discussion #306)
There are likely more Linux users on Ubuntu (such as myself), so I'd regard this as a good move. Ubuntu's default repos have all the necessary tools to compile PEST.
-
Is it possible to run PEST on a virtual cluster that is sized by cores based on the job demand?