-
Notifications
You must be signed in to change notification settings - Fork 168
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add AWS functionality to workflow setup and forecast job #1667
Add AWS functionality to workflow setup and forecast job #1667
Conversation
…nryWinterbottom-NOAA/global-workflow into feature/cloud_aws_fcstonly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shellcheck found more than 10 potential problems in the proposed changes. Check the Files changed tab for more details.
Updates for UFSWM develop branch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for getting late on this PR.
I am noticing redefinitions between the global-workflow and the ufs-weather-model e.g. PW_CSP and some of the module loading.
Please take a look at the comments.
ush/detect_machine.sh
Outdated
@@ -38,6 +38,16 @@ case $(hostname -f) in | |||
*) MACHINE_ID=UNKNOWN ;; # Unknown platform | |||
esac | |||
|
|||
if [[ ${MACHINE_ID} == "UNKNOWN" ]]; then | |||
case $(dnsdomainname -f) in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
case $(dnsdomainname -f) in | |
case ${PW_CSP:-} in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed in latest commit/update.
ush/detect_machine.sh
Outdated
case $(dnsdomainname -f) in | ||
|
||
# AWS Parallel Works | ||
*pw-noaa*pw.local) MACHINE_ID=aws ;; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
*pw-noaa*pw.local) MACHINE_ID=aws ;; | |
"aws" | "google" | "azure" ) MACHINE_ID=noaacloud ;; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed in latest commit/update.
@@ -15,14 +15,15 @@ class Host: | |||
""" | |||
|
|||
SUPPORTED_HOSTS = ['HERA', 'ORION', 'JET', | |||
'WCOSS2', 'S4', 'CONTAINER'] | |||
'WCOSS2', 'S4', 'CONTAINER', 'AWSPW'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
'WCOSS2', 'S4', 'CONTAINER', 'AWSPW'] | |
'WCOSS2', 'S4', 'CONTAINER', 'NOAACLOUD'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May be there is a need to distinguish between Azure, Google and AWS later.
But for now, lets keep with the ufs-weather-model
sorc/build_ufs.sh
Outdated
if [[ "${MACHINE_ID}" == "noaacloud" ]]; then | ||
case $(dnsdomainname -f) in | ||
# TODO: Add Google and Azure platforms. | ||
*pw-noaa*pw.local) CLOUD_MACHINE_ID="aws.${RT_COMPILER}" ;; | ||
*) ;; | ||
esac |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we shouldn't be doing detection again. MACHINE ID has been detected once. UFSWM is using a PW_CSP
variable. Lets use that and if necessary use PW_CSP
to distinguish.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@aerorahul Can you point me to a use case within the UFSWM of the PW_CSP
instance?
For clarification, the above detection was needed to determine which CSP the user is on. The DNS strings should be unique. This allows the GW (and build system) to determine which host they are on and thus make the downstream decisions accordingly.
For example, AWS PW use mpiexec
as it's parallel executable launcher. However, Azure or Google could use another. This may ultimately wind up being redundant but since I haven't see the PW_CSP
use case I felt it unsafe to make such an assumption.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See https://github.com/ufs-community/ufs-weather-model/blob/develop/tests/detect_machine.sh
I would recommend a PW_CSP
as a separate configurable if you wish to identify a CSP available on NOAACloud.
sorc/build_ufs.sh
Outdated
if [[ "${CLOUD_MACHINE_ID}" == "aws.intel" ]]; then | ||
module use /contrib/spack-stack/envs/ufswm/install/modulefiles/Core | ||
module load stack-intel | ||
module load stack-intel-oneapi-mpi | ||
module load ufs-weather-model-env/1.0.0 | ||
# TODO: It is still uncertain why this is the only module that is | ||
# missing; check the spack build as this needed to be added manually. | ||
module load w3emc/2.9.2 # TODO: This has similar issues for the EPIC stack. | ||
module list | ||
fi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is necessary as ufs_noaacloud.intel.lua
contains the information. Can we not use that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The path that the module file points to, an EPIC stack, does not exist. As a result, the build environment cannot be defined using the module that you noted above.
[Henry.Winterbottom@hwufscpldcld-2 modulefiles]$ ls /contrib/EPIC/spack-stack/spack-stack-1.3.0/
ls: cannot access /contrib/EPIC/spack-stack/spack-stack-1.3.0/: No such file or directory
Note that this comes from the develop
branch.
I suggest that we include this bit for now. Once EPIC provides a working stack we can create a new PR and make the update.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Understood. I did not know that the EPIC stack is not available for use by us. This certainly makes porting an issue.
I will discuss with @arunchawla-NOAA on how to enable that.
@SadeghTabas-NOAA Would you please review the changes in the build and run script requirements of the ufs-weather-model? |
@aerorahul Sure, I'll do it today. Thanks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I only went through the shell scripts here.
Take a look if you agree with these suggestions.
Co-authored-by: Rahul Mahajan <[email protected]>
Co-authored-by: Rahul Mahajan <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No additional requests beyond what Rahul has pointed out. I would have rather kept $MACHINE
as the specific cloud platform, but understand the decision to follow what UFS has done.
Co-authored-by: Rahul Mahajan <[email protected]>
Co-authored-by: Rahul Mahajan <[email protected]>
Co-authored-by: Rahul Mahajan <[email protected]>
Co-authored-by: Rahul Mahajan <[email protected]>
Co-authored-by: Rahul Mahajan <[email protected]>
Co-authored-by: Rahul Mahajan <[email protected]>
Co-authored-by: Rahul Mahajan <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good.
You will have to show us how to run this beast on the cloud.
…)" This reverts commit 38cd0bc.
Description
This PR provides support for running UFS weather model forecasts, using the global-workflow, on the AWS Parallel Works (PW) platform. This PR accomplishes the following:
Follow-up PRs will include the following:
TODO
issues within this PR; several notes are made throughout as to how to better streamline the connections between the UFSWM and the global-workflow for the respective cloud platforms;No dependencies other than the loading of the appropriate AWS PW modules are required. These are included in this PR.
Type of change
Please delete options that are not relevant.
How Has This Been Tested?
This PR has been tested as follows:
Checklist