Support gefs C48 on Azure #2881

Merged
merged 84 commits into from
Sep 13, 2024
Changes from 82 commits
Commits
103f2c4
compiled OK now
weihuang-jedi Jun 18, 2024
916ff6c
Merge branch 'NOAA-EMC:develop' into aws-forecast-only
weihuang-jedi Jun 19, 2024
b0ac406
re-test on aws with fewer changes
weihuang-jedi Jun 19, 2024
3de972f
make change in tasks.py to avoid error finding libiomp5.so problem
weihuang-jedi Jun 21, 2024
8308375
Merge branch 'NOAA-EMC:develop' into aws-forecast-only
weihuang-jedi Jun 21, 2024
bc4c4a8
add comments so the reviewers know that these changes are for AWS, an…
weihuang-jedi Jun 22, 2024
924aede
Merge branch 'aws-forecast-only' of ssh://github.com/NOAA-EPIC/global…
weihuang-jedi Jun 22, 2024
b724937
add comments so the reviewers know that these changes are for AWS, an…
weihuang-jedi Jun 22, 2024
12ab29f
reverse config.resource changes, and memory restriction on AWS
weihuang-jedi Jun 25, 2024
adff250
sync with emc repo
weihuang-jedi Jun 25, 2024
2290ea2
move common data to a shared place
weihuang-jedi Jun 26, 2024
cd2c8e7
use ICs from s3-bucket
weihuang-jedi Jun 26, 2024
4e144e5
Merge branch 'develop' into aws-forecast-only
weihuang-jedi Jun 26, 2024
46e3ef5
change as suggested by reviewer
weihuang-jedi Jul 2, 2024
32f13eb
sync with develop
weihuang-jedi Jul 2, 2024
a34a4c8
sync sorc/ufs_model.fd
weihuang-jedi Jul 4, 2024
44011a3
remove mpmd_opt from APRUN_UFS
weihuang-jedi Jul 4, 2024
965ec80
mpmd_opt and switch off tracker/genesis default for AWS
weihuang-jedi Jul 5, 2024
3ce268e
add TODO
weihuang-jedi Jul 5, 2024
f03ac78
remove ncl version on AWS
weihuang-jedi Jul 6, 2024
007a56b
Merge remote-tracking branch 'origin/develop' into aws-forecast-only
weihuang-jedi Jul 6, 2024
2f6ec6e
sync ufs_model
weihuang-jedi Jul 6, 2024
dba83a7
Merge branch 'NOAA-EMC:develop' into aws-forecast-only
weihuang-jedi Jul 10, 2024
24fe804
Merge branch 'NOAA-EMC:develop' into aws-forecast-only
weihuang-jedi Jul 12, 2024
e8a2e0f
sync and remove gempak from noaacloud
weihuang-jedi Jul 12, 2024
4013eb1
Merge branch 'NOAA-EMC:develop' into aws-forecast-only
weihuang-jedi Jul 15, 2024
a548c7f
update modules hash
weihuang-jedi Jul 15, 2024
d37e646
update module hash
weihuang-jedi Jul 15, 2024
2a80162
use bucket
weihuang-jedi Jul 17, 2024
fa44862
remove /scratch1, but kept TODO
weihuang-jedi Jul 17, 2024
55c7e7e
Merge branch 'NOAA-EMC:develop' into aws-forecast-only
weihuang-jedi Jul 17, 2024
07851dc
re-sync
weihuang-jedi Jul 19, 2024
492808d
sync
weihuang-jedi Jul 19, 2024
d7a262e
add is_exclusive to resource.AWSPW
weihuang-jedi Jul 23, 2024
af573af
sync hash with EMC repo
weihuang-jedi Jul 23, 2024
0929180
remove --export=ALL from native, when is_exclusive set true
weihuang-jedi Jul 23, 2024
06fecca
sync
weihuang-jedi Jul 23, 2024
d8783ab
Merge branch 'NOAA-EMC:develop' into aws-forecast-only
weihuang-jedi Jul 25, 2024
d22bc6d
Merge remote-tracking branch 'origin/develop' into aws-forecast-only
weihuang-jedi Jul 25, 2024
a5c441f
Merge branch 'aws-forecast-only' of ssh://github.com/NOAA-EPIC/global…
weihuang-jedi Jul 25, 2024
77e8233
Make AWS works similar to on-prem machine
weihuang-jedi Jul 25, 2024
96f73ba
remove --export=ALL from 'native'
weihuang-jedi Jul 25, 2024
a33a3be
remove --export=ALL from 'native'
weihuang-jedi Jul 25, 2024
80b294b
Merge branch 'NOAA-EMC:develop' into aws-forecast-only
weihuang-jedi Jul 25, 2024
01a8928
add py-f90nml to noaacloud modulefile
weihuang-jedi Jul 25, 2024
b035947
remove un-necessary added lines
weihuang-jedi Jul 25, 2024
bf3b460
remove un-necessary added lines
weihuang-jedi Jul 25, 2024
47627ff
remove added lines which was originally for AWS, but should be define…
weihuang-jedi Jul 26, 2024
7bf8900
restore as develop
weihuang-jedi Jul 26, 2024
0685a8f
try to fix pynorms error
weihuang-jedi Jul 29, 2024
381403d
Merge branch 'NOAA-EMC:develop' into aws-forecast-only
weihuang-jedi Jul 29, 2024
0e71f7d
Merge branch 'aws-forecast-only' of ssh://github.com/NOAA-EPIC/global…
weihuang-jedi Jul 29, 2024
2024835
Merge branch 'NOAA-EMC:develop' into aws-forecast-only
weihuang-jedi Jul 30, 2024
2c52016
sync with EMC repo
weihuang-jedi Jul 30, 2024
cd6c541
sync Gaea link with EMC repo, and only include blocks/packs that run …
weihuang-jedi Jul 30, 2024
1f60ed0
Merge branch 'aws-forecast-only' of github.com:NOAA-EPIC/global-workf…
weihuang-jedi Jul 30, 2024
e1a57b4
merge fro develop
weihuang-jedi Jul 30, 2024
fe9a457
Remove ACCOUNT_SERVICE
weihuang-jedi Jul 31, 2024
5c6e052
Merge branch 'NOAA-EMC:develop' into aws-forecast-only
weihuang-jedi Aug 1, 2024
6a4bada
run gefs
weihuang-jedi Aug 1, 2024
e802ba0
Merge branch 'NOAA-EMC:develop' into aws-gefs-2
weihuang-jedi Aug 2, 2024
31e3b2b
Merge branch 'NOAA-EMC:develop' into aws-gefs-2
weihuang-jedi Aug 8, 2024
868b671
Merge remote-tracking branch 'origin/develop' into aws-gefs-2
weihuang-jedi Aug 13, 2024
de55cc3
sync after AWS code merged
weihuang-jedi Aug 13, 2024
f5a03d4
make gefs run on AWS
weihuang-jedi Aug 13, 2024
e606fc7
test in gefs C48 case
weihuang-jedi Aug 13, 2024
1aff5c3
Merge branch 'develop' into aws-gefs-2
weihuang-jedi Aug 13, 2024
8b3ad61
fix pynorms error
weihuang-jedi Aug 13, 2024
1c7e45a
Merge branch 'aws-gefs-2' of github.com:NOAA-EPIC/global-workflow-clo…
weihuang-jedi Aug 13, 2024
73050a8
fix pynorms error
weihuang-jedi Aug 13, 2024
10d4104
use 'unset memory' in resource.AWSPW, instead of change it in rocoto.py
weihuang-jedi Aug 14, 2024
b3de7c1
revert hash
weihuang-jedi Aug 14, 2024
16770f8
remove export APRUN="${APRUN}"
weihuang-jedi Aug 14, 2024
b9416fa
revert sed separator change for now
weihuang-jedi Aug 15, 2024
456fa0c
revert sed separator change for now, more revert needed
weihuang-jedi Aug 15, 2024
a8576c2
Merge branch 'NOAA-EMC:develop' into aws-gefs-2
weihuang-jedi Aug 20, 2024
35ae83e
turn off wave on AWS
weihuang-jedi Aug 21, 2024
f1ac84b
start gefs on Azure
weihuang-jedi Aug 21, 2024
ef91ac3
Merge branch 'develop' of github.com:NOAA-EPIC/global-workflow-cloud …
weihuang-jedi Aug 26, 2024
17e6739
sync with develop
weihuang-jedi Aug 28, 2024
2266557
Merge branch 'develop' of github.com:NOAA-EPIC/global-workflow-cloud …
weihuang-jedi Aug 30, 2024
e81dc1b
sync with develop
weihuang-jedi Sep 10, 2024
761a168
correct a shell error
weihuang-jedi Sep 10, 2024
d159595
sync with develop
weihuang-jedi Sep 12, 2024
50 changes: 49 additions & 1 deletion env/AZUREPW.env
@@ -27,7 +27,7 @@ if [[ -n "${ntasks:-}" && -n "${max_tasks_per_node:-}" && -n "${tasks_per_node:-
NTHREADS1=${threads_per_task:-1}
[[ ${NTHREADSmax} -gt ${max_threads_per_task} ]] && NTHREADSmax=${max_threads_per_task}
[[ ${NTHREADS1} -gt ${max_threads_per_task} ]] && NTHREADS1=${max_threads_per_task}
- APRUN="${launcher} -n ${ntasks}"
+ export APRUN="${launcher} -n ${ntasks}"
weihuang-jedi marked this conversation as resolved.
else
echo "ERROR config.resources must be sourced before sourcing AZUREPW.env"
exit 2
@@ -43,6 +43,13 @@ if [[ "${step}" = "fcst" ]] || [[ "${step}" = "efcs" ]]; then
export APRUN_UFS="${launcher} -n ${ufs_ntasks}"
unset nnodes ufs_ntasks

elif [[ "${step}" = "waveinit" ]] || [[ "${step}" = "waveprep" ]] || [[ "${step}" = "wavepostsbs" ]] || [[ "${step}" = "wavepostbndpnt" ]] || [[ "${step}" = "wavepostbndpntbll" ]] || [[ "${step}" = "wavepostpnt" ]]; then

export CFP_MP="YES"
if [[ "${step}" = "waveprep" ]]; then export MP_PULSE=0 ; fi
export wavempexec=${launcher}
export wave_mpmd=${mpmd_opt}

elif [[ "${step}" = "post" ]]; then

export NTHREADS_NP=${NTHREADS1}
@@ -52,4 +59,45 @@ elif [[ "${step}" = "post" ]]; then
[[ ${NTHREADS_DWN} -gt ${max_threads_per_task} ]] && export NTHREADS_DWN=${max_threads_per_task}
export APRUN_DWN="${launcher} -n ${ntasks_dwn}"

elif [[ "${step}" = "atmos_products" ]]; then

export USE_CFP="YES" # Use MPMD for downstream product generation on Hera

elif [[ "${step}" = "oceanice_products" ]]; then

export NTHREADS_OCNICEPOST=${NTHREADS1}
export APRUN_OCNICEPOST="${launcher} -n 1 --cpus-per-task=${NTHREADS_OCNICEPOST}"

elif [[ "${step}" = "ecen" ]]; then

export NTHREADS_ECEN=${NTHREADSmax}
export APRUN_ECEN="${APRUN}"

export NTHREADS_CHGRES=${threads_per_task_chgres:-12}
[[ ${NTHREADS_CHGRES} -gt ${max_tasks_per_node} ]] && export NTHREADS_CHGRES=${max_tasks_per_node}
export APRUN_CHGRES="time"

export NTHREADS_CALCINC=${threads_per_task_calcinc:-1}
[[ ${NTHREADS_CALCINC} -gt ${max_threads_per_task} ]] && export NTHREADS_CALCINC=${max_threads_per_task}
export APRUN_CALCINC="${APRUN}"

elif [[ "${step}" = "esfc" ]]; then

export NTHREADS_ESFC=${NTHREADSmax}
export APRUN_ESFC="${APRUN}"

export NTHREADS_CYCLE=${threads_per_task_cycle:-14}
[[ ${NTHREADS_CYCLE} -gt ${max_tasks_per_node} ]] && export NTHREADS_CYCLE=${max_tasks_per_node}
export APRUN_CYCLE="${APRUN}"

elif [[ "${step}" = "epos" ]]; then

export NTHREADS_EPOS=${NTHREADSmax}
export APRUN_EPOS="${APRUN}"

elif [[ "${step}" = "fit2obs" ]]; then

export NTHREADS_FIT2OBS=${NTHREADS1}
export MPIRUN="${APRUN}"

fi
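
The env file above repeatedly caps a requested thread count at a per-task maximum and builds the launch command from `launcher` and `ntasks`. The following is a minimal sketch of that pattern, not the workflow's own code: `cap_threads` is a hypothetical helper, and the `launcher`, `ntasks`, and thread values are assumed placeholders (in the workflow they come from `config.resources` and the host setup). It also shows why the change to `export APRUN` matters: only exported variables are visible to child processes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: cap a requested thread count at a maximum,
# mirroring the `[[ a -gt b ]] && a=b` idiom used in the env file.
cap_threads() {
  # $1 = requested threads, $2 = maximum allowed per task
  if [ "$1" -gt "$2" ]; then
    echo "$2"
  else
    echo "$1"
  fi
}

# Assumed example values, for illustration only.
launcher="srun"
ntasks=48
max_threads_per_task=2
threads_per_task=4

NTHREADS1=$(cap_threads "${threads_per_task:-1}" "${max_threads_per_task}")

# Exporting makes APRUN visible to scripts started from this shell.
export APRUN="${launcher} -n ${ntasks}"
child_view=$(sh -c 'echo "${APRUN:-unset}"')

echo "NTHREADS1=${NTHREADS1}"
echo "APRUN as seen by a child process: ${child_view}"
```

Without the `export`, the child shell in this sketch would report `unset`, which is the situation step-specific settings such as `APRUN_ECEN="${APRUN}"` avoid only because they are evaluated in the same shell.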
2 changes: 1 addition & 1 deletion parm/config/gefs/config.base
@@ -342,7 +342,7 @@ export DELETE_COM_IN_ARCHIVE_JOB="YES" # NO=retain ROTDIR. YES default in arc
export NUM_SND_COLLECTIVES=${NUM_SND_COLLECTIVES:-9}

# The tracker, genesis, and METplus jobs are not supported on CSPs yet
- # TODO: we should place these in workflow/hosts/[csp]pw.yaml as part of AWS/AZURE/GOOGLE setup, not for general.
+ # TODO: we should place these in workflow/hosts/[aws|azure|google]pw.yaml as part of CSP's setup, not for general.
if [[ "${machine}" =~ "PW" ]]; then
export DO_WAVE="NO"
fi
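
The `[[ "${machine}" =~ "PW" ]]` test above switches waves off for every cloud host at once. A small sketch of how that test behaves, assuming an illustrative default of `DO_WAVE="YES"` and a hypothetical wrapper function `wave_setting_for`; only the machine names are taken from this PR:

```shell
#!/usr/bin/env bash
# Sketch: how a substring test on the machine name can gate DO_WAVE.
wave_setting_for() {
  local machine=$1
  local DO_WAVE="YES"   # assumed default, for illustration only
  # A quoted right-hand side makes =~ match "PW" literally,
  # anywhere in the name (not only as a suffix).
  if [[ "${machine}" =~ "PW" ]]; then
    DO_WAVE="NO"
  fi
  echo "${DO_WAVE}"
}

for m in AZUREPW GOOGLEPW HERA; do
  echo "${m}: DO_WAVE=$(wave_setting_for "${m}")"
done
```

Because the match is a plain substring test, any future machine name containing `PW` would also have waves disabled, which is one reason the TODO suggests moving the setting into the per-host YAML files.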
3 changes: 3 additions & 0 deletions parm/config/gefs/config.resources
@@ -43,6 +43,9 @@
export PARTITION_BATCH="compute"
max_tasks_per_node=36
;;
"AZUREPW")
export PARTITION_BATCH="compute"
max_tasks_per_node=24
;;
"GOOGLEPW")
export PARTITION_BATCH="compute"
max_tasks_per_node=32
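
The per-machine arms above feed `max_tasks_per_node` into later node-count calculations. The sketch below illustrates the idea under stated assumptions: `max_tasks_for` and `nodes_needed` are hypothetical helpers, the task count of 80 is made up, and only the two values shown in this diff (24 for AZUREPW, 32 for GOOGLEPW) are used. Note that each `case` arm is closed with `;;`.

```shell
#!/usr/bin/env bash
# Hypothetical lookup of tasks per node, using the values from this diff.
max_tasks_for() {
  case "$1" in
    "AZUREPW")  echo 24 ;;
    "GOOGLEPW") echo 32 ;;
    *) echo "unknown machine: $1" >&2; return 1 ;;
  esac
}

# Hypothetical helper: ceiling division of total tasks by tasks per node.
nodes_needed() {
  # $1 = total tasks, $2 = tasks per node
  echo $(( ($1 + $2 - 1) / $2 ))
}

for machine in AZUREPW GOOGLEPW; do
  per_node=$(max_tasks_for "${machine}")
  echo "${machine}: ${per_node} tasks/node, $(nodes_needed 80 "${per_node}") nodes for 80 tasks"
done
```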
11 changes: 11 additions & 0 deletions parm/config/gefs/config.resources.AZUREPW
@@ -0,0 +1,11 @@
#! /usr/bin/env bash

# AZURE-specific job resources

export is_exclusive="True"
unset memory

# shellcheck disable=SC2312
for mem_var in $(env | grep '^memory_' | cut -d= -f1); do
unset "${mem_var}"
done
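
A quick way to see what the new `config.resources.AZUREPW` does is to run the same loop against a few stand-in variables. This is a sketch only: the `memory*` values below are invented for the demonstration, and `remaining` is a helper variable that is not part of the workflow.

```shell
#!/usr/bin/env bash
# Hypothetical memory settings, standing in for values exported earlier.
export memory="4GB"
export memory_fcst="8GB"
export memory_post="2GB"

export is_exclusive="True"
unset memory

# Same loop as in the diff: drop every exported variable named memory_*.
for mem_var in $(env | grep '^memory_' | cut -d= -f1); do
  unset "${mem_var}"
done

# grep -c exits non-zero when the count is 0, hence the `|| true`.
remaining=$(env | grep -c '^memory_' || true)
echo "memory_* variables left: ${remaining}"
echo "memory is ${memory:-unset}"
```

With no memory request left in the environment and `is_exclusive` set, the job presumably relies on whole-node allocation rather than an explicit memory limit, as the commit message "use 'unset memory' in resource.AWSPW, instead of change it in rocoto.py" suggests for the AWS counterpart.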
2 changes: 1 addition & 1 deletion workflow/hosts/azurepw.yaml
@@ -18,7 +18,7 @@ CHGRP_RSTPROD: 'YES'
CHGRP_CMD: 'chgrp rstprod' # TODO: This is not yet supported.
HPSSARCH: 'NO'
HPSS_PROJECT: emc-global #TODO: See `ATARDIR` below.
- BASE_CPLIC: '/bucket/global-workflow-shared-data/ICSDIR/prototype_ICs'
+ BASE_IC: '/bucket/global-workflow-shared-data/ICSDIR'
LOCALARCH: 'NO'
ATARDIR: '' # TODO: This will not yet work from AZURE.
MAKE_NSSTBUFR: 'NO'