Merge pull request #595 from yandthj/kestrel_arbiter
Kestrel arbiter
Showing 4 changed files with 45 additions and 44 deletions.

@@ -2,12 +2,12 @@
title: Transitioning from Eagle to Kestrel
---

## Overview of Steps

This page provides the information needed to transition a project from Eagle to Kestrel. Transitioning a project can be broken down into five steps:

1. Accessing Kestrel
2. Transferring Data from Eagle to Kestrel
3. Understanding the options for running your software on Kestrel

    a. How to check if your software is available as a module on Kestrel

@@ -19,7 +19,7 @@ This page is meant to provide all necessary information to transition a project
4. Submitting your jobs on Kestrel
5. Reviewing performance recommendations if scalability or performance is worse than expected

If you find yourself stuck on any of the above steps, please reach out to [[email protected]](mailto:[email protected]) as soon as possible.

## 1. Accessing Kestrel

@@ -37,15 +37,19 @@
```
ssh <your username>@kestrel.nrel.gov
```
For more detailed information on accessing Kestrel, please see [this page](./Kestrel/index.md).

## 2. Transferring Data from Eagle to Kestrel

Please see our page on [transferring files](../Managing_Data/Transferring_Files/index.md) for detailed information. Essentially, you should use the command-line `rsync` tool for small transfers (<100 GB), and [Globus](../Managing_Data/Transferring_Files/globus.md) for large transfers.

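As a rough sketch (not an official recipe), a small `rsync` transfer from Eagle to Kestrel could look like the following; the allocation name, username, and paths are placeholders:

```
# Hypothetical example: push a small results directory from Eagle to Kestrel.
# -a preserves permissions/timestamps, -v is verbose, -P shows progress and
# keeps partial files so an interrupted transfer can resume.
rsync -avP /projects/<allocation name>/my_results/ \
    <your username>@kestrel.nrel.gov:/projects/<allocation name>/my_results/
```
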
### Filesystems

Data storage policies and the filesystem layout on Kestrel are similar to Eagle. Kestrel has a **95 PB** ClusterStor Lustre file system. Unlike on Eagle, the Parallel Filesystem (PFS) consists of a ProjectFS and a ScratchFS, which have different configurations. ScratchFS uses a Lustre file system in a hybrid flash-disk configuration providing a total of **27 PB** of capacity with **354 GB/s** of IOR bandwidth. ProjectFS has **68 PB** of capacity with **200 GB/s** of IOR bandwidth. We advise running jobs out of `/scratch` and moving data to `/projects` for long-term storage. Like on Eagle, `/scratch` will have a 28-day purge policy with no exceptions.

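For example, a common pattern (sketched below with placeholder paths) is to run out of `/scratch` and then copy anything worth keeping into `/projects` before the purge window:

```
# Hypothetical post-job step: preserve results before the /scratch purge.
cp -r /scratch/$USER/my_run/results /projects/<allocation name>/my_run/
```
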
The Home File System (HFS) on Kestrel is part of the ClusterStor used for PFS, providing highly reliable storage for user home directories and NREL-specific software. HFS will provide 1.2 PB of capacity. Snapshots of files on the HFS will be available up to 30 days after change/deletion. `/home` directories have a quota of 50 GB.

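Because `/home` is part of the Lustre-based ClusterStor, you can likely check your usage against the 50 GB quota with the standard Lustre quota tool; treat this as an assumption about the Kestrel configuration rather than an official command:

```
# Assumes /home quotas are enforced through Lustre on the ClusterStor system.
lfs quota -h -u $USER /home
```
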
Please see the [Kestrel Filesystem page](./Kestrel/filesystems.md) for more information.

## 3. Understanding the options for running your software on Kestrel

@@ -55,15 +59,15 @@ If you are used to using your software as an NREL-maintained module on Eagle, fi

`module avail [your software name]`

If nothing shows up, please email [[email protected]](mailto:[email protected]) to get the module set up on Kestrel.

If the module exists, then you simply need to `module load [your software name]`, the same as you would do on Eagle.

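For instance, checking for and loading a package might look like the sketch below; `lammps` is only a placeholder, and the module names available on Kestrel may differ:

```
# Hypothetical check-and-load sequence; the package name is illustrative.
module avail lammps           # list any matching modules
module load lammps/<version>  # load the specific version you need
module list                   # confirm what is loaded in your environment
```
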
### How to build your own software on Kestrel

If you need to build your own software on Kestrel, rather than use an already-existing module, the steps can be a bit different than on Eagle. For a general software-building procedure, please see our [Libraries How-To](../Development/Libraries/howto.md#summary-of-steps) tutorial.

In general, on Kestrel we recommend using the `PrgEnv-cray` or `PrgEnv-intel` environments to build your code. For detailed descriptions of these environments, see our [Environments](./Kestrel/Environments/index.md) page. For a tutorial walkthrough of building a simple code (IMB) within these environments, see our [Environments Tutorial](./Kestrel/Environments/tutorial.md) page. Note that `PrgEnv-` environments on Kestrel are different than environments on Eagle. Loading a `PrgEnv` loads a number of modules at once that together constitute a consistent environment.

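As a rough sketch of a build under one of these environments (the source file and flags are purely illustrative), the Cray compiler wrappers `cc`, `CC`, and `ftn` pick up the compilers and MPI from whichever `PrgEnv` is loaded:

```
# Hypothetical build under PrgEnv-cray; file names are placeholders.
module load PrgEnv-cray             # loads the Cray compilers, cray-mpich, and related modules
cc -O2 -o my_mpi_app my_mpi_app.c   # the cc wrapper links MPI automatically
```
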
!!! danger
    OpenMPI currently does not work well on Kestrel, and thus it is **strongly** recommended to NOT use OpenMPI. If you require assistance in building your code with an MPI other than

@@ -72,36 +76,35 @@ In general, on Kestrel we recommend using the `PrgEnv-cray` or `PrgEnv-intel` en
!!! tip
    Some MPI codes, especially old legacy scientific software, may be difficult to build with Cray MPICH. In these cases, if it is possible to build the code with Intel MPI or a different MPICH implementation, then Cray MPICH can be utilized at run-time via use of the `cray-mpich-abi` module (note that OpenMPI is *NOT* an implementation of MPICH, and you cannot use the `cray-mpich-abi` if you built with OpenMPI). A detailed example of building with Intel MPI but running with Cray MPICH can be found on our [VASP application page](../Applications/vasp.md).

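At run time the swap can be as simple as the sketch below (the binary name is a placeholder); see the VASP page linked above for a detailed, application-specific walkthrough:

```
# Hypothetical run-time use of Cray MPICH through the MPICH ABI,
# for a binary that was built against Intel MPI.
module load cray-mpich-abi
srun ./my_intel_mpi_binary
```
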
## 4. Running your Jobs on Kestrel

See our page on submitting jobs on Kestrel [here](./Kestrel/running.md).

Like Eagle, Kestrel uses the [Slurm job scheduler](../Slurm/index.md). If the application you need to run can be found under our [Applications tab](../Applications/index.md), then there may be example Kestrel submission scripts on the application page. Otherwise, our [VASP documentation page](../Applications/vasp.md#vasp-on-kestrel) contains a variety of sample submit scripts that you can modify to fit your own purposes.

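The basic Slurm workflow is the same as on Eagle. As a quick illustration (the script name is a placeholder):

```
sbatch my_job.sh    # submit a batch script to the scheduler
squeue -u $USER     # check your queued and running jobs
scancel <job id>    # cancel a job if needed
```
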
For information on the Kestrel hardware configuration, see our [Kestrel System Configuration](https://www.nrel.gov/hpc/kestrel-system-configuration.html) page. One key difference from Eagle is that not all of the Kestrel nodes have a local disk. If you need local disk space, you will need to request it in your job submission script with the `--tmp` option. For more detailed information, please see [this page](./Kestrel/filesystems.md#node-file-system).

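For example, requesting node-local disk in a submission script might look like this sketch; the size shown is purely illustrative:

```
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --time=1:00:00
#SBATCH --account=<allocation handle>
#SBATCH --tmp=100G   # request roughly 100 GB of local disk on the node (illustrative size)
# See the linked filesystems page for how to access the local space from your job.
```
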
### Shared Partition

Note that each Kestrel standard CPU node contains 104 CPU cores and 256 GB of memory. Some applications or application use cases may not scale well to this many CPU cores. In these cases, we recommend submitting your jobs to the shared partition. A job submitted to the shared partition is charged AUs in proportion to whichever resource you require more of: CPUs or memory.

The following is an example shared-partition submission script:

```
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --partition=shared
#SBATCH --time=2:00:00
#SBATCH --ntasks=26 # CPUs requested for job
#SBATCH --mem-per-cpu=2000 # Request 2 GB per core
#SBATCH --account=<allocation handle>
cd /scratch/$USER
srun ./my_program # Use your application's commands here
```

For more information on the shared partition and an example AU-accounting calculation, see [here](./Kestrel/running.md#shared-node-partition).

## 5. Performance Recommendations

@@ -139,17 +142,17 @@ These environment variables turn off some collective optimizations that we have

Please note that all of these recommendations are subject to change as we continue to improve the system.

## Kestrel Release Notes

Release notes for Kestrel after major upgrades can be found [here](./Kestrel/kestrel_release_notes.md).

## Resources

1. [Accessing Kestrel](./Kestrel/index.md)
2. [Transferring Files between Filesystems on the NREL Network](../Managing_Data/Transferring_Files/index.md)
3. [Using Globus to move data from Eagle to Kestrel](../Managing_Data/Transferring_Files/globus.md)
4. [General software building tutorial](../Development/Libraries/howto.md)
5. [Environments Overview](./Kestrel/Environments/index.md)
6. [Environments Tutorial](./Kestrel/Environments/tutorial.md)

Please reach out to [[email protected]](mailto:[email protected]) for assistance with any topic on this page.