Releases: oracle-quickstart/oci-hpc
v2.11.0
What's Changed
- Slurm Update (24.05.1)
- Slurm nodes are dynamic instead of pre-defined (+ custom hostnames)
- Multi mount target (MT) FSS deployment (using DNS round robin)
- New monitoring solution (Prometheus and Node Exporter; can be run on a separate instance)
- Fix to Slurm healthchecks
- New shapes added: MI300X, L40S, A100 VMs, ...
- OpenLDAP fix for OL8
- Meshpinger added (Used to validate RDMA connectivity between all RDMA NICs on all hosts)
- NVMe drives (switch from LVM to mdadm)
- New OL8 images (OCA hold)
- Bug Fixes
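
As a rough illustration of the DNS round robin idea behind the multi mount target FSS deployment listed above: one name resolves to several mount target addresses, and successive clients end up spread across them. The Python sketch below only simulates that rotation. The node names and IP addresses are hypothetical, and it does not reflect how the oci-hpc stack itself configures DNS or mounts.

```python
from collections import Counter
from itertools import cycle


def round_robin_assign(clients, mount_target_ips):
    """Give each client the next mount target IP in rotation.

    This mimics the effect of a round-robin DNS name: each lookup
    is answered with the next address in the list.
    """
    rotation = cycle(mount_target_ips)
    return {client: next(rotation) for client in clients}


# Hypothetical compute nodes and mount target addresses (assumptions).
clients = ["node-1", "node-2", "node-3", "node-4", "node-5"]
mount_targets = ["10.0.0.10", "10.0.0.11"]

assignment = round_robin_assign(clients, mount_targets)
load = Counter(assignment.values())

for client, ip in assignment.items():
    print(f"{client} -> {ip}")
print(dict(load))
```

With an even rotation, the per-target client counts differ by at most one, which is the load-spreading property that makes round robin attractive when several mount targets sit in front of the same file system.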
v2.10.6
What's Changed
- Add Healthchecks for GPU nodes in Slurm (Idle nodes and at job start)
- Scratch from NVMe not default
- OCI Algo Tuner example for A100
- GPU and RDMA monitoring turned on
- New images
- Bug fixes
v2.10.5
What's Changed
- Rename bastion to controller
- Added Slurm names to DNS
- Enable BIOS change
- New images
- Fix Compute Cluster to use Compute Agent
v2.10.4.1
What's Changed
- Quick fix for Modified Hashicorp repos
v2.10.4
What's Changed
- Support for Ubuntu 22.04 and OL8 on GPU nodes, update of all images
- Support for Oracle Cloud Agent for RDMA auth
- Support for the H100, E5 std and E5 HPC
- Update Slurm to 23.02.5-1
- Add automatic backup of bastion boot volume
v2.10.3
What's Changed
- Support for OL8 on bastion
- Support for compute Clusters
- Add GPU monitoring
- Support for hyperthreading on nodes with 256+ threads in Slurm
- Add IB Write tests
- Mount multiple disks as one (with or without redundancy)
- Bug Fixes and improvements
v2.10.2.1
What's Changed
- Ubuntu support for PAM
- Update of oci-cn-auth in case the image has an outdated version
- Update some default variables
v2.10.2
What's Changed
- Updated to Slurm 23.02 (which removes the need for node ordering in large GPU clusters)
- Updated marketplace images for OL7, OL8, and GPU with version 2.1.4 of the OCI authentication packages (needed for better performance on GPU clusters)
- Fixed LDAP on Ubuntu
- Added the option to mount all NVMe drives as separate namespaces or as one logical volume (with or without redundancy)
- Added Hyperthreading for Ubuntu BMs
- Support for PMIx in Slurm
- Fix a Slurm bug due to long Rack IDs
- Other small bug fixes
v2.10.1.1
What's Changed
- Fix bug affecting bastion and login node Flex Shapes
v2.10.1
What's Changed
- Slurm User limits and PAM
- Updated marketplace images for OL7, OL8, and GPU with latest drivers
- Support for the upcoming E5, A10 VMs and Dense.E4.Flex
- Add the ability to run a login node separate from the bastion
- Updated OCI provider version to 4.112.0
- Other small bug fixes