This repository is where SquareFactory develops ClusterFactory, the Kubernetes-based infrastructure orchestrator, together with the community. ClusterFactory brings together best-in-class solutions from the HPC, Cloud, and DevOps industries to manage clusters declaratively, following GitOps practices.
- Production-ready vanilla upstream Kubernetes
- Easy to deploy, back up, restore, and update with cfctl
- Scalable from a single node to large, highly available clusters
- GitOps-enabled with ArgoCD and Sealed Secrets
- VM workloads with KubeVirt
- Bare-metal workloads with Slurm
- Bare-metal provisioning with Grendel
- Support for multiple CNI plugins with Multus CNI
- TLS/SSL certificate management with cert-manager
- Mirror of DeepSquare's software library (end-user software) using a CVMFS Stratum 1
- Web-based HPC user portal with Open OnDemand
- Monitoring stack (Grafana, Prometheus with ready-to-use exporters)
- Easiest way to join the DeepSquare Grid
If you'd like to try ClusterFactory, you should start by reading our Quick Start Guide and our documentation!
- Community Discord - Request for support and help from the ClusterFactory community.
- GitHub Issues - Submit your issues and feature requests via GitHub.
We welcome your help in building ClusterFactory! If you are interested, we invite you to check out the Contributing Guide.
ClusterFactory makes deploying a full-fledged HPC cluster and joining the DeepSquare Grid fast and easy. We believe that flexibility, repeatability, availability, and ease of use should be prioritized when managing and scaling HPC clusters.
ClusterFactory has been developed to be:
- Performance-oriented: Integrates a turnkey HPC stack including Slurm, MPI, distributed file systems (DFS), etc.
- Highly configurable: With Helm, all configuration is done in a single `values.yaml` file.
- Repeatable: With Argo CD following GitOps practices, all state is specified declaratively and saved in a Git repository.
- Highly available: With Kubernetes, container scheduling is handled automatically and high availability is easy to set up.
- Simple: A single descriptive YAML file per application, with Argo CD automatically updating the application.
- Long-term maintainability: Easy to deploy, update, back up, and restore with K0s.
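To illustrate the "single descriptive YAML per application" idea, here is a minimal sketch of an Argo CD `Application` manifest that tracks a Helm chart and its `values.yaml` in a Git repository. The repository URL, chart path, application name, and namespaces are hypothetical placeholders, not ClusterFactory's actual layout; only the `Application` fields themselves come from Argo CD.

```yaml
# Hypothetical example: repoURL, path, name, and namespaces are placeholders.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-app              # hypothetical application name
  namespace: argocd         # namespace where Argo CD runs (assumption)
spec:
  project: default
  source:
    repoURL: https://github.com/<your-org>/<your-fork>.git  # placeholder
    targetRevision: HEAD
    path: helm/my-app        # hypothetical chart path in the repository
    helm:
      valueFiles:
        - values.yaml        # all configuration for the app lives here
  destination:
    server: https://kubernetes.default.svc
    namespace: my-app        # hypothetical target namespace
  syncPolicy:
    automated:
      prune: true            # delete resources removed from Git
      selfHeal: true         # revert manual drift back to the Git state
    syncOptions:
      - CreateNamespace=true
```

With `syncPolicy.automated` enabled, Argo CD applies new commits on the tracked revision without a manual sync, so changing `values.yaml` in Git is enough to update the application.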
- Kubernetes Documentation (not going to lie, you're gonna need it)
- Helm Values Files
- K0s Configuration
- Cert-Manager Issuers Configuration
- Multus CNI Quickstart
- CNI Plugins Overview
- KubeVirt User Guide
- Argo CD Application YAML
- Traefik Ingress Routes
- Traefik Ingress
See the LICENSE file.