diff --git a/docs/admin/runai-setup/config/ha.md b/docs/admin/runai-setup/config/ha.md
index 7a68b540e8..2368a0321b 100644
--- a/docs/admin/runai-setup/config/ha.md
+++ b/docs/admin/runai-setup/config/ha.md
@@ -48,4 +48,4 @@ The default Prometheus installation uses a single pod replica. If the node runni
 
 [Prometheus supports](https://prometheus.io/docs/introduction/faq/#can-prometheus-be-made-highly-available){target=_blank} high availability by allowing to run multiple instances. The tradeoff of this approach is that all instances will scrape and send the same data. The Run:ai control plane will identify duplicate metric series and ignore them. This approach will thus increase network, CPU and memory consumption.
 
-To change the number of Prometheus instances, edit the `runaiconfig` as described under [customizing the Run:ai cluster](../cluster-setup/customize-cluster-install.md). Under `prometheus`, set `replicas` to 2.
+To change the number of Prometheus instances, edit the `runaiconfig` as described under [customizing the Run:ai cluster](../cluster-setup/customize-cluster-install.md). Under `prometheus.spec`, set `replicas` to 2.
diff --git a/docs/home/whats-new-2-15.old b/docs/home/whats-new-2-15.old
deleted file mode 100644
index 75e9f8d184..0000000000
--- a/docs/home/whats-new-2-15.old
+++ /dev/null
@@ -1,75 +0,0 @@
-# Run:ai version 2.15 - December 3, 2023
-
-## New Features
-
-* Added the ability to download a CSV file from all pages that contain a table. Downloading a CSV can provide a snapshot of the page's history over the course of time, and help with compliance tracking. All the columns that are selected (displayed) in the table will be downloaded to the file.
-* Added support for `restricted` policy for [Pod Security Admission](https://kubernetes.io/docs/concepts/security/pod-security-admission/){target=_blank} (PSA) on OpenShift only. For more information, see [Pod security admission](../admin/runai-setup/cluster-setup/cluster-prerequisites.md#pod-security-admission).
-* Added a new button to the *Jobs* page to switch the view to *Workloads* (feature preview). *Workloads* is a new view for jobs that are running in the platform. The *Workloads* view provides a more advanced UI than the previous *Jobs* UI. The new table format provides:
-
-    * Improved views of the data
-    * Improved filters and search
-    * More information
-
-    For more information see [Workloads](../admin/workloads/workload-overview-admin.md#workloads-view).
-
-* Added a new dashboard for CPU based environments. The dashboards display specific information for CPU based nodes, node-pools, clusters, or tenants. These dashboards also include additional metrics that are specific to CPU based environments. This will help optimize visual information eliminating the views of empty GPU dashlets. For more information see [CPU Dashboard](../admin/admin-ui-setup/dashboard-analysis.md#cpu-dashboard).
-* Added the ability to prevent the submission of workloads that use data sources of type `host path` using policies. This prevents data from being stored on the node. When a node is deleted, all data stored on that node is lost. For configuration information, see [Prevent Data Storage on the Node](../admin/workloads/policies.md#prevent-data-storage-on-the-node).
-
-* Added the ability to configure strict GPU compute time slicing. This gives workloads the exact GPU compute portion based on the requested GPU fraction (GPU Memory Fraction). This creates complete transparency and predictability of the amount of resources (Compute, Memory, etc.) a workload will get from a GPU. For more information, see [GPU Time Slicing](../Researcher/scheduling/GPU-time-slicing-scheduler.md).
-* New cluster installation. The new installation no longer requires downloading and customizing a *values file*. Cluster configurations are preserved during upgrade and are performed using the `runaiconfig` file which creates a separation between installation related flags and cluster customization flags. For more information, see [Customize cluster installation](../admin/runai-setup/cluster-setup/customize-cluster-install.md).
-* New cluster wizard for adding and installing new clusters to your system.
-
---8<-- "home/whats-new-2-14.md:6:8"
---8<-- "home/whats-new-2-14.md:14:16"
---8<-- "home/whats-new-2-14.md:18:20"
---8<-- "home/whats-new-2-14.md:29:31"
---8<-- "home/whats-new-2-14.md:33:35"
---8<-- "home/whats-new-2-14.md:49:56"
-
-
-## Improvements
-
-* Improved the readability of the node table to include a more detailed status and its description. The added information in the table helps to easily inspect issues that may impact resource availability in the cluster. For more information, see [Node and Node Pool Status](../Researcher/scheduling/using-node-pools.md#node-and-node-pool-status).
-* Improved the Consumption report interface by moving the Cost settings to the *General* settings menu.
-* Improved *Credentials* creation. Now, a Run:ai scope can be added to credentials. For more information, see [Credentials](../admin/admin-ui-setup/credentials-setup.md).
-* Added support for workload types when creating a new or editing and existing environment. Select from `single-node` or `multi-node (distributed)` workloads. The environment is available only on feature forms which are relevant to the workload type selected.
-* Improved support for Kubeflow Notebooks. Now Run:ai supports scheduling of Kubeflow notebook CRDs with fractional GPUs. Kubeflow notebooks are identified automatically and use a special icon in the *Jobs* UI.
-* Improved control over how over-quota is managed by adding the ability to block over-subscription of quota in *Projects* or *Departments*. For more information, see [Over quota blocking](../Researcher/scheduling/the-runai-scheduler.md#limit-quota-over-or-under-subscription).
-* Improved the fairness for departments using the `over quota priority` switch (in Settings). When the feature flag is disabled, over-quota weights are equal to deserved quota and any excess resources are divided in the same proportion as the in-quota resources. For more information, see [Over Quota Priority](../Researcher/scheduling/the-runai-scheduler.md#over-quota-priority).
-* Added support to run distributed workloads via training. You can configure distributed training on the following:
-
-    * Trainings form
-    * Environments form
-
-    You can select `single` or `multi-node (distributed)` training. When configuring distributed training, you will need to select a framework from the list. Supported frameworks now include:
-
-    * PyTorch
-    * Tensorflow
-    * XGBoost
-    * MPI
-
-    For *Trainings* configuration, see [Adding trainings](../Researcher/user-interface/trainings.md#adding-trainings). See your Run:ai representative to enable this feature. For *Environments* configuration, see [Creating an Environment](../Researcher/user-interface/workspaces/create/create-env.md#creating-a-new-environment).
-
-* Run:ai can be installed in an isolated network. In this air-gapped configuration, the organization will not be using an established root certificate authority but a local certificate authority. This allows inserting the local certificate authority (CA) as a part of the Run:ai installation so it is reconized by all Run:ai services. For more information, see [Working with a Local Certificate Authority](../admin/runai-setup/config/org-cert.md).
-* Added the ability, in OpenShift environments, to configure the certificate to be used in the cluster routes created by Run:ai, instead of using the OpenShift certificate. For more information, see the table entry [Dedicated certificate for the researcher service route](../admin/runai-setup/cluster-setup/customize-cluster-install.md#configurations).
-* Updated the compatibility matrix to include supported versions for Kubernetes and OpenShift. For more information, see [Cluster prerequisites](../admin/runai-setup/cluster-setup/cluster-prerequisites.md#kubernetes).
-
---8<-- "home/whats-new-2-14.md:45:47"
-
-* Improvement in node pools which are now enabled by default. There is no need to enable the feature in the settings.
-* Improved the *Trainings* and *Workspaces* forms. Now the runtime field for *Command* and *Arguments* can be edited even after it has inherited it from the environment.
-
-* Added support for *Scope* in the template form. For configuration information, see [Creating templates](../admin/admin-ui-setup/templates.md#creating-templates).
-* Improved support for assets that appear unusable. Assets that are greyed out now have a button on the cards when the item does not comply with a configured policy. The button displays information about which policies are non-compliant and will require a correction to enable the asset.
-
---8<-- "home/whats-new-2-14.md:52:55"
-
-## Fixed issues
-
-| Internal ID | Description |
-| ---------------------------- | ---- |
-
-## Known issues
-
-| Internal ID | Description |
-| ---------------------------- | ---- |