Commit 3ea7ddf

Authored by bene2k1, nerda-codes, jcirinosclwy, and RoRoJ

feat(infr): add scaling (#5057)

* feat(infr): add scale documentation
* feat(infr): update wording
* Apply suggestions from code review
* feat(infr): update file name
* fix(inf): add beta info re node number

Co-authored-by: Néda <[email protected]>
Co-authored-by: Jessica <[email protected]>
Co-authored-by: Rowena <[email protected]>
Co-authored-by: Rowena Jones <[email protected]>

1 parent 8451b60 commit 3ea7ddf

File tree

5 files changed: +56 −3 lines changed

menu/navigation.json

Lines changed: 4 additions & 0 deletions

@@ -900,6 +900,10 @@
       "label": "Monitor a deployment",
       "slug": "monitor-deployment"
     },
+    {
+      "label": "Configure autoscaling",
+      "slug": "configure-autoscaling"
+    },
     {
       "label": "Manage allowed IP addresses",
       "slug": "manage-allowed-ips"
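The entry added above must carry both a `label` and a `slug` for the docs menu to render it. As an illustrative sketch (not part of the Scaleway repo), a small check over a navigation fragment shaped like `menu/navigation.json` could look like this:

```python
import json

def validate_entries(entries):
    """Return (index, missing_key) pairs for menu entries lacking required keys."""
    problems = []
    for i, entry in enumerate(entries):
        for key in ("label", "slug"):
            if key not in entry:
                problems.append((i, key))
    return problems

# Minimal stand-in mirroring the fragment touched by this commit.
fragment = json.loads("""[
    {"label": "Monitor a deployment", "slug": "monitor-deployment"},
    {"label": "Configure autoscaling", "slug": "configure-autoscaling"},
    {"label": "Manage allowed IP addresses", "slug": "manage-allowed-ips"}
]""")

print(validate_entries(fragment))  # an empty list means the fragment is well-formed
```

An empty result confirms the fragment is well-formed; a malformed entry is reported with its index and the key it is missing.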

pages/managed-inference/concepts.mdx

Lines changed: 4 additions & 0 deletions

@@ -71,6 +71,10 @@ LLMs are advanced artificial intelligence systems capable of understanding and g
 These models, such as Llama-3, are trained on vast amounts of data to learn the patterns and structures of language, enabling them to generate coherent and contextually relevant responses to queries or prompts.
 LLMs have applications in natural language processing, text generation, translation, and other tasks requiring sophisticated language understanding and production.

+## Node number
+
+The node number (or node count) defines the number of nodes, or Instances, running your Managed Inference deployment. [Increasing the node number](/managed-inference/how-to/configure-autoscaling/) scales your deployment so that it can handle more load.
+
 ## Prompt

 In the context of generative AI models, a prompt refers to the input provided to the model to generate a desired response.
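The node-count concept added above can be illustrated with a toy calculation, assuming requests are spread evenly across nodes (an assumption for illustration only, not a statement about Scaleway's load balancer):

```python
# Illustrative sketch: per-node load as a function of the node count,
# assuming incoming requests are distributed evenly across all nodes.
def requests_per_node(total_rps: float, node_count: int) -> float:
    if node_count < 1:
        raise ValueError("a deployment needs at least one node")
    return total_rps / node_count

# Doubling the node count halves the load each node must handle.
print(requests_per_node(120.0, 1))  # 120.0
print(requests_per_node(120.0, 2))  # 60.0
```

Under this even-spread assumption, raising the node number is what lets a deployment absorb more load without each Instance becoming a bottleneck.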
pages/managed-inference/how-to/configure-autoscaling.mdx

Lines changed: 40 additions & 0 deletions

@@ -0,0 +1,40 @@
+---
+meta:
+  title: How to scale Managed Inference deployments
+  description: This page explains how to scale Managed Inference deployments in size
+content:
+  h1: How to scale Managed Inference deployments
+  paragraph: This page explains how to scale Managed Inference deployments in size
+tags: managed-inference ai-data ip-address
+dates:
+  validation: 2025-06-03
+  posted: 2025-06-03
+categories:
+  - ai-data
+---
+
+You can scale your Managed Inference deployment up or down to match the incoming load on your deployment.
+
+<Message type="important">
+  This feature is currently in [Public Beta](https://www.scaleway.com/betas/).
+</Message>
+
+<Macro id="requirements" />
+
+- A Scaleway account logged into the [console](https://console.scaleway.com)
+- A [Managed Inference deployment](/managed-inference/quickstart/)
+- [Owner](/iam/concepts/#owner) status or [IAM permissions](/iam/concepts/#permission) allowing you to perform actions in the intended Organization
+
+## How to scale a Managed Inference deployment in size
+
+1. Click **Managed Inference** in the **AI** section of the [Scaleway console](https://console.scaleway.com) side menu. A list of your deployments displays.
+2. Click a deployment name or <Icon name="more" /> > **More info** to access the deployment dashboard.
+3. Click the **Settings** tab and navigate to the **Scaling** section.
+4. Click **Update node count** and adjust the number of nodes in your deployment.
+   <Message type="note">
+     High availability is only guaranteed with two or more nodes.
+   </Message>
+5. Click **Update node count** to confirm the new number of nodes for your deployment.
+   <Message type="note">
+     Your deployment will be unavailable for 15-30 minutes while the node update is in progress.
+   </Message>
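The console steps above could also be done programmatically. The sketch below builds (but does not send) an HTTP request to update a deployment's node count; the endpoint path and the `node_count` payload field are assumptions inferred from the console flow, not taken from confirmed API documentation, so verify them against the official Scaleway API reference before use:

```python
# Hedged sketch: updating the node count of a Managed Inference deployment
# via the Scaleway HTTP API. The endpoint path and payload field name are
# ASSUMPTIONS based on the console flow documented above.
def build_update_node_count_request(region: str, deployment_id: str,
                                    node_count: int, secret_key: str) -> dict:
    """Return the pieces of a PATCH request to change a deployment's node count."""
    if node_count < 2:
        # Mirrors the note above: HA is only guaranteed with two or more nodes.
        print("warning: high availability is only guaranteed with two or more nodes")
    return {
        "method": "PATCH",
        "url": (f"https://api.scaleway.com/inference/v1/regions/"
                f"{region}/deployments/{deployment_id}"),  # assumed path
        "headers": {"X-Auth-Token": secret_key,
                    "Content-Type": "application/json"},
        "json": {"node_count": node_count},  # assumed field name
    }

# Example with placeholder identifiers (not real resources).
req = build_update_node_count_request(
    "fr-par", "11111111-2222-3333-4444-555555555555", 2, "SCW_SECRET_KEY")
print(req["url"])
```

Expect the same 15-30 minute unavailability window noted above regardless of whether the change is made in the console or through the API.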

pages/managed-inference/how-to/create-deployment.mdx

Lines changed: 7 additions & 3 deletions

@@ -29,12 +29,16 @@ dates:
   </Message>
   - Choose the geographical **region** for the deployment.
   - Specify the GPU Instance type to be used with your deployment.
-5. Enter a **name** for the deployment, and optional tags.
-6. Configure the **network connectivity** settings for the deployment:
+5. Choose the number of nodes for your deployment. Note that this feature is currently in [Public Beta](https://www.scaleway.com/betas/).
+   <Message type="note">
+     High availability is only guaranteed with two or more nodes.
+   </Message>
+6. Enter a **name** for the deployment, and optional tags.
+7. Configure the **network connectivity** settings for the deployment:
   - Attach to a **Private Network** for secure communication and restricted availability. Choose an existing Private Network from the drop-down list, or create a new one.
   - Set up **Public connectivity** to access resources via the public internet. Authentication by API key is enabled by default.
   <Message type="important">
     - Enabling both private and public connectivity will result in two distinct endpoints (public and private) for your deployment.
     - Deployments must have at least one endpoint, either public or private.
   </Message>
-7. Click **Deploy model** to launch the deployment process. Once the model is ready, it will be listed among your deployments.
+8. Click **Deploy model** to launch the deployment process. Once the model is ready, it will be listed among your deployments.

pages/managed-inference/quickstart.mdx

Lines changed: 1 addition & 0 deletions

@@ -44,6 +44,7 @@ Here are some of the key features of Scaleway Managed Inference:
   </Message>
   - Choose the geographical **region** for the deployment.
   - Specify the GPU Instance type to be used with your deployment.
+  - Choose the number of nodes for your deployment. Note that this feature is currently in [Public Beta](https://www.scaleway.com/betas/).
5. Enter a **name** for the deployment, along with optional tags to aid in organization.
6. Configure the **network** settings for the deployment:
   - Enable **Private Network** for secure communication and restricted availability within Private Networks. Choose an existing Private Network from the drop-down list, or create a new one.

0 commit comments