`pages/generative-apis/troubleshooting/fixing-common-issues.mdx`
Below are common issues that you may encounter when using Generative APIs, their causes, and recommended solutions.

## 400: Bad Request - You exceeded maximum context window for this model

### Cause

- You provided an input exceeding the maximum context window (also known as context length) of the model you are using.
- You provided a long input and requested a long output (in the `max_completion_tokens` field) which, added together, exceed the maximum context window of the model you are using.

### Solution

- Reduce your input size below what is [supported by the model](/generative-apis/reference-content/supported-models/).
- Use a model supporting longer context window values.
- Use [Managed Inference](/managed-inference/), where the context window can be increased for [several configurations with additional GPU vRAM](/managed-inference/reference-content/supported-models/). For instance, the `llama-3.3-70b-instruct` model in `fp8` quantization can be served with:
  - a `15k` tokens context window on `H100` Instances
  - a `128k` tokens context window on `H100-2` Instances

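The two causes above come down to one budget: prompt tokens plus the requested completion budget must fit inside the context window. A minimal sketch with assumed values (the window size below is illustrative, not a documented limit):

```python
# Minimal sketch with assumed values: a request fails with a 400 error when
# the prompt tokens plus the requested completion budget exceed the model's
# context window.
CONTEXT_WINDOW = 131_072        # illustrative 128k-token context window
MAX_COMPLETION_TOKENS = 4_096   # value sent in the `max_completion_tokens` field

def fits_context_window(prompt_tokens: int) -> bool:
    """Return True when the request stays within the context window."""
    return prompt_tokens + MAX_COMPLETION_TOKENS <= CONTEXT_WINDOW

print(fits_context_window(100_000))  # 104_096 tokens: within budget
print(fits_context_window(130_000))  # 134_096 tokens: exceeds the window
```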
## 403: Forbidden - Insufficient permissions to access the resource

### Cause

- The URL format is: `https://api.scaleway.ai/{project_id}/v1`
- If no `project_id` is specified in the URL (`https://api.scaleway.ai/v1`), your `default` Project will be used.

## 416: Range Not Satisfiable - max_completion_tokens is limited for this model

### Cause

- You provided a value for `max_completion_tokens` that is too high and not supported by the model you are using.

### Solution

- Remove the `max_completion_tokens` field from your request or client library, or reduce its value below what is [supported by the model](/generative-apis/reference-content/supported-models/).
- As an example, when using [init_chat_model from Langchain](https://python.langchain.com/api_reference/_modules/langchain/chat_models/base.html#init_chat_model), edit the `max_tokens` value in your model configuration.
- Use a model supporting a higher `max_completion_tokens` value.
- Use [Managed Inference](/managed-inference/), where these limits on completion tokens do not apply (your completion token count will still be limited by the maximum context window supported by the model).

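For the Langchain case mentioned above, the configuration might look like this (a hedged sketch: the model name, provider, and endpoint are assumptions, and the `init_chat_model` call itself is shown commented out):

```python
# Hedged sketch of an `init_chat_model` configuration; every value below is
# an assumption for illustration, not a documented default.
config = {
    "model": "llama-3.3-70b-instruct",         # assumed model name
    "model_provider": "openai",                # assumes an OpenAI-compatible endpoint
    "base_url": "https://api.scaleway.ai/v1",  # default Project endpoint
    "max_tokens": 4096,  # reduce below the model's limit, or remove entirely
}

# from langchain.chat_models import init_chat_model
# llm = init_chat_model(**config)
```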
## 429: Too Many Requests - You exceeded your current quota of requests/tokens per minute