
Commit 8c34ba3

fpagny, bene2k1, and RoRoJ authored
feat(genapi): update troubleshooting section on context window and co… (#4858)
* feat(genapi): update troubleshooting section on context window and completion tokens

* Apply suggestions from code review

Co-authored-by: Rowena Jones <[email protected]>

---------

Co-authored-by: Benedikt Rollik <[email protected]>
Co-authored-by: Rowena Jones <[email protected]>
1 parent 6feddc9 commit 8c34ba3

File tree

1 file changed: +41 -0 lines changed


pages/generative-apis/troubleshooting/fixing-common-issues.mdx

Lines changed: 41 additions & 0 deletions
@@ -13,6 +13,19 @@ dates:
Below are common issues that you may encounter when using Generative APIs, their causes, and recommended solutions.
## 400: Bad Request - You exceeded maximum context window for this model
### Cause
- You provided an input exceeding the maximum context window (also known as context length) for the model you are using.
- You provided a long input and requested a long output (in the `max_completion_tokens` field) which, added together, exceed the maximum context window of the model you are using.
### Solution
- Reduce your input size below what is [supported by the model](/generative-apis/reference-content/supported-models/) (see the sketch after this list).
- Use a model supporting a longer context window.
- Use [Managed Inference](/managed-inference/), where the context window can be increased for [several configurations with additional GPU vRAM](/managed-inference/reference-content/supported-models/). For instance, the `llama-3.3-70b-instruct` model in `fp8` quantization can be served with:
  - `15k` tokens context window on `H100` instances
  - `128k` tokens context window on `H100-2` instances.
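For illustration only, here is a minimal sketch (not part of the original page) of keeping a request under the context window with the OpenAI-compatible Python client. The context window value, the characters-per-token heuristic, the input file name, and the placeholder API key are assumptions for this example; check the [supported models](/generative-apis/reference-content/supported-models/) page for the actual limits of the model you use.

```python
from openai import OpenAI

CONTEXT_WINDOW = 100_000        # assumed context window of the chosen model, in tokens
MAX_COMPLETION_TOKENS = 4_000   # output budget reserved for the completion
CHARS_PER_TOKEN = 4             # rough heuristic, not an exact tokenizer

client = OpenAI(
    base_url="https://api.scaleway.ai/v1",  # or https://api.scaleway.ai/{project_id}/v1
    api_key="YOUR_SCALEWAY_SECRET_KEY",     # placeholder
)

def truncate_to_budget(text: str, token_budget: int) -> str:
    """Crude truncation using a characters-per-token estimate."""
    return text[: token_budget * CHARS_PER_TOKEN]

prompt = open("large_document.txt").read()             # hypothetical long input
input_budget = CONTEXT_WINDOW - MAX_COMPLETION_TOKENS  # tokens left for the input

response = client.chat.completions.create(
    model="llama-3.3-70b-instruct",
    messages=[{"role": "user", "content": truncate_to_budget(prompt, input_budget)}],
    max_completion_tokens=MAX_COMPLETION_TOKENS,  # input + output must fit in the context window
)
print(response.choices[0].message.content)
```

A production setup would count tokens with the model's real tokenizer rather than a character estimate, but the budgeting logic stays the same.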
## 403: Forbidden - Insufficient permissions to access the resource

### Cause
@@ -27,6 +40,34 @@ Below are common issues that you may encounter when using Generative APIs, their
- The URL format is: `https://api.scaleway.ai/{project_id}/v1`
- If no `project_id` is specified in the URL (`https://api.scaleway.ai/v1`), your `default` Project will be used (see the sketch after this list).
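A minimal sketch, assuming the `openai` Python client and a hypothetical Project ID, of pointing requests at a specific Project:

```python
from openai import OpenAI

project_id = "your-project-id"  # hypothetical placeholder

client = OpenAI(
    base_url=f"https://api.scaleway.ai/{project_id}/v1",  # omit the Project ID segment to use the default Project
    api_key="YOUR_SCALEWAY_SECRET_KEY",                    # placeholder: an API key allowed to access this Project
)
```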

## 416: Range Not Satisfiable - max_completion_tokens is limited for this model
### Cause
- You provided a value for `max_completion_tokens` that is too high and not supported by the model you are using.
### Solution
- Remove the `max_completion_tokens` field from your request or client library, or reduce its value below what is [supported by the model](https://www.scaleway.com/en/docs/generative-apis/reference-content/supported-models/) (see also the sketch after this list).
- As an example, when using the [init_chat_model from LangChain](https://python.langchain.com/api_reference/_modules/langchain/chat_models/base.html#init_chat_model), you should edit the `max_tokens` value in the following configuration:
```python
from langchain.chat_models import init_chat_model

# Pass max_tokens as an integer below the limit supported by the model
llm = init_chat_model("llama-3.3-70b-instruct", max_tokens=8000, model_provider="openai", base_url="https://api.scaleway.ai/v1", temperature=0.7)
```
- Use a model supporting a higher `max_completion_tokens` value.
- Use [Managed Inference](/managed-inference/), where these limits on completion tokens do not apply (the number of completion tokens will still be limited by the maximum context window supported by the model).
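If you call the OpenAI-compatible endpoint directly rather than through LangChain, the same limit applies to the `max_completion_tokens` field. A minimal sketch, assuming the `openai` Python client and a placeholder API key:

```python
from openai import OpenAI

client = OpenAI(base_url="https://api.scaleway.ai/v1", api_key="YOUR_SCALEWAY_SECRET_KEY")  # placeholder key

response = client.chat.completions.create(
    model="llama-3.3-70b-instruct",
    messages=[{"role": "user", "content": "Summarize our meeting notes in three bullet points."}],
    max_completion_tokens=4096,  # keep below the model's supported limit, or omit the field entirely
)
print(response.choices[0].message.content)
```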
## 429: Too Many Requests - You exceeded your current quota of requests/tokens per minute

### Cause
