
[Bug]: SDK crashes when choices is None (provider-error payload) #604


Open
cnm13ryan opened this issue Apr 25, 2025 · 7 comments · May be fixed by #609
Labels
bug Something isn't working

Comments

@cnm13ryan

cnm13ryan commented Apr 25, 2025


Describe the bug

When the upstream provider responds with an error payload ({"error": …}), the Agents SDK still returns a ChatCompletion-shaped object whose choices field is None.

openai_chatcompletions.py immediately dereferences response.choices[0] inside its debug logger, producing

TypeError: 'NoneType' object is not subscriptable

The exception is raised inside the SDK, before the calling application sees the provider error.
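For illustration, the parsed response in this failure mode looks roughly like the sketch below (field names and values are assumed for illustration; the exact payload varies by provider):

# Assumed shape of the ChatCompletion-like object when the provider returns an
# error payload: choices is None, and the provider's message sits under error.
response_as_dict = {
    "id": None,
    "choices": None,          # the SDK dereferences choices[0] here and crashes
    "error": {
        "message": "Invalid API key",  # example text, not a verbatim response
        "code": 401,
    },
}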


Debug information

  • Agents SDK version: v0.0.13
  • Python version: 3.12.9
  • OS: macOS 15.4 (arm64)

Repro steps

# mockup_main.py
from agents import Runner
from agents.models import OpenAIChatCompletionsModel

# A key / proxy that will reliably return {"error": …}
model = OpenAIChatCompletionsModel(
    model="gpt-3.5-turbo",
    api_key="sk-dummy",
    base_url="https://openrouter.ai/v1",
)

runner = Runner(model=model)

# ⇢ Raises TypeError in the SDK before we can handle the error payload
runner.run("Ping?")

Run:

export AGENTS_LOGGING_LEVEL=DEBUG   # optional, shows failing log line
python mockup_main.py

Observed output

TypeError: 'NoneType' object is not subscriptable

(the full traceback shows the line in openai_chatcompletions.py that logs response.choices[0]).


Expected behavior

The SDK should detect that response.choices is missing and raise a clear domain-specific exception (e.g. ProviderError) containing the provider’s error message, instead of crashing with an internal TypeError.

All downstream code paths would then have a chance to handle the error or retry gracefully.
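For context, a hypothetical caller-side pattern once such an exception exists (ProviderError and call_model below are placeholders, not SDK names):

# Sketch only: ProviderError is the proposed exception; call_model() stands in
# for whatever SDK call the application makes and simulates the new behaviour.
import logging

logger = logging.getLogger(__name__)


class ProviderError(Exception):
    """Placeholder for the proposed SDK exception."""


def call_model():
    # Simulates the proposed behaviour for a provider-error payload.
    raise ProviderError("LLM provider error: Invalid API key")


try:
    result = call_model()
except ProviderError as exc:
    logger.warning("Provider rejected the request: %s", exc)
    # retry, switch providers, or surface the message to the user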

cnm13ryan added the bug label Apr 25, 2025
@davidlapsleyio

davidlapsleyio commented Apr 25, 2025

@cnm13ryan I’m happy to create a PR if you point me to a contributors doc or let me know what you need from a testing perspective. I’ll create the PR from my fork. Assign the issue to me if you’d like me to do that. I can also check out other PRs to see what folks are doing. Thanks.

@cnm13ryan
Author

@cnm13ryan I’m happy to create a PR if you point me to a contributors doc or let me know what you need from testing perspective and how to get my branch to you.

Not sure if there is a contributor doc as I cannot find it anywhere.

I have some ideas on the fixes. Let me polish that and get it back in a min.

@cnm13ryan
Author

cnm13ryan commented Apr 25, 2025

Follow-up: Root-Cause Analysis (RCA) & Some Thoughts on Remediation Options and Success Criteria

1 · Context Recap

  • Goal: surface provider-side errors cleanly so callers can retry / log.
  • Signal: TypeError: 'NoneType' object is not subscriptable raised inside openai_chatcompletions.py:get_response.
  • Metric (to know it’s fixed, my "wants"): a smoke test that forces a 429/401 response returns a structured ProviderError; the pipeline exits 0.

2 · Some Analysis

  • The crash happens when the user provides incomplete or mistyped OPENAI_BASE_URL details.
  • The crash happens inside the SDK’s first debug log line; user code never sees the provider payload.
  • The implementation assumes every successful call has choices[0]; error payloads set choices = null.
  • Evidence: line 79 dereferences response.choices[0] without a guard (see the sketch after this list).
  • An identical stack trace is noted in issue #380 (“Errors from custom model providers aren’t handled”).
  • Mis-scoping this to user code leaves the SDK unusable for any provider under rate limits or failure.
  • Instead of “why is choices None?”, ask “why does the SDK log before validating the response object?”
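A minimal sketch of the kind of access that fails (illustrative, paraphrased from inspection; not the exact SDK source):

# Illustrative only: indexing choices happens inside a debug log call, before
# any validation of the payload, so an error payload with choices=None crashes.
import logging

logger = logging.getLogger("openai.agents")


def log_response(response):
    # response.choices is None for provider-error payloads
    logger.debug("LLM resp:\n%s", response.choices[0].message)
    # -> TypeError: 'NoneType' object is not subscriptable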

3 · Root-Cause Hypotheses

  • H1 — The SDK logs the response before validating it, so choices may be None. Confidence: high. Evidence: code inspection and the #380 trace. Test: reproduce with a rate-limited key.
  • H2 — A hard-coded param (include_usage=True) triggers a 422, hiding the root error. Confidence: low. Evidence: issue #442 (Mistral). Test: remove the param and see if the crash persists.
  • H3 — An upstream branch already fixed the guard. Confidence: low. Evidence: latest main still dereferences blindly. Test: grep the repo for choices[0].

4 · Likely Root Cause

  • A missing guard for error payloads → the internal TypeError masks the genuine provider failure.

5 · Option Matrix

  • Option A — Early guard: if not response.choices: raise ProviderError(response.error). Pros: a one-liner; unblocks all callers; no API change. Cons: the error is surfaced only via an exception, not the event stream.
  • Option B — Build a full compat layer that validates / normalises every response and chunk. Pros: future-proofs the SDK (see #442, #578, #601). Cons: a medium refactor; extra LOC and minimal overhead.

6 · Recommendation

Ship Option A immediately to stop crashes, then schedule Option B for robust, schema-tolerant handling.

Option A (guard: the provider may return an error object with choices == None)

# agents/models/openai_chatcompletions.py
+ if not getattr(response, "choices", None):
+     raise ProviderError(
+         f"LLM provider error: {getattr(response, 'error', 'unknown')}"
+     )

Outcome: callers receive a clear, catchable ProviderError; there is no negative impact on token streaming or logging; no public behaviour regresses.
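A minimal test sketch for the guard, assuming a ProviderError exception is added (names and module layout are placeholders until the PR lands):

# test_provider_error_guard.py — sketch only; ProviderError and the guard's
# final location in the SDK are assumptions.
from types import SimpleNamespace

import pytest


class ProviderError(Exception):
    """Placeholder for the proposed SDK exception."""


def guard(response):
    # Mirrors the Option A guard above.
    if not getattr(response, "choices", None):
        raise ProviderError(
            f"LLM provider error: {getattr(response, 'error', 'unknown')}"
        )


def test_error_payload_raises_provider_error():
    response = SimpleNamespace(choices=None, error={"message": "Invalid API key"})
    with pytest.raises(ProviderError):
        guard(response)


def test_normal_payload_passes_through():
    response = SimpleNamespace(choices=[SimpleNamespace(message="hi")], error=None)
    guard(response)  # should not raise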

@cnm13ryan
Author

cnm13ryan commented Apr 25, 2025

Basically, to cut through the jargon: one way this error arises is when the model service provider details are not configured exactly as the OpenAI chat-completion schema expects (e.g. a typo by the user). Instead of raising an error that points the developer or user in the right direction, the SDK just outputs the TypeError above, which is not useful unless one digs.

Option A is just a quick fix to raise a useful error. Option B goes further by providing a compat layer (separation of concerns) so these errors do not permeate into domain logic; a rough sketch follows below.
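A rough sketch of what that compat layer could look like (all names below are placeholders for illustration, not existing SDK code):

# Sketch of a normalisation boundary between the raw provider payload and the
# rest of the SDK; NormalizedResponse / normalize_chat_completion / ProviderError
# are illustrative names only.
from dataclasses import dataclass
from typing import Any, Optional


class ProviderError(Exception):
    """Raised when the provider returns an error payload instead of choices."""


@dataclass
class NormalizedResponse:
    message: Any
    usage: Optional[Any] = None


def normalize_chat_completion(response: Any) -> NormalizedResponse:
    choices = getattr(response, "choices", None)
    if not choices:
        # Same condition as Option A, but applied at a single boundary so the
        # rest of the SDK only ever sees validated, normalised objects.
        raise ProviderError(
            f"LLM provider error: {getattr(response, 'error', 'unknown')}"
        )
    return NormalizedResponse(
        message=choices[0].message,
        usage=getattr(response, "usage", None),
    )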

@davidlapsleyio

@cnm13ryan the repro script above did not work. It attempts to run the completion model as if it were an agent, which is incorrect. I created a new repro script below. This correctly reproduces the bug and validates the fix.

import asyncio
import os

from agents import AsyncOpenAI
from agents.model_settings import ModelSettings
from agents.models.interface import ModelTracing
from agents.models.openai_chatcompletions import OpenAIChatCompletionsModel


async def main():
    api_key = os.environ.get("OPENAI_API_KEY")
    client = AsyncOpenAI(
        api_key=api_key,
        base_url="https://openrouter.ai/v1",
    )

    # Pass the client into the model constructor
    model = OpenAIChatCompletionsModel(
        model="gpt-3.5-turbo",
        openai_client=client,
    )

    settings = ModelSettings(
        temperature=0.7,
        max_tokens=100,
    )

    result = await model.get_response(
        system_instructions="This is a bug repro test. You are a helpful assistant.",
        input="Ping?",
        model_settings=settings,
        tools=[],
        output_schema=None,
        handoffs=[],
        tracing=ModelTracing.DISABLED,
        previous_response_id=None,
    )

    return result


if __name__ == "__main__":
    result = asyncio.run(main())
    print(result)
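Run against the same OpenRouter base URL with an invalid key and this hits the same TypeError on current main; with the Option A guard applied it should raise the proposed ProviderError instead (assuming the guard from the earlier comment).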

@davidlapsleyio

I created a PR with your short-term fix here.

I noticed some linting errors in legacy code but did not include those in the PR (lmk if you'd like me to change that).

LMK if there are any changes you'd like me to make to the PR.

@davidlapsleyio

Btw, I'm happy to work on option B if you'd like. LMK.
