Hey! This is a great idea -- thanks for laying it all out so neatly. I think you should go for it, but I would ask that this provider-specific configuration live in a file separate from the main config file.
When you use it for programming, it sometimes needs to run the endpoint command itself (such as npm run). In that case it should first kill all existing node.exe processes and then run the startup command, so that it does not keep launching duplicate instances that occupy port 3000.
I was thinking about adding imports; it makes things complicated at the moment. I will include that, but it is much lower in priority. My primary focus is to deliver small, incremental wins that have a tangible impact on my personal workflow and on others' as well. Ergonomic features that help but are not mandatory in my opinion (like having split config files) would be good to have, but not now.
@lily-de I will outline my main challenges in the following comments, each in its own message, so that we can discuss them one by one.
Challenge 1

Goose sends a LOT of tokens, and I keep getting rate limited. I am on tier 5 with OpenAI and tier 4 with Anthropic, yet this still plagues me, and I am sure others are facing it too. I have a few ideas about smart truncation and compression of the chat message thread, and I have been looking into some approaches, but for now the quickest win is to set up a rate limiter with a simple strategy.
Challenge 2

I have found that when coding, I needed to …

After experimenting with pretty much all models (O3 included), Claude Sonnet is still the best one when it comes to agentic workflows. O1-Pro is the only model that surpasses it, but you have to use it through …

The issue (and advantage) of Claude Sonnet is that it is very opinionated; it will do what it "thinks" is best to achieve whatever it was able to "understand" from your prompt, sometimes even blatantly ignoring direct instructions in the system prompt (this behavior becomes more pronounced as the number of tokens in the context window grows). In my experiments, the best way to make models follow specific instructions more closely and keep them from going off the rails is (a) setting the temperature to …
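To make the sampling-parameter point concrete, here is a rough sketch of how a per-provider override could be expressed under the `providers` block proposed later in this thread (key names and values are illustrative only, not a final schema):

```yaml
providers:
  - name: anthropic-strict
    type: anthropic
    # Lower temperature and a tighter top_p tend to make the model follow
    # explicit instructions more closely (illustrative values).
    temperature: 0.1
    top_p: 0.8
```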
Challenge 3

Only supporting well-known, larger LLM inference providers is very limiting, while virtually every provider supports OpenAI-compatible inference endpoints. I have found myself relying on more niche LLMs such as Perplexity's …

Adding an option to fine-tune configuration based on provider type would have a few advantages: it would enable more fine-tuned workflows; in my example …
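As a sketch of what this could enable (endpoint URL, key names, and auth handling are examples, not a final schema), an OpenAI-compatible provider such as Perplexity could be configured by pointing a generic `openai` type at a different base URL:

```yaml
providers:
  - name: perplexity
    # Reuses an OpenAI-compatible client; only the endpoint differs.
    type: openai
    api_base: https://api.perplexity.ai
    additional_headers:
      # Hypothetical header entry; actual auth handling may differ.
      Authorization: "Bearer ${PERPLEXITY_API_KEY}"
```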
@lily-de @alexhancock if it is OK with you, I will lay out a more detailed incremental implementation plan: …
@da-moon -- all those ideas sound great to me! I think the best implementation, given that this is somewhat experimental, is one where people can opt in and the default goose flow is left as is. As we experiment with these parameters and rate limiting, if we find a setup that performs really well, we can make that the default experience for users.
Hey @da-moon -- I wanted to make sure we are all aligned before you spend time implementing anything. Right now the team is working to overhaul … To that end, I've started this discussion topic here that I'd love for you to look at and provide feedback on. This is the main reason I asked that the model-specific config live in a different file, but if you see a way forward with a single config file and a strong reason to do so, let us know!
Context

To enable finer control over how we connect to LLM providers, and to introduce the capability to throttle API calls, we propose a new, unified `providers` configuration block. This block will group all provider-specific parameters (such as `name`, `type`, `api_base`, `temperature`, `top_p`, `cache`, `max_tokens`, and `additional_headers`) and include a new, configurable rate limiting mechanism.

For this epic, our focus is on implementing a "simple" rate limiting strategy; while we will document a token bucket approach for future work, that remains out of scope.

I wanted to start a discussion around this, and with your blessing I can get started on this change.
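As a rough sketch of the shape this grouping could take (a fuller example appears under Additional Context below; provider names, URLs, and values are illustrative only):

```yaml
providers:
  - name: openai
    type: openai
    api_base: https://api.openai.com/v1
    max_tokens: 4096
  - name: anthropic
    type: anthropic
    api_base: https://api.anthropic.com
    max_tokens: 8192
```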
Value

- Consolidates all provider-specific settings into a single, unified `providers` configuration block.
- Lets users configure rate limiting with request and token usage thresholds.
- Helps users throttle API calls according to their quota limits.
- Keeps the initial change small ("simple" strategy only) so it can be delivered and merged quickly.
- Lays the groundwork for more advanced rate limiting (e.g. token bucket) in future iterations.
Acceptance Criteria

- The configuration schema includes a new `providers` section containing: `name`, `type`, `api_base`, `temperature`, `top_p`, `cache`, `max_tokens`, and `additional_headers`.
- Each provider entry supports a `ratelimit` sub-section (initially supporting the "simple" strategy).
- Existing provider settings are migrated to, or read from, the new `providers` block.
- The simple strategy supports configurable limits on requests and tokens per period.
- Retry behavior (delay and maximum retries) is configurable for handling rate limit errors.
- Existing setups keep working without any change in default behavior.
- An example YAML configuration is documented (see Additional Context below).
Measurement

Persona(s)

- Users who hit provider rate limits and want finer control over API usage.
- Developers and power users who benefit from increased configuration flexibility and system robustness.
In Scope

- A new `providers` configuration element that groups all provider-specific parameters into a single block.
- Support for the parameters `name`, `type`, `api_base`, `temperature`, `top_p`, `cache`, `max_tokens`, and `additional_headers`.
- Migration of existing provider settings into the `providers` block.
- Existing environment variables (e.g. `GOOSE_PROVIDER`) are either mapped or properly deprecated.
- Extending `providers` to support the simple rate limiting mechanism via a `ratelimit` sub-element:
  - `type: simple`
  - `config.requests` (with `limit` and `period_seconds`)
  - `config.tokens` (with `limit` and `period_seconds`)
  - `config.retry` (with `delay_seconds` and `max_retries`)
- Integrating the rate limiter into the API request workflow.
- Documentation for the simple rate limiting mechanism.
- Notes on a token bucket strategy for future work (documentation only for this epic).
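Concretely, the nesting implied by the dotted names above might look like this (values are placeholders, not recommendations):

```yaml
ratelimit:
  type: simple
  config:
    requests:
      limit: 60           # max requests per window
      period_seconds: 60  # window length
    tokens:
      limit: 100000       # max tokens per window
      period_seconds: 60
    retry:
      delay_seconds: 30   # wait before retrying after a rate limit error
      max_retries: 3
```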
Out of Scope

- Advanced rate limiting strategies (e.g. token bucket, sliding window).
- Configuration changes beyond the new `providers` block.

Complexities

- Coordinating changes across configuration parsing, provider setup, and rate limiting logic.
- Enforcing request and token limits without introducing significant performance penalties.
Additional Context

Below is an example YAML configuration that combines existing configuration elements with the new `providers` block and a simple rate limiting mechanism:
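A sketch along those lines, using the fields and `ratelimit` schema described above (provider, header, and limit values are placeholders, not a final schema):

```yaml
providers:
  - name: anthropic
    type: anthropic
    api_base: https://api.anthropic.com
    temperature: 0.2
    top_p: 0.9
    cache: true
    max_tokens: 8192
    additional_headers:
      X-Example-Header: "example-value"  # placeholder header
    ratelimit:
      type: simple
      config:
        requests:
          limit: 50
          period_seconds: 60
        tokens:
          limit: 80000
          period_seconds: 60
        retry:
          delay_seconds: 30
          max_retries: 3
```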