feat: implement apiToken failover mechanism #1256
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1256      +/-   ##
==========================================
+ Coverage   35.91%   43.52%    +7.61%
==========================================
  Files          69       76        +7
  Lines       11576    12320      +744
==========================================
+ Hits         4157     5362     +1205
+ Misses       7104     6622      -482
- Partials      315      336       +21
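@cr7258 You can use SetSharedData to synchronize this. Be sure to use the CAS mechanism to avoid conflicts. You can also build leader election on top of SetSharedData so that a single worker performs the health checks and recovery. Note, however, that data in SharedData is VM-level and is not cleared even when the plugin configuration is updated.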
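For illustration, a minimal sketch of the CAS retry loop suggested above. It runs against an in-memory stand-in rather than the real host: `sharedStore`, `errCasMismatch`, and `incrementCounter` are hypothetical names, and the stand-in only assumes the proxy-wasm convention that a read returns a version number and a write with `cas == 0` skips the version check.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// errCasMismatch stands in for the status a host reports when the caller's
// version is stale. Hypothetical name.
var errCasMismatch = errors.New("cas mismatch")

// sharedStore is an in-memory stand-in for VM-level shared data.
// Assumption: cas == 0 means "write without checking the version".
type sharedStore struct {
	data map[string][]byte
	cas  map[string]uint32
}

func newSharedStore() *sharedStore {
	return &sharedStore{data: map[string][]byte{}, cas: map[string]uint32{}}
}

// Get returns the value and the version it was read at.
func (s *sharedStore) Get(key string) ([]byte, uint32) {
	return s.data[key], s.cas[key]
}

// Set writes the value only if the caller's version is still current.
func (s *sharedStore) Set(key string, value []byte, cas uint32) error {
	if cas != 0 && cas != s.cas[key] {
		return errCasMismatch
	}
	s.data[key] = value
	s.cas[key]++
	return nil
}

// incrementCounter does a read-modify-write and retries when another worker
// updated the key in between. The key must have been seeded beforehand,
// because a missing key has no version to compare against.
func incrementCounter(s *sharedStore, key string, maxRetries int) (int, error) {
	for i := 0; i < maxRetries; i++ {
		raw, cas := s.Get(key)
		n, err := strconv.Atoi(string(raw))
		if err != nil {
			return 0, fmt.Errorf("counter %q is not initialized: %w", key, err)
		}
		n++
		err = s.Set(key, []byte(strconv.Itoa(n)), cas)
		if err == nil {
			return n, nil
		}
		if !errors.Is(err, errCasMismatch) {
			return 0, err
		}
		// Lost the race: loop to re-read the fresh value and version.
	}
	return 0, fmt.Errorf("gave up after %d retries", maxRetries)
}

func main() {
	store := newSharedStore()
	_ = store.Set("failures:token-a", []byte("0"), 0) // seed once, unconditionally
	n, err := incrementCounter(store, "failures:token-a", 3)
	fmt.Println(n, err)
}
```

In the plugin itself the same loop would sit around the SDK's shared-data calls; the retry limit is a judgment call so that a hot key cannot stall request handling.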
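@johnlanni I have updated the code to use SetSharedData to synchronize apiToken information across multiple VMs, and to use SetSharedData for leader election as well.
For the caveat mentioned here, what handling do I need to add?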
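One possible way to deal with the "VM-level data is never cleared" caveat is to make the leader record a lease with an expiry instead of a permanent flag, so a stale leader left over from an earlier configuration simply times out. The sketch below is an assumption, not the PR's implementation: `sharedStore`, `tryAcquireLease`, the `holder|expiry` encoding, and the 30-second TTL are all hypothetical, and the store is an in-memory stand-in that treats `cas == 0` as "no version check".

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
	"time"
)

var errCasMismatch = errors.New("cas mismatch") // hypothetical name

// sharedStore is an in-memory stand-in for VM-level shared data.
type sharedStore struct {
	data map[string][]byte
	cas  map[string]uint32
}

func newSharedStore() *sharedStore {
	return &sharedStore{data: map[string][]byte{}, cas: map[string]uint32{}}
}

func (s *sharedStore) Get(key string) ([]byte, uint32) {
	return s.data[key], s.cas[key]
}

func (s *sharedStore) Set(key string, value []byte, cas uint32) error {
	if cas != 0 && cas != s.cas[key] {
		return errCasMismatch
	}
	s.data[key] = value
	s.cas[key]++
	return nil
}

// parseLease decodes "holder|expiryUnixSeconds".
func parseLease(raw []byte) (string, int64, bool) {
	holder, rest, found := strings.Cut(string(raw), "|")
	if !found {
		return "", 0, false
	}
	expiry, err := strconv.ParseInt(rest, 10, 64)
	if err != nil {
		return "", 0, false
	}
	return holder, expiry, true
}

// tryAcquireLease reports whether workerID is the leader after this call.
// A worker becomes (or stays) leader if the lease is missing, unparsable,
// expired, or already its own. The write is guarded by the version read
// above, so two workers racing for an expired lease cannot both win.
// Caveat: the very first write of a key has no version to guard it.
func tryAcquireLease(s *sharedStore, key, workerID string, nowUnix, ttlSeconds int64) bool {
	raw, cas := s.Get(key)
	if holder, expiry, ok := parseLease(raw); ok {
		if holder != workerID && nowUnix < expiry {
			return false // another worker holds a live lease
		}
	}
	next := fmt.Sprintf("%s|%d", workerID, nowUnix+ttlSeconds)
	return s.Set(key, []byte(next), cas) == nil
}

func main() {
	store := newSharedStore()
	now := time.Now().Unix()
	fmt.Println(tryAcquireLease(store, "healthcheck-lease", "worker-1", now, 30))
	fmt.Println(tryAcquireLease(store, "healthcheck-lease", "worker-2", now+5, 30))
}
```

Each worker would call this on its tick and only run health checks when it returns true; the leader renews by calling it again before the lease expires.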
No major issues. A few mechanism-related details were mentioned above; please adjust those.
README.md should also be updated.
…ures exceeds the threshold
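@CH3CHO I wrapped the calling logic in the handleRequestHeaders and handleRequestBody functions; each provider only needs to call them from OnRequestHeaders and OnRequestBody respectively. I did not move this into the main function because the logic before and after processing headers or body may differ between providers (examples: qwen, claude).
handleRequestBody also takes care of shared behavior such as fetching the context from a file, so each provider does not have to repeat it.
TransformRequestHeaders and TransformRequestBody are now optional. If a provider does not implement TransformRequestHeaders, the headers are left unchanged; if it does not implement TransformRequestBody, only defaultTransformRequestBody is called to apply the model mapping.
The changes above were tested with the following configuration:
apiVersion: extensions.higress.io/v1alpha1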
kind: WasmPlugin
metadata:
  name: ai-proxy-groq
  namespace: higress-system
spec:
  matchRules:
  - config:
      provider:
        type: groq
        apiTokens:
          - "<groq-token>"
          - "sk-bad-groq"
        modelMapping:
          "*": llama3-8b-8192
        context:
          fileUrl: https://raw.githubusercontent.com/cr7258/test-context/refs/heads/main/README.md
          serviceName: github.dns
          servicePort: 443
        failover:
          enabled: true
          failureThreshold: 3
          successThreshold: 5
          healthCheckModel: gpt-3
    service:
    - groq.dns
  - config:
      provider:
        type: claude
        apiTokens:
          - "<claude-token>"
          - "sk-bad-claude"
        modelMapping:
          gpt-3: claude-3-opus-20240229
          "*": claude-3-sonnet-20240229
        context:
          fileUrl: https://raw.githubusercontent.com/cr7258/test-context/refs/heads/main/README.md
          serviceName: github.dns
          servicePort: 443
        failover:
          enabled: true
          failureThreshold: 2
          successThreshold: 9
          healthCheckModel: gpt-3
    service:
    - claude.dns
  - config:
      provider:
        type: qwen
        apiTokens:
          - "<qwen-token>"
          - "sk-bad-qwen"
        modelMapping:
          gpt-3: qwen-turbo
          "*": qwen-turbo
        context:
          fileUrl: https://raw.githubusercontent.com/cr7258/test-context/refs/heads/main/README.md
          serviceName: github.dns
          servicePort: 443
        failover:
          enabled: true
          failureThreshold: 4
          successThreshold: 7
          healthCheckModel: gpt-3
    service:
    - qwen.dns
  url: oci://cr7258/ai-proxy:failover-v86
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    higress.io/destination: |
      30% claude.dns
      30% groq.dns
      40% qwen.dns
  labels:
    higress.io/resource-definer: higress
  name: test-ai
  namespace: higress-system
spec:
  ingressClassName: higress
  rules:
  - host: test-ai.com
    http:
      paths:
      - backend:
          resource:
            apiGroup: networking.higress.io
            kind: McpBridge
            name: default
        path: /
        pathType: Prefix
---
apiVersion: networking.higress.io/v1
kind: McpBridge
metadata:
  name: default
  namespace: higress-system
spec:
  registries:
  - domain: api.groq.com
    name: groq
    port: 443
    type: dns
    protocol: https
    sni: api.groq.com
  - domain: api.anthropic.com
    name: claude
    port: 443
    type: dns
    protocol: https
    sni: api.anthropic.com
  - domain: dashscope.aliyuncs.com
    name: qwen
    port: 443
    type: dns
    protocol: https
    sni: dashscope.aliyuncs.com
  - domain: raw.githubusercontent.com
    name: github
    port: 443
    type: dns
    protocol: https
    sni: raw.githubusercontent.com
So far only the code for the qwen, groq, and claude providers has been adapted. If there are no other issues, I will update the remaining providers in the same way.
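As a rough illustration of the optional TransformRequestHeaders / TransformRequestBody dispatch described in this comment, here is a sketch using Go interface assertions. The signatures are deliberately simplified and are assumptions (the plugin's real methods take more parameters); `plainProvider`, `customProvider`, `mapModel`, and the `x-example` header are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Optional capabilities: a provider implements these only if it needs to.
type requestHeadersTransformer interface {
	TransformRequestHeaders(headers map[string]string)
}

type requestBodyTransformer interface {
	TransformRequestBody(body []byte) ([]byte, error)
}

// mapModel resolves an exact match first, then the "*" wildcard.
func mapModel(model string, mapping map[string]string) string {
	if m, ok := mapping[model]; ok {
		return m
	}
	if m, ok := mapping["*"]; ok {
		return m
	}
	return model
}

// defaultTransformRequestBody only rewrites the "model" field.
func defaultTransformRequestBody(body []byte, mapping map[string]string) ([]byte, error) {
	var req map[string]interface{}
	if err := json.Unmarshal(body, &req); err != nil {
		return nil, err
	}
	if model, ok := req["model"].(string); ok {
		req["model"] = mapModel(model, mapping)
	}
	return json.Marshal(req)
}

// handleRequestHeaders leaves headers untouched unless the provider opts in.
func handleRequestHeaders(p interface{}, headers map[string]string) {
	if t, ok := p.(requestHeadersTransformer); ok {
		t.TransformRequestHeaders(headers)
	}
}

// handleRequestBody falls back to the default model mapping.
func handleRequestBody(p interface{}, body []byte, mapping map[string]string) ([]byte, error) {
	if t, ok := p.(requestBodyTransformer); ok {
		return t.TransformRequestBody(body)
	}
	return defaultTransformRequestBody(body, mapping)
}

// plainProvider implements neither optional method.
type plainProvider struct{}

// customProvider opts in to header rewriting only.
type customProvider struct{}

func (customProvider) TransformRequestHeaders(headers map[string]string) {
	headers["x-example"] = "set"
}

func main() {
	mapping := map[string]string{
		"gpt-3": "claude-3-opus-20240229",
		"*":     "claude-3-sonnet-20240229",
	}
	out, err := handleRequestBody(plainProvider{}, []byte(`{"model":"gpt-3"}`), mapping)
	fmt.Println(string(out), err)
}
```

The type assertion keeps the provider interface small: a new provider that needs nothing special compiles without stub methods, and the shared handler decides the fallback in one place.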
@johnlanni @CH3CHO All providers have now been adjusted. In addition, there are two new changes:
LGTM
Awesome
Ⅰ. Describe what this PR did
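Configuration example:
Currently, an apiToken is judged available or unavailable solely by whether the HTTP response status code is 200. More complex criteria should not be needed for now.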
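To make the status-code rule concrete, here is a minimal sketch of per-token bookkeeping driven by the failureThreshold and successThreshold settings. It is an assumption about the intended semantics, not the PR's code: `tokenHealth` and its methods are hypothetical, failures are counted consecutively and reset by any 200 response, and a token is removed once the count reaches the threshold.

```go
package main

import "fmt"

// tokenHealth tracks which apiTokens are currently usable.
type tokenHealth struct {
	failureThreshold int
	successThreshold int
	failures         map[string]int  // consecutive non-200 responses
	successes        map[string]int  // consecutive healthy probes while removed
	unavailable      map[string]bool // tokens taken out of rotation
}

func newTokenHealth(failureThreshold, successThreshold int) *tokenHealth {
	return &tokenHealth{
		failureThreshold: failureThreshold,
		successThreshold: successThreshold,
		failures:         map[string]int{},
		successes:        map[string]int{},
		unavailable:      map[string]bool{},
	}
}

// onResponse records the status of a real request made with the token.
func (h *tokenHealth) onResponse(token string, status int) {
	if status == 200 {
		h.failures[token] = 0
		return
	}
	h.failures[token]++
	if h.failures[token] >= h.failureThreshold {
		h.unavailable[token] = true
		h.successes[token] = 0
	}
}

// onHealthCheck records a probe result for a token that was removed.
func (h *tokenHealth) onHealthCheck(token string, status int) {
	if !h.unavailable[token] {
		return
	}
	if status != 200 {
		h.successes[token] = 0
		return
	}
	h.successes[token]++
	if h.successes[token] >= h.successThreshold {
		delete(h.unavailable, token)
		h.failures[token] = 0
	}
}

func (h *tokenHealth) isAvailable(token string) bool {
	return !h.unavailable[token]
}

func main() {
	h := newTokenHealth(3, 5) // thresholds from the groq test config
	for i := 0; i < 3; i++ {
		h.onResponse("sk-bad-groq", 401)
	}
	fmt.Println(h.isAvailable("sk-bad-groq"))
}
```

In a multi-worker deployment these counters would live in shared data rather than in a per-worker struct, so that all workers agree on which tokens are out of rotation.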
Ⅱ. Does this pull request fix one issue?
fixes #1227
Ⅲ. Why don't you add test cases (unit test/integration test)?
Ⅳ. Describe how to verify it
Ⅴ. Special notes for reviews
Question
There are currently two open questions: