wip: [kserve] Add granite-3-0-8b-instruct vLLM single-model gating
sjmonson committed Nov 13, 2024
1 parent 6617357 commit 1c9b044
Showing 3 changed files with 35 additions and 0 deletions.
5 changes: 5 additions & 0 deletions projects/kserve/testing/config.yaml
@@ -353,6 +353,11 @@ ci_presets:
      testing:
        size: small
        max_concurrency: 512
    - name: granite-3-0-8b-instruct
      model: granite-3.0-8b-instruct
      testing:
        size: small
        max_concurrency: 512
    tests.e2e.llm_load_test.args.concurrency: [1, 2, 4, 8, 16, 32, 64, 96, 128, 192, 256, 384, 512]

# ---
@@ -0,0 +1,14 @@
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

namePrefix: granite-3-0-8b-instruct-

resources:
- ../../base

patches:
- path: patch.yaml
  target:
    kind: InferenceService
  options:
    allowNameChange: true
@@ -0,0 +1,16 @@
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: isvc
spec:
  predictor:
    minReplicas: 1
    model:
      storageUri: s3://psap-hf-models/ibm-granite/granite-3.0-8b-instruct/
      resources:
        requests:
          cpu: "2"
          memory: "16Gi"
          nvidia.com/gpu: "1"
        limits:
          nvidia.com/gpu: "1"
