
epic: Automated Testing for Built-in Models #56

Open
Van-QA opened this issue Aug 12, 2024 · 2 comments
Van-QA commented Aug 12, 2024

Resources

  • #1125

Original Post

  • Focus team on basic functionality - e.g. install, inference call
  • Out-of-scope: model testing (for now)

Problem
Manually testing the end-to-end functionality of the various models in the Hugging Face Cortex Hub is time-consuming and prone to human error, leading to inconsistent test results. We want to avoid this manual process.

Success Criteria
I want to have an automated end-to-end testing framework set up for the most common models in the Hugging Face Cortex Hub. This framework should automatically run tests for the following models:

  • cortexso/llama3
  • cortexso/llama3.1
  • cortexso/gemma
  • cortexso/gemma2
  • cortexso/phi3
  • cortexso/mistral
  • cortexso/openhermes-2.5
  • cortexso/tinyllama
  • cortexso/qwen2

The tests should be executed either on weekends or whenever a new version of LlamaCPP is released.

The results should be easily accessible and provide clear feedback on the models' performance and functionality.
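A minimal sketch of what such a framework could look like, assuming a pytest-based runner, a cortex CLI with `pull` and `models start` subcommands, and an OpenAI-compatible `/v1/chat/completions` endpoint on port 39281. None of these details are fixed by this issue (the model tags below follow the `:gguf` naming listed in the comments), so adjust them to the real CLI surface:

```python
# e2e_models_test.py - sketch of the proposed automated e2e test, not an
# existing implementation. Assumptions: `cortex pull <model>` and
# `cortex models start <model>` exist, and the local server serves an
# OpenAI-compatible /v1/chat/completions endpoint on port 39281.
import subprocess

import pytest
import requests

MODELS = [
    "llama3:gguf",
    "llama3.1:gguf",
    "gemma:gguf",
    "gemma2:gguf",
    "phi3:gguf",
    "mistral:gguf",
    "openhermes-2.5:gguf",
    "tinyllama:gguf",
    "qwen2:gguf",
]

API_URL = "http://127.0.0.1:39281/v1/chat/completions"  # assumed default port


@pytest.mark.parametrize("model", MODELS)
def test_pull_and_infer(model):
    # 1. Install step: pull the model from the Cortex Hub.
    subprocess.run(["cortex", "pull", model], check=True, timeout=1800)

    # 2. Load the model into the running cortex server.
    subprocess.run(["cortex", "models", "start", model], check=True, timeout=600)

    # 3. Fire a single inference call and assert we get a non-empty reply.
    resp = requests.post(
        API_URL,
        json={
            "model": model,
            "messages": [{"role": "user", "content": "Say hello in one word."}],
            "max_tokens": 8,
        },
        timeout=120,
    )
    resp.raise_for_status()
    content = resp.json()["choices"][0]["message"]["content"]
    assert content.strip(), f"{model} returned an empty completion"
```

Parametrizing over the model list keeps each model's failure isolated in the pytest report, which would cover the "clear feedback" criterion above.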

Additional Context
Automating the testing process will not only save time but also ensure that any changes or updates to the models do not break existing functionality. It would be beneficial to integrate this testing with CI/CD pipelines to ensure that any new model versions are automatically tested before deployment.
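As a rough illustration of the trigger logic (weekend runs, or a new llama.cpp release), a scheduled CI job could gate the suite on a small check script like the one below. The state-file name and the exit-code convention are illustrative assumptions; the only concrete API used is the public GitHub releases endpoint for ggerganov/llama.cpp.

```python
# schedule_check.py - sketch of the run-or-skip decision described above.
import datetime
import pathlib
import sys

import requests

STATE_FILE = pathlib.Path(".last_tested_llamacpp_release")  # hypothetical state file


def latest_llamacpp_tag() -> str:
    # Latest llama.cpp release tag via the public GitHub API.
    resp = requests.get(
        "https://api.github.com/repos/ggerganov/llama.cpp/releases/latest",
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["tag_name"]


def should_run() -> bool:
    # Weekend run: Saturday (5) or Sunday (6).
    if datetime.date.today().weekday() >= 5:
        return True
    # Or a llama.cpp release we have not tested against yet.
    tag = latest_llamacpp_tag()
    last = STATE_FILE.read_text().strip() if STATE_FILE.exists() else ""
    if tag != last:
        STATE_FILE.write_text(tag)
        return True
    return False


if __name__ == "__main__":
    # Exit 0 to tell the CI job to run the e2e suite, 1 to skip it.
    sys.exit(0 if should_run() else 1)
```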

Van-QA added the type: feature request label on Aug 12, 2024
imtuyethan transferred this issue from janhq/cortex.llamacpp on Sep 2, 2024
Van-QA changed the title from "feat: Automate e2e testing for common models in Hugging Face Cortexso Hub" to "test: Automate e2e testing for common models in Hugging Face Cortexso Hub" on Sep 6, 2024
dan-homebrew changed the title from "test: Automate e2e testing for common models in Hugging Face Cortexso Hub" to "test: Automate e2e testing for cortex.cpp install and inference request on major platforms" on Sep 6, 2024
dan-homebrew changed the title from "test: Automate e2e testing for cortex.cpp install and inference request on major platforms" to "test: Automated e2e testing for cortex.cpp install and inference request on major platforms" on Sep 6, 2024
dan-homebrew changed the title from "test: Automated e2e testing for cortex.cpp install and inference request on major platforms" to "test: Automated Testing for Built-in Models" on Sep 8, 2024
0xSage changed the title from "test: Automated Testing for Built-in Models" to "epic: Automated Testing for Built-in Models" on Sep 23, 2024
@hiento09
Contributor

List of current models and quantizations:

llama3.2:3b-gguf-q8-0
llama3.2:3b-gguf-q6-k
llama3.2:3b-gguf-q5-km
llama3.2:3b-gguf-q5-ks
llama3.2:3b-gguf-q4-km
llama3.2:3b-gguf-q4-ks
llama3.2:3b-gguf-q3-kl
llama3.2:3b-gguf-q3-km
llama3.2:3b-gguf-q3-ks
llama3.2:3b-gguf-q2-k
llama3.1:gguf
llama3.1:8b-gguf
llama3.1:8b-gguf-q8-0
llama3.1:8b-gguf-q6-k
llama3.1:8b-gguf-q5-km
llama3.1:8b-gguf-q5-ks
llama3.1:8b-gguf-q4-km
llama3.1:8b-gguf-q4-ks
llama3.1:8b-gguf-q3-kl
llama3.1:8b-gguf-q3-km
llama3.1:8b-gguf-q3-ks
llama3.1:8b-gguf-q2-k
llama3.1:8b-onnx
llama3.1:onnx
tinyllama:gguf
tinyllama:1b-gguf
tinyllama:1b-gguf-q8-0
tinyllama:1b-gguf-q6-k
tinyllama:1b-gguf-q5-km
tinyllama:1b-gguf-q5-ks
tinyllama:1b-gguf-q4-km
tinyllama:1b-gguf-q4-ks
tinyllama:1b-gguf-q3-kl
tinyllama:1b-gguf-q3-km
tinyllama:1b-gguf-q3-ks
tinyllama:1b-gguf-q2-k
llama3:8b-gguf-q8-0
llama3:8b-gguf-q6-k
llama3:8b-gguf-q5-km
llama3:8b-gguf-q5-ks
llama3:8b-gguf-q4-km
llama3:8b-gguf-q4-ks
llama3:8b-gguf-q3-kl
llama3:8b-gguf-q3-km
llama3:8b-gguf-q3-ks
llama3:8b-gguf-q2-k
llama3:gguf
llama3:8b-gguf
llama3:onnx
llama3:tensorrt-llm-linux-ampere
llama3:tensorrt-llm-linux-ada
llama3:8b-tensorrt-llm-linux-ampere
llama3:8b-tensorrt-llm-linux-ada
llama3:tensorrt-llm-windows-ampere
llama3:tensorrt-llm-windows-ada
llama3:8b-tensorrt-llm-windows-ampere
llama3:8b-tensorrt-llm-windows-ada
phi3:mini-gguf
phi3:medium
phi3:mini-gguf-q8-0
phi3:mini-gguf-q6-k
phi3:mini-gguf-q5-km
phi3:medium-gguf
phi3:mini-gguf-q5-ks
phi3:mini-gguf-q4-km
phi3:mini-gguf-q4-ks
phi3:mini-gguf-q3-kl
phi3:mini-gguf-q3-km
phi3:mini-gguf-q3-ks
phi3:mini-gguf-q2-k
phi3:gguf
phi3:medium-onnx
phi3:mini-onnx
phi3:onnx
gemma2:gguf
gemma2:2b-gguf
gemma2:2b-onnx
gemma2:onnx
gemma:gguf
gemma:7b-gguf
gemma:onnx
gemma:7b-onnx
mistral:small-gguf-q8-0
mistral:small-gguf-q6-k
mistral:small-gguf-q5-km
mistral:small-gguf-q5-ks
mistral:small-gguf-q4-km
mistral:small-gguf-q4-ks
mistral:small-gguf-q3-kl
mistral:small-gguf-q3-km
mistral:small-gguf-q3-ks
mistral:small-gguf-q2-k
mistral:7b-v0.3-gguf-q8-0
mistral:7b-v0.3-gguf-q6-k
mistral:7b-v0.3-gguf-q5-km
mistral:7b-v0.3-gguf-q5-ks
mistral:7b-v0.3-gguf-q4-km
mistral:7b-v0.3-gguf-q4-ks
mistral:7b-v0.3-gguf-q3-kl
mistral:7b-v0.3-gguf-q3-km
mistral:7b-v0.3-gguf-q3-ks
mistral:7b-v0.3-gguf-q2-k
mistral:gguf
mistral:7b-gguf
mistral:7b-tensorrt-llm-linux-ada
mistral:tensorrt-llm-linux-ada
mistral:7b-tensorrt-llm-linux-ampere
mistral:tensorrt-llm-linux-ampere
mistral:7b-tensorrt-llm-windows-ada
mistral:7b-tensorrt-llm-windows-ampere
mistral:tensorrt-llm-windows-ampere
mistral:tensorrt-llm-windows-ada
mistral:onnx
mistral:7b-onnx
mistral-nemo:12b-gguf-q8-0
mistral-nemo:12b-gguf-q6-k
mistral-nemo:12b-gguf-q5-km
mistral-nemo:12b-gguf-q5-ks
mistral-nemo:12b-gguf-q4-km
mistral-nemo:12b-gguf-q4-ks
mistral-nemo:12b-gguf-q3-kl
mistral-nemo:12b-gguf-q3-km
mistral-nemo:12b-gguf-q3-ks
mistral-nemo:12b-gguf-q2-k
qwen2:gguf
qwen2:7b-gguf
codestral:gguf
codestral:22b-gguf
openhermes-2.5:gguf
openhermes-2.5:7b-gguf
openhermes-2.5:7b-tensorrt-llm-linux-ada
openhermes-2.5:tensorrt-llm-linux-ada
openhermes-2.5:tensorrt-llm-linux-ampere
openhermes-2.5:7b-tensorrt-llm-linux-ampere
openhermes-2.5:tensorrt-llm-windows-ampere
openhermes-2.5:tensorrt-llm-windows-ada
openhermes-2.5:7b-tensorrt-llm-windows-ampere
openhermes-2.5:7b-tensorrt-llm-windows-ada
openhermes-2.5:onnx
openhermes-2.5:7b-onnx
aya:gguf
aya:12.9b-gguf
yi-1.5:gguf
yi-1.5:34B-gguf
mixtral:gguf
mixtral:7x8b-gguf
command-r:gguf
command-r:35b-gguf

@hiento09
Contributor

hiento09 commented Oct 30, 2024

I will write tests for the default :gguf quantization only, since we currently don't have enough resources to test all 148 model quantizations.
cc @dan-homebrew @gabrielle-ong @0xSage @vansangpfiev @nguyenhoangthuan99 @namchuai
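
A short illustration of that narrowing, keeping only tags that are exactly "gguf" from the full list in the previous comment (the MODELS list here is abbreviated and the snippet is purely illustrative):

```python
# Narrow the test matrix to the default ":gguf" variants only.
# MODELS is abbreviated; in practice it would hold the full list of
# 148 entries from the previous comment.
MODELS = [
    "llama3.1:gguf", "llama3.1:8b-gguf-q4-km",
    "tinyllama:gguf", "tinyllama:1b-gguf-q2-k",
    "mistral:gguf", "mistral:7b-v0.3-gguf-q8-0",
    "qwen2:gguf", "openhermes-2.5:7b-onnx",
    # ... remaining entries omitted for brevity
]

# Keep only models whose tag is exactly "gguf" (the default quantization).
default_gguf = [m for m in MODELS if m.split(":", 1)[1] == "gguf"]
print(default_gguf)  # ['llama3.1:gguf', 'tinyllama:gguf', 'mistral:gguf', 'qwen2:gguf']
```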

dan-homebrew transferred this issue from janhq/cortex.cpp on Nov 7, 2024
Project status: In Review
3 participants