Releases: meta-llama/llama-stack
v0.1.0
We are excited to announce a stable API release of Llama Stack, which enables developers to build RAG applications and Agents using tools and safety shields, monitor those agents with telemetry, and evaluate them with scoring functions.
Context
GenAI application developers need more than just an LLM - they need to integrate tools, connect to their data sources, establish guardrails, and ground the LLM responses effectively. Currently, developers must piece together various tools and APIs, complicating the development lifecycle and increasing costs. As a result, developers spend more time on these integrations than on the application logic itself. The bespoke coupling of components also makes it challenging to adopt state-of-the-art solutions in the rapidly evolving GenAI space. This is particularly difficult for open models like Llama, as best practices are not widely established in the open.
Llama Stack was created to provide developers with a comprehensive and coherent interface that simplifies AI application development and codifies best practices across the Llama ecosystem. Since our launch in September 2024, we have seen a huge uptick in interest in Llama Stack APIs from both AI developers and partners building AI services with Llama models. Partners like Nvidia, Fireworks, and Ollama have collaborated with us to develop implementations across various APIs, including inference, memory, and safety.
With Llama Stack, you can easily build a RAG agent that can also search the web, do complex math, and call custom tools. You can use telemetry to inspect those traces and convert telemetry into eval datasets. And with Llama Stack’s plugin architecture and prepackaged distributions, you can choose to run your agent anywhere: in the cloud with our partners, in your own environment using virtualenv, conda, or Docker, locally with Ollama, or even on mobile devices with our SDKs. Llama Stack offers unprecedented flexibility while also simplifying the developer experience.
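To make this concrete, here is a minimal sketch of building such an agent through the Python client. It is an illustration, not the canonical quickstart: it assumes a distribution is already serving on localhost:8321 (adjust to your server's port), that the named model and the builtin::websearch tool group are registered with it, and that class and field names follow the 0.1.x llama-stack-client SDK; they may differ in other versions.

```python
from llama_stack_client import LlamaStackClient
from llama_stack_client.lib.agents.agent import Agent
from llama_stack_client.lib.agents.event_logger import EventLogger
from llama_stack_client.types.agent_create_params import AgentConfig

# Assumes a Llama Stack distribution is serving on this port (an assumption;
# use whatever host/port your server runs on).
client = LlamaStackClient(base_url="http://localhost:8321")

# Agent configured with the built-in web search tool group.
agent_config = AgentConfig(
    model="meta-llama/Llama-3.3-70B-Instruct",  # assumption: any model your distro serves
    instructions="You are a helpful assistant.",
    toolgroups=["builtin::websearch"],
    enable_session_persistence=False,
)
agent = Agent(client, agent_config)

session_id = agent.create_session("demo-session")
turn = agent.create_turn(
    session_id=session_id,
    messages=[{"role": "user", "content": "What is the latest Llama model?"}],
)

# Stream and pretty-print the agentic trace (inference steps, tool calls).
for event in EventLogger().log(turn):
    event.print()
```

The point of the sketch is portability: the same client code runs unchanged whether base_url points at a cloud partner endpoint, a local Ollama-backed distribution, or a container you started yourself.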
Release
After iterating on the APIs for the last 3 months, today we’re launching a stable release (V1) of the Llama Stack APIs and the corresponding llama-stack server and client packages (v0.1.0). We now have automated tests that verify every provider implementation, so developers can easily and reliably select distributions or providers based on their specific requirements.
There are example standalone apps in llama-stack-apps.
Key Features of this release
- Unified API Layer
- Inference: Run LLM models
- RAG: Store and retrieve knowledge for RAG
- Agents: Build multi-step agentic workflows
- Tools: Register tools that can be called by the agent
- Safety: Apply content filtering and safety policies
- Evaluation: Test model and agent quality
- Telemetry: Collect and analyze usage data and complex agentic traces
- Post Training (Coming Soon): Fine-tune models for specific use cases
- Rich Provider Ecosystem
- Local Development: Meta's Reference, Ollama
- Cloud: Fireworks, Together, Nvidia, AWS Bedrock, Groq, Cerebras
- On-premises: Nvidia NIM, vLLM, TGI, Dell-TGI
- On-device: iOS and Android support
- Built for Production
- Pre-packaged distributions for common deployment scenarios
- Backwards compatibility across model versions
- Comprehensive evaluation capabilities
- Full observability and monitoring
- Multiple developer interfaces (see the usage sketch after these lists)
- CLI: Command line interface
- Python SDK
- Swift iOS SDK
- Kotlin Android SDK
- Sample llama stack applications
- Python
- iOS
- Android
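As a quick illustration of the Python SDK against the unified API layer above, here is a hedged sketch of a plain inference call. It assumes a running distribution on localhost:8321 and follows the 0.1.x llama-stack-client surface; the model identifier is an example and method or field names may differ in later SDK versions.

```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

# Discover which models the distribution has registered.
for model in client.models.list():
    print(model.identifier)

# One-shot chat completion through the unified Inference API.
response = client.inference.chat_completion(
    model_id="meta-llama/Llama-3.3-70B-Instruct",  # assumption: pick one listed above
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Summarize what Llama Stack provides."},
    ],
)
print(response.completion_message.content)
```

Because providers plug in behind the same API, swapping the inference backend (Ollama locally, Fireworks or Together in the cloud) should require no change to this client code, only a different distribution configuration.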
What's Changed
- [4/n][torchtune integration] support lazy load model during inference by @SLR722 in #620
- remove unused telemetry related code for console by @dineshyv in #659
- Fix Meta reference GPU implementation by @ashwinb in #663
- Fixed imports for inference by @cdgamarose-nv in #661
- fix trace starting in library client by @dineshyv in #655
- Add Llama 70B 3.3 to fireworks by @aidando73 in #654
- Tools API with brave and MCP providers by @dineshyv in #639
- [torchtune integration] post training + eval by @SLR722 in #670
- Fix post training apis broken by torchtune release by @SLR722 in #674
- Add missing venv option in --image-type by @terrytangyuan in #677
- Removed unnecessary CONDA_PREFIX env var in installation guide by @terrytangyuan in #683
- Add 3.3 70B to Ollama inference provider by @aidando73 in #681
- docs: update evals_reference/index.md by @eltociear in #675
- [remove import ][1/n] clean up import & in apis/ by @yanxi0830 in #689
- [bugfix] fix broken vision inference, change serialization for bytes by @yanxi0830 in #693
- Minor Quick Start documentation updates. by @derekslager in #692
- [bugfix] fix meta-reference agents w/ safety multiple model loading pytest by @yanxi0830 in #694
- [bugfix] fix prompt_adapter interleaved_content_convert_to_raw by @yanxi0830 in #696
- Add missing "inline::" prefix for providers in building_distro.md by @terrytangyuan in #702
- Fix failing flake8 E226 check by @terrytangyuan in #701
- Add missing newlines before printing the Dockerfile content by @terrytangyuan in #700
- Add JSON structured outputs to Ollama Provider by @aidando73 in #680
- [#407] Agents: Avoid calling tools that haven't been explicitly enabled by @aidando73 in #637
- Made changes to readme and pinning to llamastack v0.0.61 by @heyjustinai in #624
- [rag evals][1/n] refactor base scoring fn & data schema check by @yanxi0830 in #664
- [Post Training] Fix missing import by @SLR722 in #705
- Import from the right path by @SLR722 in #708
- [#432] Add Groq Provider - chat completions by @aidando73 in #609
- Change post training run.yaml inference config by @SLR722 in #710
- [Post training] make validation steps configurable by @SLR722 in #715
- Fix incorrect entrypoint for broken llama stack run by @terrytangyuan in #706
- Fix assert message and call to completion_request_to_prompt in remote:vllm by @terrytangyuan in #709
- Fix Groq invalid self.config reference by @aidando73 in #719
- support llama3.1 8B instruct in post training by @SLR722 in #698
- remove default logger handlers when using libcli with notebook by @dineshyv in #718
- move DataSchemaValidatorMixin into standalone utils by @yanxi0830 in #720
- add 3.3 to together inference provider by @yanxi0830 in #729
- Update CODEOWNERS - add sixianyi0721 as the owner by @sixianyi0721 in #731
- fix links for distro by @yanxi0830 in #733
- add --version to llama stack CLI & /version endpoint by @yanxi0830 in #732
- agents to use tools api by @dineshyv in #673
- Add X-LlamaStack-Client-Version, rename ProviderData -> Provider-Data by @ashwinb in #735
- Check version incompatibility by @ashwinb in #738
- Add persistence for localfs datasets by @VladOS95-cyber in #557
- Fixed typo in default VLLM_URL in remote-vllm.md by @terrytangyuan in #723
- Consolidating Memory tests under client-sdk by @vladimirivic in #703
- Expose LLAMASTACK_PORT in cli.stack.run by @terrytangyuan in #722
- remove conflicting default for tool prompt format in chat completion by @dineshyv in #742
- rename LLAMASTACK_PORT to LLAMA_STACK_PORT for consistency with other env vars by @raghotham in #744
- Add inline vLLM inference provider to regression tests and fix regressions by @frreiss in #662
- [CICD] github workflow to push nightly package to testpypi by @yanxi0830 in #734
- Replaced zrangebylex method in the range method by @che...
v0.1.0rc12
What's Changed
- [4/n][torchtune integration] support lazy load model during inference by @SLR722 in #620
- remove unused telemetry related code for console by @dineshyv in #659
- Fix Meta reference GPU implementation by @ashwinb in #663
- Fixed imports for inference by @cdgamarose-nv in #661
- fix trace starting in library client by @dineshyv in #655
- Add Llama 70B 3.3 to fireworks by @aidando73 in #654
- Tools API with brave and MCP providers by @dineshyv in #639
- [torchtune integration] post training + eval by @SLR722 in #670
- Fix post training apis broken by torchtune release by @SLR722 in #674
- Add missing venv option in --image-type by @terrytangyuan in #677
- Removed unnecessary CONDA_PREFIX env var in installation guide by @terrytangyuan in #683
- Add 3.3 70B to Ollama inference provider by @aidando73 in #681
- docs: update evals_reference/index.md by @eltociear in #675
- [remove import ][1/n] clean up import & in apis/ by @yanxi0830 in #689
- [bugfix] fix broken vision inference, change serialization for bytes by @yanxi0830 in #693
- Minor Quick Start documentation updates. by @derekslager in #692
- [bugfix] fix meta-reference agents w/ safety multiple model loading pytest by @yanxi0830 in #694
- [bugfix] fix prompt_adapter interleaved_content_convert_to_raw by @yanxi0830 in #696
- Add missing "inline::" prefix for providers in building_distro.md by @terrytangyuan in #702
- Fix failing flake8 E226 check by @terrytangyuan in #701
- Add missing newlines before printing the Dockerfile content by @terrytangyuan in #700
- Add JSON structured outputs to Ollama Provider by @aidando73 in #680
- [#407] Agents: Avoid calling tools that haven't been explicitly enabled by @aidando73 in #637
- Made changes to readme and pinning to llamastack v0.0.61 by @heyjustinai in #624
- [rag evals][1/n] refactor base scoring fn & data schema check by @yanxi0830 in #664
- [Post Training] Fix missing import by @SLR722 in #705
- Import from the right path by @SLR722 in #708
- [#432] Add Groq Provider - chat completions by @aidando73 in #609
- Change post training run.yaml inference config by @SLR722 in #710
- [Post training] make validation steps configurable by @SLR722 in #715
- Fix incorrect entrypoint for broken llama stack run by @terrytangyuan in #706
- Fix assert message and call to completion_request_to_prompt in remote:vllm by @terrytangyuan in #709
- Fix Groq invalid self.config reference by @aidando73 in #719
- support llama3.1 8B instruct in post training by @SLR722 in #698
- remove default logger handlers when using libcli with notebook by @dineshyv in #718
- move DataSchemaValidatorMixin into standalone utils by @yanxi0830 in #720
- add 3.3 to together inference provider by @yanxi0830 in #729
- Update CODEOWNERS - add sixianyi0721 as the owner by @sixianyi0721 in #731
- fix links for distro by @yanxi0830 in #733
- add --version to llama stack CLI & /version endpoint by @yanxi0830 in #732
- agents to use tools api by @dineshyv in #673
- Add X-LlamaStack-Client-Version, rename ProviderData -> Provider-Data by @ashwinb in #735
- Check version incompatibility by @ashwinb in #738
- Add persistence for localfs datasets by @VladOS95-cyber in #557
- Fixed typo in default VLLM_URL in remote-vllm.md by @terrytangyuan in #723
- Consolidating Memory tests under client-sdk by @vladimirivic in #703
- Expose LLAMASTACK_PORT in cli.stack.run by @terrytangyuan in #722
- remove conflicting default for tool prompt format in chat completion by @dineshyv in #742
- rename LLAMASTACK_PORT to LLAMA_STACK_PORT for consistency with other env vars by @raghotham in #744
- Add inline vLLM inference provider to regression tests and fix regressions by @frreiss in #662
- [CICD] github workflow to push nightly package to testpypi by @yanxi0830 in #734
- Replaced zrangebylex method in the range method by @cheesecake100201 in #521
- Improve model download doc by @SLR722 in #748
- Support building UBI9 base container image by @terrytangyuan in #676
- update notebook to use new tool defs by @dineshyv in #745
- Add provider data passing for library client by @dineshyv in #750
- [Fireworks] Update model name for Fireworks by @benjibc in #753
- Consolidating Inference tests under client-sdk tests by @vladimirivic in #751
- Consolidating Safety tests from various places under client-sdk by @vladimirivic in #699
- [CI/CD] more robust re-try for downloading testpypi package by @yanxi0830 in #749
- [#432] Add Groq Provider - tool calls by @aidando73 in #630
- Rename ipython to tool by @ashwinb in #756
- Fix incorrect Python binary path for UBI9 image by @terrytangyuan in #757
- Update Cerebras docs to include header by @henrytwo in #704
- Add init files to post training folders by @SLR722 in #711
- Switch to use importlib instead of deprecated pkg_resources by @terrytangyuan in #678
- [bugfix] fix streaming GeneratorExit exception with LlamaStackAsLibraryClient by @yanxi0830 in #760
- Fix telemetry to work on reinstantiating new lib cli by @dineshyv in #761
- [post training] define llama stack post training dataset format by @SLR722 in #717
- add braintrust to experimental-post-training template by @SLR722 in #763
- added support of PYPI_VERSION in stack build by @jeffxtang in #762
- Fix broken tests in test_registry by @vladimirivic in #707
- Fix fireworks run-with-safety template by @vladimirivic in #766
- Free up memory after post training finishes by @SLR722 in #770
- Fix issue when generating distros by @terrytangyuan in #755
- Convert SamplingParams.strategy to a union by @hardikjshah in #767
- [CICD] Github workflow for publishing Docker images by @yanxi0830 in #764
- [bugfix] fix llama guard parsing ContentDelta by @yanxi0830 in #772
- rebase eval test w/ tool_runtime fixtures by @yanxi0830 in #773
- More idiomatic REST API by @dineshyv in #765
- add nvidia distribution by @cdgamarose-nv in #565
- bug fixes on inference tests by @sixianyi0721 in #774
- [bugfix] fix inference sdk test for v1 by @yanxi0830 in #775
- fix routing in library client by @dineshyv in https://github.com/meta-llama/llama-stack/pull...
v0.0.63
A small but important bug-fix release to update the URL datatype for the client-SDKs. The issue especially affected multimodal agentic turns.
Full Changelog: v0.0.62...v0.0.63
v0.0.62
What's Changed
A few important updates, some of which are backwards incompatible. You must update your run.yaml files when upgrading. As always, look to templates/<distro>/run.yaml for reference.
- Make embedding generation go through inference by @dineshyv in #606
- [/scoring] add ability to define aggregation functions for scoring functions & refactors by @yanxi0830 in #597
- Update the "InterleavedTextMedia" type by @ashwinb in #635
- [NEW!] Experimental post-training APIs! #540, #593, etc.
A variety of fixes and enhancements. Some selected ones:
- [#342] RAG - fix PDF format in vector database by @aidando73 in #551
- add completion api support to nvidia inference provider by @mattf in #533
- add model type to APIs by @dineshyv in #588
- Allow using an "inline" version of Chroma using PersistentClient by @ashwinb in #567
- [docs] add playground ui docs by @yanxi0830 in #592
- add colab notebook & update docs by @yanxi0830 in #619
- [tests] add client-sdk pytests & delete client.py by @yanxi0830 in #638
- [bugfix] no shield_call when there's no shields configured by @yanxi0830 in #642
New Contributors
- @SLR722 made their first contribution in #540
- @iamarunbrahma made their first contribution in #636
Full Changelog: v0.0.61...v0.0.62
v0.0.61
What's Changed
- add NVIDIA NIM inference adapter by @mattf in #355
- Tgi fixture by @dineshyv in #519
- fixes tests & move braintrust api_keys to request headers by @yanxi0830 in #535
- allow env NVIDIA_BASE_URL to set NVIDIAConfig.url by @mattf in #531
- move playground ui to llama-stack repo by @yanxi0830 in #536
- fix[documentation]: Update links to point to correct pages by @sablair in #549
- Fix URLs to Llama Stack Read the Docs Webpages by @JeffreyLind3 in #547
- Fix Zero to Hero README.md Formatting by @JeffreyLind3 in #546
- Guide readme fix by @raghotham in #552
- Fix broken Ollama link by @aidando73 in #554
- update client cli docs by @dineshyv in #560
- reduce the accuracy requirements to pass the chat completion structured output test by @mattf in #522
- removed assertion in ollama.py and fixed typo in the readme by @wukaixingxp in #563
- Cerebras Inference Integration by @henrytwo in #265
- unregister API for dataset by @sixianyi0721 in #507
- [llama stack ui] add native eval & inspect distro & playground pages by @yanxi0830 in #541
- Telemetry API redesign by @dineshyv in #525
- Introduce GitHub Actions Workflow for Llama Stack Tests by @ConnorHack in #523
- specify the client version that works for current together server by @jeffxtang in #566
- remove unused telemetry related code by @dineshyv in #570
- Fix up safety client for versioned API by @stevegrubb in #573
- Add eval/scoring/datasetio API providers to distribution templates & UI developer guide by @yanxi0830 in #564
- Add ability to query and export spans to dataset by @dineshyv in #574
- Renames otel config from jaeger to otel by @codefromthecrypt in #569
- add telemetry docs by @dineshyv in #572
- Console span processor improvements by @dineshyv in #577
- doc: quickstart guide errors by @aidando73 in #575
- Add kotlin docs by @Riandy in #568
- Update android_sdk.md by @Riandy in #578
- Bump kotlin docs to 0.0.54.1 by @Riandy in #579
- Make LlamaStackLibraryClient work correctly by @ashwinb in #581
- Update integration type for Cerebras to hosted by @henrytwo in #583
- Use customtool's get_tool_definition to remove duplication by @jeffxtang in #584
- [#391] Add support for json structured output for vLLM by @aidando73 in #528
- Fix Jaeger instructions by @yurishkuro in #580
- fix telemetry import by @yanxi0830 in #585
- update template run.yaml to include openai api key for braintrust by @yanxi0830 in #590
- add tracing to library client by @dineshyv in #591
- Fixes for library client by @ashwinb in #587
- Fix issue 586 by @yanxi0830 in #594
New Contributors
- @sablair made their first contribution in #549
- @JeffreyLind3 made their first contribution in #547
- @aidando73 made their first contribution in #554
- @henrytwo made their first contribution in #265
- @sixianyi0721 made their first contribution in #507
- @ConnorHack made their first contribution in #523
- @yurishkuro made their first contribution in #580
Full Changelog: v0.0.55...v0.0.61
v0.0.55 release
Llama Stack 0.0.54 Release
What's Changed
- Bugfixes release on top of 0.0.53
- Don't depend on templates.py when print llama stack build messages by @ashwinb in #496
- Restructure docs by @dineshyv in #494
- Since we are pushing for HF repos, we should accept them in inference configs by @ashwinb in #497
- Fix fp8 quantization script. by @liyunlu0618 in #500
- use logging instead of prints by @dineshyv in #499
New Contributors
- @liyunlu0618 made their first contribution in #500
Full Changelog: v0.0.53...v0.0.54
Llama Stack 0.0.53 Release
🚀 Initial Release Notes for Llama Stack!
Added
- Resource-oriented design for models, shields, memory banks, datasets and eval tasks
- Persistence for registered objects with distribution
- Ability to persist memory banks created for FAISS
- PostgreSQL KVStore implementation
- Environment variable placeholder support in run.yaml files
- Comprehensive Zero-to-Hero notebooks and quickstart guides
- Support for quantized models in Ollama
- Vision models support for Together, Fireworks, Meta-Reference, and Ollama, and vLLM
- Bedrock distribution with safety shields support
- Evals API with task registration and scoring functions
- MMLU and SimpleQA benchmark scoring functions
- Huggingface dataset provider integration for benchmarks
- Support for custom dataset registration from local paths
- Benchmark evaluation CLI tools with visualization tables
- RAG evaluation scoring functions and metrics
- Local persistence for datasets and eval tasks
Changed
- Split safety into distinct providers (llama-guard, prompt-guard, code-scanner)
- Changed provider naming convention (impls → inline, adapters → remote)
- Updated API signatures for dataset and eval task registration
- Restructured folder organization for providers
- Enhanced Docker build configuration
- Added version prefixing for REST API routes
- Enhanced evaluation task registration workflow
- Improved benchmark evaluation output formatting
- Restructured evals folder organization for better modularity
Removed
- llama stack configure command
What's Changed
- Update download command by @Wauplin in #9
- Update fbgemm version by @jianyuh in #12
- Add CLI reference docs by @dltn in #14
- Added Ollama as an inference impl by @hardikjshah in #20
- Hide older models by @dltn in #23
- Introduce Llama stack distributions by @ashwinb in #22
- Rename inline -> local by @dltn in #24
- Avoid using nearly double the memory needed by @ashwinb in #30
- Updates to prompt for tool calls by @hardikjshah in #29
- RFC-0001-The-Llama-Stack by @raghotham in #8
- Add API keys to AgenticSystemConfig instead of relying on dotenv by @ashwinb in #33
- update cli ref doc by @jeffxtang in #34
- fixed bug in download not enough disk space condition by @sisminnmaw in #35
- Updated cli instructions with additonal details for each subcommands by @varunfb in #36
- Updated URLs and addressed feedback by @varunfb in #37
- Fireworks basic integration by @benjibc in #39
- Together AI basic integration by @Nutlope in #43
- Update LICENSE by @raghotham in #47
- Add patch for SSE event endpoint responses by @dltn in #50
- API Updates: fleshing out RAG APIs, introduce "llama stack" CLI command by @ashwinb in #51
- [inference] Add a TGI adapter by @ashwinb in #52
- upgrade llama_models by @benjibc in #55
- Query generators for RAG query by @hardikjshah in #54
- Add Chroma and PGVector adapters by @ashwinb in #56
- API spec update, client demo with Stainless SDK by @yanxi0830 in #58
- Enable Bing search by @hardikjshah in #59
- add safety to openapi spec by @yanxi0830 in #62
- Add config file based CLI by @yanxi0830 in #60
- Simplified Telemetry API and tying it to logger by @ashwinb in #57
- [Inference] Use huggingface_hub inference client for TGI adapter by @hanouticelina in #53
- Support data: in URL for memory. Add ootb support for pdfs by @hardikjshah in #67
- Remove request wrapper migration by @yanxi0830 in #64
- CLI Update: build -> configure -> run by @yanxi0830 in #69
- API Updates by @ashwinb in #73
- Unwrap ChatCompletionRequest for context_retriever by @yanxi0830 in #75
- CLI - add back build wizard, configure with name instead of build.yaml by @yanxi0830 in #74
- CLI: add build templates support, move imports by @yanxi0830 in #77
- fix prompt with name args by @yanxi0830 in #80
- Fix memory URL parsing by @yanxi0830 in #81
- Allow TGI adaptor to have non-standard llama model names by @hardikjshah in #84
- [API Updates] Model / shield / memory-bank routing + agent persistence + support for private headers by @ashwinb in #92
- Bedrock Guardrails comiting after rebasing the fork by @rsgrewal-aws in #96
- Bedrock Inference Integration by @poegej in #94
- Support for Llama3.2 models and Swift SDK by @ashwinb in #98
- fix safety using inference by @yanxi0830 in #99
- Fixes typo for setup instruction for starting Llama Stack Server section by @abhishekmishragithub in #103
- Make TGI adapter compatible with HF Inference API by @Wauplin in #97
- Fix links & format by @machina-source in #104
- docs: fix typo by @dijonkitchen in #107
- LG safety fix by @kplawiak in #108
- Minor typos, HuggingFace -> Hugging Face by @marklysze in #113
- Reordered pip install and llama model download by @KarthiDreamr in #112
- Update getting_started.ipynb by @delvingdeep in #117
- fix: 404 link to agentic system repository by @moldhouse in #118
- Fix broken links in RFC-0001-llama-stack.md by @bhimrazy in #134
- Validate name in llama stack build by @russellb in #128
- inference: Fix download command in error msg by @russellb in #133
- configure: Fix a error msg typo by @russellb in #131
- docs: Note how to use podman by @russellb in #130
- add env for LLAMA_STACK_CONFIG_DIR by @yanxi0830 in #137
- [bugfix] fix duplicate api endpoints by @yanxi0830 in #139
- Use inference APIs for executing Llama Guard by @ashwinb in #121
- fixing safety inference and safety adapter for new API spec. Pinned t… by @yogishbaliga in #105
- [CLI] remove dependency on CONDA_PREFIX in CLI by @yanxi0830 in #144
- [bugfix] fix #146 by @yanxi0830 in #147
- Extract provider data properly (attempt 2) by @ashwinb in #148
- is_multimodal accepts core_model_id not model itself. by @wizardbc in #153
- fix broken bedrock inference provider by @moritalous in #151
- Fix podman+selinux compatibility by @russellb in #132
- docker: Install in editable mode for dev purposes by @russellb in #160
- [CLI] simplify docker run by @yanxi0830 in #159
- Add a RoutableProvider protocol, support for multiple routing keys by @ashwinb in #163
- docker: Check for selinux before using --security-opt by @russellb in #167
- Adds markdown-link-check and fixes a broken link by @codefromthecrypt in #165
- [bugfix] conda path lookup by @yanxi0830 in #179
- fix prompt guard by @ashwinb in #177
- inference: Add model option to client by @russellb in #17...