v0.1.0

@ashwinb released this 24 Jan 17:47 · 25 commits to main since this release

We are excited to announce a stable API release of Llama Stack, which enables developers to build RAG applications and agents that use tools and safety shields, monitor those agents with telemetry, and evaluate them with scoring functions.

Context

GenAI application developers need more than just an LLM - they need to integrate tools, connect with their data sources, establish guardrails, and ground the LLM responses effectively. Currently, developers must piece together various tools and APIs, complicating the development lifecycle and increasing costs. The result is that developers are spending more time on these integrations rather than focusing on the application logic itself. The bespoke coupling of components also makes it challenging to adopt state-of-the-art solutions in the rapidly evolving GenAI space. This is particularly difficult for open models like Llama, as best practices are not widely established in the open.

Llama Stack was created to provide developers with a comprehensive and coherent interface that simplifies AI application development and codifies best practices across the Llama ecosystem. Since our launch in September 2024, we have seen a huge uptick in interest in the Llama Stack APIs, from both AI developers and partners building AI services with Llama models. Partners like Nvidia, Fireworks, and Ollama have collaborated with us to develop implementations across various APIs, including inference, memory, and safety.

With Llama Stack, you can easily build a RAG agent that can also search the web, do complex math, and call custom tools. You can use telemetry to inspect those traces and convert telemetry into evaluation datasets. And with Llama Stack’s plugin architecture and prepackaged distributions, you can choose to run your agent anywhere: in the cloud with our partners, in your own environment using virtualenv, conda, or Docker, locally with Ollama, or even on mobile devices with our SDKs. Llama Stack offers unprecedented flexibility while also simplifying the developer experience.
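To make the agent description above concrete, here is a minimal sketch of what such an agent's configuration might look like. The field names, toolgroup identifiers, and shield model name below are illustrative assumptions, not the exact Llama Stack schema; consult the API documentation for the real definitions.

```python
import json

# Hypothetical configuration for a RAG agent that can search the web,
# do complex math via code execution, and apply safety shields.
# All field names and identifiers here are illustrative assumptions.
agent_config = {
    "model": "meta-llama/Llama-3.1-8B-Instruct",
    "instructions": "You are a helpful assistant. Use the available tools.",
    "toolgroups": [
        "builtin::websearch",         # search the web
        "builtin::code_interpreter",  # complex math via code execution
        "builtin::rag",               # retrieve stored knowledge
    ],
    # Safety shields applied to user input and to model output.
    "input_shields": ["meta-llama/Llama-Guard-3-8B"],
    "output_shields": ["meta-llama/Llama-Guard-3-8B"],
    "enable_session_persistence": True,
}

print(json.dumps(agent_config, indent=2))
```

A configuration like this would be handed to the Agents API when creating an agent; each toolgroup and shield must be registered with the running Llama Stack distribution before the agent can use it.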

Release

After iterating on the APIs for the last three months, today we’re launching a stable release (V1) of the Llama Stack APIs and the corresponding llama-stack server and client packages (v0.1.0). We now have automated tests that verify every provider implementation, so developers can easily and reliably select distributions or providers based on their specific requirements.

There are example standalone apps in llama-stack-apps.

Key Features of this release

  • Unified API Layer

    • Inference: Run LLM models
    • RAG: Store and retrieve knowledge for RAG
    • Agents: Build multi-step agentic workflows
    • Tools: Register tools that can be called by the agent
    • Safety: Apply content filtering and safety policies
    • Evaluation: Test model and agent quality
    • Telemetry: Collect and analyze usage data and complex agentic traces
    • Post Training (coming soon): Fine-tune models for specific use cases
  • Rich Provider Ecosystem

    • Local Development: Meta's Reference, Ollama
    • Cloud: Fireworks, Together, Nvidia, AWS Bedrock, Groq, Cerebras
    • On-premises: Nvidia NIM, vLLM, TGI, Dell-TGI
    • On-device: iOS and Android support
  • Built for Production

    • Pre-packaged distributions for common deployment scenarios
    • Backwards compatibility across model versions
    • Comprehensive evaluation capabilities
    • Full observability and monitoring
  • Multiple developer interfaces

    • CLI: Command line interface
    • Python SDK
    • Swift iOS SDK
    • Kotlin Android SDK
  • Sample Llama Stack applications

    • Python
    • iOS
    • Android
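As a sketch of how a client talks to the unified API layer above, the snippet below builds (but does not send) an HTTP request to a locally running Llama Stack server's inference API using only the standard library. The server address, endpoint path, and payload fields are assumptions for illustration; the real schema is defined by the llama-stack server and the Python SDK wraps these calls for you.

```python
import json
import urllib.request

# Hypothetical local server address; a real deployment would use the
# host/port your Llama Stack distribution is serving on.
BASE_URL = "http://localhost:5000"

def chat_completion_request(model: str, user_message: str) -> urllib.request.Request:
    """Build (but do not send) a chat-completion style inference request.

    The endpoint path and payload shape are illustrative assumptions,
    not the authoritative Llama Stack API schema.
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/inference/chat-completion",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = chat_completion_request("meta-llama/Llama-3.1-8B-Instruct", "What is 2 + 2?")
print(req.get_method(), req.full_url)
```

In practice you would use the Python SDK (or the Swift/Kotlin SDKs on device) rather than raw HTTP; the point is that every provider behind the unified API layer is reached through the same request shape.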

What's Changed

New Contributors

Full Changelog: v0.0.63...v0.1.0