Update docs for RAG and improve CONTRIBUTING.md
ashwinb committed Jan 28, 2025
1 parent 229f0d5 commit d123e9d
Showing 3 changed files with 110 additions and 48 deletions.
43 changes: 36 additions & 7 deletions CONTRIBUTING.md
We want to make contributing to this project as easy and transparent as
possible.

## Discussions -> Issues -> Pull Requests

We actively welcome your pull requests. However, please read the following. This is heavily inspired by [Ghostty](https://github.com/ghostty-org/ghostty/blob/main/CONTRIBUTING.md).

If in doubt, please open a [discussion](https://github.com/meta-llama/llama-stack/discussions); we can always convert that to an issue later.

**I'd like to contribute!**

All issues are actionable (please report if they are not). Pick one and start working on it. Thank you.
If you need help or guidance, comment on the issue. Issues that are extra friendly to new contributors are tagged with "contributor friendly".

**I have a bug!**

1. Search the issue tracker and discussions for similar issues.
2. If you don't have steps to reproduce, open a discussion.
3. If you have steps to reproduce, open an issue.

**I have an idea for a feature!**

1. Open a discussion.

**I've implemented a feature!**

1. If there is an issue for the feature, open a pull request.
2. If there is no issue, open a discussion and link to your branch.

**I have a question!**

1. Open a discussion or use [Discord](https://discord.gg/llama-stack).


**Opening a Pull Request**

1. Fork the repo and create your branch from `main`.
2. If you've changed APIs, update the documentation.
3. Ensure the test suite passes.
4. Make sure your code lints using `pre-commit`.
5. If you haven't already, complete the Contributor License Agreement ("CLA").

## Contributor License Agreement ("CLA")
In order to accept your pull request, we need you to submit a CLA. You only need
115 changes: 74 additions & 41 deletions docs/source/building_applications/rag.md
## Using "Memory" or Retrieval Augmented Generation (RAG)

Memory enables your applications to reference and recall information from previous interactions or external documents.

Llama Stack organizes the memory APIs into three layers:
- the lowermost APIs deal with raw storage and retrieval. These include Vector IO, KeyValue IO (coming soon) and Relational IO (also coming soon).
- next is the "RAG Tool", a first-class tool as part of the Tools API that allows you to ingest documents (from URLs, files, etc.) with various chunking strategies and query them smartly.
- finally, it all comes together with the top-level "Agents" API that allows you to create agents that can use the tools to answer questions, perform tasks, and more.

<img src="rag.png" alt="RAG System" width="50%">

The RAG system uses lower-level storage for different types of data:
* **Vector IO**: For semantic search and retrieval
* **Key-Value and Relational IO**: For structured data storage

We may add more storage types like Graph IO in the future.

### Setting up Vector DBs

Here's how to set up a vector database for RAG:

```python
# Register a vector db
vector_db_id = "my_documents"
response = client.vector_dbs.register(
    vector_db_id=vector_db_id,
    embedding_model="all-MiniLM-L6-v2",
    embedding_dimension=384,
    provider_id="faiss",
)

# You can insert a pre-chunked document directly into the vector db
chunks = [
    {
        "document_id": "doc1",
        "content": "Your document text here",
        "mime_type": "text/plain",
    },
    ...
]
client.vector_io.insert(vector_db_id, chunks)

# You can then query for these chunks
chunks_response = client.vector_io.query(vector_db_id, query="What do you know about...")

```
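Each retrieved chunk is paired with a relevance score. A minimal sketch for inspecting what came back, assuming the query response exposes parallel `chunks` and `scores` lists (an assumption about the client's return type):

```python
# Print each retrieved chunk alongside its score
# (assumed response shape: parallel `chunks` and `scores` lists)
for chunk, score in zip(chunks_response.chunks, chunks_response.scores):
    print(f"score={score:.3f} content={str(chunk.content)[:80]!r}")
```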

### Using the RAG Tool

A better way to ingest documents is to use the RAG Tool. This tool ingests documents from URLs, files, etc. and automatically chunks them into smaller pieces.

```python
from llama_stack_client.types import Document

urls = ["memory_optimizations.rst", "chat.rst", "llama3.rst"]
documents = [
    Document(
        document_id=f"num-{i}",
        content=f"https://raw.githubusercontent.com/pytorch/torchtune/main/docs/source/tutorials/{url}",
        mime_type="text/plain",
        metadata={},
    )
    for i, url in enumerate(urls)
]

client.tool_runtime.rag_tool.insert(
    documents=documents,
    vector_db_id=vector_db_id,
    chunk_size_in_tokens=512,
)

# Query documents
results = client.tool_runtime.rag_tool.query(
    vector_db_id=vector_db_id,
    query="What do you know about...",
)
```
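If you are not using the agent integration below, you can stitch the retrieved context into a plain inference call by hand. A hedged sketch, assuming the query result exposes a `content` field with the retrieved text and that `Llama3.2-3B-Instruct` is a registered model:

```python
# Feed retrieved context into a regular chat completion
# (assumptions: `results.content` holds the retrieved text; model id is illustrative)
answer = client.inference.chat_completion(
    model_id="Llama3.2-3B-Instruct",
    messages=[
        {"role": "system", "content": f"Answer using only this context:\n{results.content}"},
        {"role": "user", "content": "What do you know about..."},
    ],
)
print(answer.completion_message.content)
```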


### Building RAG-Enhanced Agents

One of the most powerful patterns is combining agents with RAG capabilities. Here's a complete example:

```python
from llama_stack_client.lib.agents.agent import Agent
from llama_stack_client.types.agent_create_params import AgentConfig

# Configure agent with memory via the built-in RAG toolgroup
agent_config = AgentConfig(
    model="Llama3.2-3B-Instruct",
    instructions="You are a helpful assistant",
    enable_session_persistence=True,
    toolgroups=[
        {
            "name": "builtin::rag",
            "args": {
                "vector_db_ids": [vector_db_id],
            },
        }
    ],
)

agent = Agent(client, agent_config)
# Create a session, then run a turn with documents attached
session_id = agent.create_session("rag_session")  # session name is illustrative

response = agent.create_turn(
    messages=[{
        "role": "user",
        "content": "I am providing some documents for reference.",
    }],
    documents=[
        dict(
            content="https://raw.githubusercontent.com/example/doc.rst",
            mime_type="text/plain",
        )
    ],
    session_id=session_id,
)
```
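The turn response streams events as the agent retrieves context and answers. One way to print them, assuming the client ships the `EventLogger` helper at this import path:

```python
from llama_stack_client.lib.agents.event_logger import EventLogger

# Pretty-print the streamed agent events (import path is an assumption)
for log in EventLogger().log(response):
    log.print()
```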

Binary file added docs/source/building_applications/rag.png
