Improvements on README.md and requirements.txt #14

Open: wants to merge 6 commits into base: main
3 changes: 2 additions & 1 deletion .gitignore
@@ -1,4 +1,5 @@
 *.pyc
 .DS_Store
 backup
-chroma
+chroma
+venv
13 changes: 13 additions & 0 deletions README.md
@@ -1 +1,14 @@
 # rag-tutorial-v2
+
+## Getting Started
+Before running, make sure Ollama is installed; if it is not, download it from [ollama.com/download](https://ollama.com/download). Once it is installed, you **must** start the server with `ollama serve`.
+
+1. `python -m venv venv`
+2. `source venv/bin/activate`
+3. `pip install -r requirements.txt`
+4. `python ./populate_database.py`
+5. `python ./query_data.py "How much does each player get each round in Monopoly?"`
+
+## Updating documents
+1. Place the PDF file inside the `data` directory
+2. `python ./populate_database.py`
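
The README requires `ollama serve` to be running before the database is populated. A minimal preflight sketch, assuming the server listens on Ollama's default address `http://localhost:11434`; the helper name `is_ollama_running` is hypothetical and not part of this PR:

```python
import urllib.error
import urllib.request


def is_ollama_running(base_url: str = "http://localhost:11434", timeout: float = 2.0) -> bool:
    """Return True if a server answers base_url with HTTP 200, False otherwise."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or a non-2xx response.
        return False


if __name__ == "__main__":
    if is_ollama_running():
        print("Ollama is reachable")
    else:
        print("Ollama is not reachable; start it with `ollama serve`")
```

A check like this could run at the top of `populate_database.py` so that a missing server produces a clear message instead of a connection error partway through embedding.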
8 changes: 4 additions & 4 deletions get_embedding_function.py
@@ -3,8 +3,8 @@


 def get_embedding_function():
-    embeddings = BedrockEmbeddings(
-        credentials_profile_name="default", region_name="us-east-1"
-    )
-    # embeddings = OllamaEmbeddings(model="nomic-embed-text")
+    # embeddings = BedrockEmbeddings(
+    # credentials_profile_name="default", region_name="us-east-1"
+    # )
+    embeddings = OllamaEmbeddings(model="mxbai-embed-large")
     return embeddings
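
This hunk switches embedding backends by commenting code in and out. One possible alternative is to select the backend by name. The sketch below assumes that approach is acceptable for the project; the `backend` parameter and the `EMBEDDING_BACKEND` environment variable are hypothetical and not part of this PR:

```python
import os
from typing import Optional


def get_embedding_function(backend: Optional[str] = None):
    """Return an embeddings object for the chosen backend (hypothetical selector)."""
    # EMBEDDING_BACKEND is an illustrative variable name, not something the PR defines.
    backend = (backend or os.environ.get("EMBEDDING_BACKEND", "ollama")).lower()

    if backend == "ollama":
        # Imported lazily so each backend's dependencies are only needed when used.
        from langchain_community.embeddings.ollama import OllamaEmbeddings

        return OllamaEmbeddings(model="mxbai-embed-large")

    if backend == "bedrock":
        from langchain_community.embeddings.bedrock import BedrockEmbeddings

        return BedrockEmbeddings(
            credentials_profile_name="default", region_name="us-east-1"
        )

    raise ValueError(f"Unknown embedding backend: {backend!r}")
```

With the Ollama backend, the model has to be available locally first, for example via `ollama pull mxbai-embed-large`. Embeddings from different models are not interchangeable, so an existing `chroma` directory built with another model would need to be repopulated.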
5 changes: 2 additions & 3 deletions populate_database.py
@@ -1,11 +1,11 @@
 import argparse
 import os
 import shutil
-from langchain.document_loaders.pdf import PyPDFDirectoryLoader
+from langchain_community.document_loaders.pdf import PyPDFDirectoryLoader
 from langchain_text_splitters import RecursiveCharacterTextSplitter
 from langchain.schema.document import Document
 from get_embedding_function import get_embedding_function
-from langchain.vectorstores.chroma import Chroma
+from langchain_chroma import Chroma


 CHROMA_PATH = "chroma"
@@ -67,7 +67,6 @@ def add_to_chroma(chunks: list[Document]):
         print(f"👉 Adding new documents: {len(new_chunks)}")
         new_chunk_ids = [chunk.metadata["id"] for chunk in new_chunks]
         db.add_documents(new_chunks, ids=new_chunk_ids)
-        db.persist()
     else:
         print("✅ No new documents to add")
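
The code that builds `new_chunks` sits above this hunk and is not shown. It presumably keeps only the chunks whose `metadata["id"]` is not yet stored. A minimal sketch of that filtering step follows, using a stand-in `Chunk` class instead of the LangChain `Document` so it runs without dependencies; `Chunk` and `select_new_chunks` are hypothetical names:

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """Stand-in for a LangChain Document: text plus a metadata dict."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def select_new_chunks(chunks: list, existing_ids: set) -> list:
    """Keep only chunks whose metadata["id"] is not already in the store."""
    return [chunk for chunk in chunks if chunk.metadata["id"] not in existing_ids]


chunks = [
    Chunk("Rules, page 1", {"id": "data/monopoly.pdf:0:0"}),
    Chunk("Rules, page 2", {"id": "data/monopoly.pdf:1:0"}),
]
existing_ids = {"data/monopoly.pdf:0:0"}

new_chunks = select_new_chunks(chunks, existing_ids)
new_chunk_ids = [chunk.metadata["id"] for chunk in new_chunks]
print(new_chunk_ids)
```

Dropping `db.persist()` matches the move to `langchain_chroma`: to my knowledge its `Chroma` class has no `persist()` method, and Chroma 0.4 and later write to `persist_directory` automatically.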

2 changes: 1 addition & 1 deletion query_data.py
@@ -1,5 +1,5 @@
 import argparse
-from langchain.vectorstores.chroma import Chroma
+from langchain_chroma import Chroma
 from langchain.prompts import ChatPromptTemplate
 from langchain_community.llms.ollama import Ollama

3 changes: 2 additions & 1 deletion requirements.txt
@@ -1,5 +1,6 @@
pypdf
langchain
chromadb # Vector storage
langchain_community
langchain_chroma
pytest
boto3
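
To confirm that `pip install -r requirements.txt` succeeded inside the virtual environment, a quick import check can help. The sketch below assumes each package name shown in this hunk is also its import name, which holds for these entries; `find_missing` is a hypothetical helper:

```python
import importlib.util

# Names as they appear in the requirements.txt hunk above.
REQUIRED_MODULES = [
    "pypdf",
    "langchain",
    "chromadb",
    "langchain_community",
    "langchain_chroma",
    "pytest",
    "boto3",
]


def find_missing(modules: list) -> list:
    """Return the module names that cannot be found in the current environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable")
```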