With the more widespread use of Large Language Models (LLMs), the risk of hallucination becomes significant. One approach to minimising hallucination is Retrieval-Augmented Generation (RAG), a technique where the LLM's knowledge is bounded by a given context. Context comes from ingesting material (websites, documents, text, ...), creating embeddings, and storing them in a vector database.
- Backend: Python
- Frameworks: LangChain, FastAPI
- Vectorstore: Pinecone
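As a rough illustration of that ingest → embed → store flow with the stack above (a minimal sketch, assuming the `langchain-community`, `langchain-openai`, and `langchain-pinecone` packages and the environment variables described below; in this app the flow is exposed through the `/rag-pinecone/website` endpoint):

```python
import os

from langchain_community.document_loaders import WebBaseLoader
from langchain_openai import OpenAIEmbeddings
from langchain_pinecone import PineconeVectorStore
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Load a web page and split it into overlapping chunks
docs = WebBaseLoader("https://lilianweng.github.io/posts/2023-06-23-agent").load()
chunks = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200).split_documents(docs)

# Embed each chunk and store the vectors in the Pinecone index
PineconeVectorStore.from_documents(
    chunks,
    embedding=OpenAIEmbeddings(),
    index_name=os.environ["PINECONE_INDEX"],
)
```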
- Clone the repository and navigate to the `my-app` directory:

  ```bash
  cd my-app/
  ```
- Set up your Python virtual environment and install dependencies:

  ```bash
  python -m venv .venv && source .venv/bin/activate  # one common way to create and activate a venv
  pip install -r requirements.txt
  ```
- Set up your Pinecone account; follow their documentation to set up a project and an index.
- Set up your OpenAI API account.
- (Optional) Set up your LangSmith account if you'd like to trace LangChain's output at every step.
- Set up your environment variables:

  ```bash
  # Compulsory
  PINECONE_API_KEY=
  PINECONE_ENVIRONMENT=
  PINECONE_INDEX=
  OPENAI_API_KEY=
  # Optional
  LANGCHAIN_TRACING_V2=
  LANGCHAIN_ENDPOINT=
  LANGCHAIN_API_KEY=
  LANGCHAIN_PROJECT=
  ```
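  As a quick sanity check before launching the app, you can verify the compulsory variables are set (a hypothetical snippet, not part of the repo):

  ```python
  import os

  # Fail fast if any compulsory variable is missing from the environment
  required = ["PINECONE_API_KEY", "PINECONE_ENVIRONMENT", "PINECONE_INDEX", "OPENAI_API_KEY"]
  missing = [name for name in required if not os.environ.get(name)]
  if missing:
      raise SystemExit(f"Missing environment variables: {', '.join(missing)}")
  ```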
- Run the application at port 8080 (change the port if needed):

  ```bash
  uvicorn app.server:app --host 0.0.0.0 --port 8080
  ```
- Navigate to `/docs` for documentation of the API endpoints offered.
- The main endpoints you will interact with are:
  - `POST /rag-pinecone/website`: Ingest data from a given website. Takes a query parameter `url`. Example:

    ```
    http://localhost:8080/rag-pinecone/website?url=https://lilianweng.github.io/posts/2023-06-23-agent
    ```
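    For example, calling the endpoint from Python with the `requests` package (assuming the server is running locally on port 8080):

    ```python
    import requests

    # Ingest the example page; the URL is passed as a query parameter
    resp = requests.post(
        "http://localhost:8080/rag-pinecone/website",
        params={"url": "https://lilianweng.github.io/posts/2023-06-23-agent"},
    )
    print(resp.status_code)
    ```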
  - `POST /rag-pinecone/invoke`: Question and answer with the agent. Takes a JSON body:

    ```json
    {
      "input": "What is task decomposition?"
    }
    ```
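    For example, with `requests` (again assuming a local server on port 8080):

    ```python
    import requests

    # Ask a question; print whatever the server returns as the chain output
    resp = requests.post(
        "http://localhost:8080/rag-pinecone/invoke",
        json={"input": "What is task decomposition?"},
    )
    print(resp.json())
    ```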
  - `POST /rag-pinecone/stream`: Question and answer with the agent, but answers are streamed. Takes a JSON body:

    ```json
    {
      "input": "What is task decomposition?"
    }
    ```
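    A sketch of consuming the stream with `requests`; the exact event framing is defined by the server, so this simply prints the raw lines:

    ```python
    import requests

    # Keep the connection open and read the response incrementally
    with requests.post(
        "http://localhost:8080/rag-pinecone/stream",
        json={"input": "What is task decomposition?"},
        stream=True,
    ) as resp:
        for line in resp.iter_lines():
            if line:
                print(line.decode())
    ```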
- Feel free to modify `packages/rag-pinecone/rag_pinecone/chain.py` to experiment with LangChain (a sketch of the kind of chain it builds follows below). With every modification, run the following to update the module:

  ```bash
  pip install packages/rag-pinecone
  ```
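For reference, here is a minimal sketch of the kind of retrieval chain `chain.py` might build (assuming the `langchain-openai` and `langchain-pinecone` packages; the repo's actual chain may differ):

```python
import os

from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_pinecone import PineconeVectorStore

# Retrieve the chunks most similar to each question from the Pinecone index
retriever = PineconeVectorStore.from_existing_index(
    index_name=os.environ["PINECONE_INDEX"],
    embedding=OpenAIEmbeddings(),
).as_retriever()

# Bound the model's answer to the retrieved context
prompt = ChatPromptTemplate.from_template(
    "Answer the question based only on the following context:\n"
    "{context}\n\nQuestion: {question}"
)

chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | ChatOpenAI()
    | StrOutputParser()
)

print(chain.invoke("What is task decomposition?"))
```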
Email: [email protected]
LinkedIn: https://www.linkedin.com/in/leminhgiang/