eduardo committed Mar 25, 2024
1 parent 79a4aa1, commit 33781fc
Showing 10 changed files with 730 additions and 2 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
GOOGLE_API_KEY=123456789 |
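The example file above ships a placeholder key, so it may help to fail fast when the real key was never filled in. The following is a minimal sketch using only the standard library; `parse_env_file`, `require_api_key`, and the `PLACEHOLDER_KEYS` set are hypothetical helpers for illustration and are not part of this commit (the project itself depends on `python-dotenv` for loading).

```python
import os
import tempfile

# Placeholder values that appear in this commit's example file and README.
PLACEHOLDER_KEYS = {"", "123456789", "your_google_api_key"}


def parse_env_file(path):
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    values = {}
    with open(path, encoding="utf-8") as handle:
        for raw_line in handle:
            line = raw_line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # Drop surrounding whitespace and optional quotes around the value.
            values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def require_api_key(values, name="GOOGLE_API_KEY"):
    """Return the key, or raise if it is missing or still a placeholder."""
    key = values.get(name, "")
    if key in PLACEHOLDER_KEYS:
        raise RuntimeError(f"{name} is missing or still set to a placeholder value")
    return key


if __name__ == "__main__":
    # Demo against a throwaway file that mimics the example file.
    with tempfile.TemporaryDirectory() as tmp:
        env_path = os.path.join(tmp, ".env")
        with open(env_path, "w", encoding="utf-8") as handle:
            handle.write("GOOGLE_API_KEY=123456789\n")
        try:
            require_api_key(parse_env_file(env_path))
        except RuntimeError as error:
            print(error)
```

A check like this could run once at startup, before any model client is configured.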
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1 +1,3 @@ | ||
.env | ||
.env | ||
venv | ||
.idea |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1 +1,72 @@ | ||
# rag-url-reader | ||
# RAG URL Reader Application | ||
RAG URL Reader is an application that loads data from URLs and uses Large Language Models (LLMs) to answer | ||
questions based on that content. Retrieved passages are passed to the model as context, so answers draw on | ||
the pages you supply. The project complements the concepts discussed in the accompanying | ||
[YouTube video](https://www.youtube.com/channel/UCYZ_si4TG801SAuLrNl-v-g) and offers a practical implementation | ||
of those techniques. | ||
|
||
 | ||
|
||
|
||
# Technology Used | ||
- Python: Programming language used for development. | ||
- Streamlit: Frontend framework for creating interactive web applications. | ||
- LangChain: Framework used to integrate the language models and build the retrieval chain. | ||
- Google PaLM Embeddings: Convert text into embedding vectors. | ||
- Google PaLM and Gemini Pro: Large Language Models used for generating context-based responses. | ||
- FAISS: Vector store for saving and querying embeddings efficiently. | ||
|
||
|
||
# Features | ||
- Load data from URLs. | ||
- Generate embeddings and store them in a vector database such as FAISS. | ||
- Utilize Large Language Models from Google to generate answers based on the context. | ||
|
||
# Installation | ||
To install the RAG URL Reader application, follow these steps: | ||
|
||
Clone the repository: | ||
|
||
git clone https://github.com/Eduardovasquezn/rag-url-reader.git | ||
|
||
Navigate to the project directory: | ||
|
||
cd rag-url-reader | ||
|
||
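Create and activate a virtual environment (the activation path shown is for Windows; on Linux or macOS, run `source venv/bin/activate` instead): | ||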
|
||
python -m venv venv | ||
venv/Scripts/activate | ||
|
||
Install the required Python libraries: | ||
|
||
pip install -r requirements.txt | ||
|
||
# Usage | ||
|
||
Create a `.env` file using `.env-example` as a template: | ||
|
||
cp .env-example .env | ||
|
||
In the `.env` file, insert your [Google API Key](https://aistudio.google.com/app/apikey): | ||
|
||
GOOGLE_API_KEY=your_google_api_key | ||
|
||
Run the main application script: | ||
|
||
streamlit run src/app.py | ||
|
||
|
||
# Contribution | ||
Contributions to this project are welcome! Feel free to submit bug reports, feature requests, or pull | ||
requests to help improve the functionality and usability of the RAG URL Reader application. | ||
|
||
# Disclaimer | ||
This application is intended for educational and informational purposes only. | ||
|
||
# Enjoyed Using RAG URL Reader? Subscribe to My Channel! | ||
If you found the RAG URL Reader helpful and enjoyed using it, consider subscribing to my | ||
[YouTube channel](https://www.youtube.com/channel/UCYZ_si4TG801SAuLrNl-v-g?sub_confirmation=1) for more tutorials, | ||
tips, and projects related to Python, AI, and web development. Your support helps me create more valuable content | ||
for the community! |
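The Features list in the README above mentions generating embeddings and querying them from a vector store. As a rough illustration of that idea only, here is a toy sketch in plain Python: `embed`, `cosine`, and `ToyVectorStore` are hypothetical names, the "embedding" is a simple bag-of-words count rather than a learned model, and the search is a brute-force scan rather than a FAISS index.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy 'embedding': a bag-of-words count vector (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class ToyVectorStore:
    """Keeps (text, vector) pairs in memory and ranks them by similarity."""

    def __init__(self):
        self._items = []

    def add(self, texts):
        for text in texts:
            self._items.append((text, embed(text)))

    def search(self, query, k=3):
        query_vector = embed(query)
        ranked = sorted(
            self._items,
            key=lambda item: cosine(query_vector, item[1]),
            reverse=True,
        )
        return [text for text, _ in ranked[:k]]


if __name__ == "__main__":
    store = ToyVectorStore()
    store.add([
        "FAISS stores vectors for similarity search",
        "Streamlit builds web apps",
        "Gemini is a language model",
    ])
    print(store.search("similarity search over vectors", k=1))
```

The top-k passages returned by a search like this are what a retrieval chain hands to the language model as context.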
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
streamlit==1.31.1 | ||
langchain==0.1.9 | ||
python-dotenv==0.21.0 | ||
google-cloud-aiplatform==1.43.0 | ||
google-generativeai==0.4.0 | ||
langchain-google-vertexai==0.1.0 | ||
unstructured==0.12.6 | ||
faiss-cpu==1.8.0 |
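Because every dependency above is pinned to an exact version, a quick check that the active environment matches the pins can save debugging time. The sketch below uses the standard-library `importlib.metadata`; `parse_pins` and `find_mismatches` are hypothetical helper names, and only plain `name==version` lines are handled.

```python
from importlib import metadata


def parse_pins(lines):
    """Turn 'name==version' lines into a {name: version} dict."""
    pins = {}
    for raw_line in lines:
        # Ignore trailing comments and blank lines.
        line = raw_line.split("#", 1)[0].strip()
        if "==" not in line:
            continue
        name, _, version = line.partition("==")
        pins[name.strip()] = version.strip()
    return pins


def find_mismatches(pins):
    """Return {name: (pinned, installed_or_None)} for packages that differ."""
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # Package is not installed at all.
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches


if __name__ == "__main__":
    pinned = parse_pins(["streamlit==1.31.1", "langchain==0.1.9", "faiss-cpu==1.8.0"])
    for package, (want, have) in find_mismatches(pinned).items():
        print(f"{package}: pinned {want}, installed {have}")
```

An empty result means the listed packages are installed at exactly the pinned versions.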
Empty file.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,65 @@ | ||
|
||
import streamlit as st | ||
from langchain_community.vectorstores.faiss import FAISS | ||
|
||
from common.utils import data_ingestion, get_embeddings, build_vector_store_database, get_llm, get_response_llm | ||
|
||
|
||
def main(): | ||
    # Configure the settings of the webpage | ||
    st.set_page_config(page_title="Chat with Websites", page_icon="🧊", layout="wide") | ||
|
||
    # Add a header | ||
    st.header("Chat with websites using Google Palm and Gemini-Pro💬🤖") | ||
|
||
    # Input question from the user | ||
    user_question = st.text_input("Ask a question from the URLs") | ||
|
||
    # Vertex AI - Google Palm text embeddings | ||
    embeddings = get_embeddings() | ||
|
||
    # Create a sidebar | ||
    with st.sidebar: | ||
        # Title of the sidebar | ||
        st.title("News Articles URL:") | ||
|
||
        # List of urls | ||
        urls = [] | ||
|
||
        for i in range(5): | ||
            url = st.sidebar.text_input(f"URL {i+1}") | ||
            # Append URLs if provided | ||
            if url: | ||
                urls.append(url) | ||
|
||
        if st.button("Process URLs"): | ||
            with st.spinner("Data Ingestion...Started...✅✅✅"): | ||
                # Ingest data | ||
                docs = data_ingestion(urls=urls) | ||
                # Create vector store database | ||
                build_vector_store_database(documents=docs, embeddings=embeddings) | ||
            st.success("Done") | ||
|
||
    if st.button("Google Palm Output"): | ||
        with st.spinner("Thinking"): | ||
            # Load data | ||
            faiss_index = FAISS.load_local("faiss_index", embeddings, allow_dangerous_deserialization=True) | ||
            # Load LLM | ||
            llm = get_llm(model_name="text-bison@002") | ||
|
||
            st.write(get_response_llm(llm, faiss_index, user_question)) | ||
|
||
            st.success("Done") | ||
    elif st.button("Gemini-Pro Output"): | ||
        with st.spinner("Thinking"): | ||
            # Load data | ||
            faiss_index = FAISS.load_local("faiss_index", embeddings, allow_dangerous_deserialization=True) | ||
            # Load LLM | ||
            llm = get_llm(model_name="gemini-pro") | ||
|
||
            st.write(get_response_llm(llm, faiss_index, user_question)) | ||
|
||
            st.success("Done") | ||
|
||
if __name__ == "__main__": | ||
    main() |
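The app above forwards whatever text is typed into the five URL fields straight to ingestion. One possible refinement, shown here as a sketch and not as part of this commit, is to filter the list first. `clean_urls` is a hypothetical helper built on the standard-library `urllib.parse`; it keeps only `http` and `https` URLs that have a host and drops duplicates.

```python
from urllib.parse import urlparse


def clean_urls(raw_urls):
    """Keep unique, well-formed http(s) URLs in their original order."""
    cleaned = []
    seen = set()
    for raw in raw_urls:
        url = raw.strip()
        parsed = urlparse(url)
        # Require an http(s) scheme and a network location (host).
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            continue
        if url in seen:
            continue
        seen.add(url)
        cleaned.append(url)
    return cleaned


if __name__ == "__main__":
    print(clean_urls(["https://example.com/a", "not a url", "https://example.com/a"]))
```

In the app, the result of `clean_urls(urls)` could be passed to `data_ingestion` in place of the raw list, with a warning shown when nothing valid remains.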
Empty file.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,70 @@ | ||
from langchain.chains import RetrievalQA | ||
from langchain.text_splitter import RecursiveCharacterTextSplitter | ||
from langchain_community.document_loaders import WebBaseLoader | ||
from langchain_core.prompts import PromptTemplate | ||
from langchain_google_vertexai import VertexAIEmbeddings, VertexAI | ||
from dotenv import load_dotenv | ||
import os | ||
import google.generativeai as genai | ||
from langchain_community.vectorstores import FAISS | ||
|
||
# Load variables from .env so GOOGLE_API_KEY is visible to os.getenv | ||
load_dotenv() | ||
google_api_key = os.getenv("GOOGLE_API_KEY") | ||
genai.configure(api_key=google_api_key) | ||
|
||
|
||
def data_ingestion(urls): | ||
|
||
    loader = WebBaseLoader(urls) | ||
|
||
    # Load html | ||
    documents = loader.load() | ||
|
||
    # Text splitter | ||
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, | ||
                                                   chunk_overlap=200) | ||
|
||
    # Split data into chunks | ||
    chunks = text_splitter.split_documents(documents) | ||
|
||
    return chunks | ||
|
||
def get_embeddings(): | ||
    # Model to get embeddings | ||
    embeddings = VertexAIEmbeddings(model_name="textembedding-gecko@003") | ||
    return embeddings | ||
|
||
def build_vector_store_database(documents, embeddings): | ||
    vector_store = FAISS.from_documents(documents=documents, embedding=embeddings) | ||
    # Location where the vector store database is saved | ||
    vector_store.save_local("faiss_index") | ||
|
||
def get_llm(model_name): | ||
    llm = VertexAI(model_name=model_name) | ||
    return llm | ||
|
||
def get_response_llm(llm, vector_store, query): | ||
    prompt_template = """ | ||
Human: Answer the question as detailed as possible from the provided context, make sure to provide all the details. | ||
Don't exceed 250 words on the explanation. If you don't know the answer, just say that you don't know, | ||
don't try to make up an answer.\n | ||
Context:\n {context}?\n | ||
Question: {question} | ||
Assistant: | ||
""" | ||
    prompt = PromptTemplate(template=prompt_template, input_variables=['context', 'question']) | ||
|
||
    qa = RetrievalQA.from_chain_type( | ||
        llm=llm, | ||
        retriever=vector_store.as_retriever( | ||
            search_type="similarity", search_kwargs={"k": 3} | ||
        ), | ||
        chain_type="stuff", | ||
        return_source_documents=True, | ||
        chain_type_kwargs={"prompt": prompt} | ||
|
||
    ) | ||
    answer = qa({"query": query}) | ||
|
||
    return answer['result'] |
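To make the `chunk_size=1000` / `chunk_overlap=200` settings and the `"stuff"` chain type above more concrete, here is a simplified sketch in plain Python. It is not how `RecursiveCharacterTextSplitter` works internally (that splitter prefers to break on separators such as paragraphs and lines); `split_with_overlap` is a hypothetical fixed-width sliding window, and `build_stuff_prompt` with its `PROMPT` template is an illustrative stand-in for how retrieved chunks are concatenated into a single prompt.

```python
def split_with_overlap(text, chunk_size=1000, chunk_overlap=200):
    """Cut text into fixed-width windows where neighbours share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


# Illustrative template; the commit defines its own prompt text.
PROMPT = "Context:\n{context}\n\nQuestion: {question}\nAnswer:"


def build_stuff_prompt(chunks, question):
    """'Stuff' every retrieved chunk into one prompt string."""
    return PROMPT.format(context="\n\n".join(chunks), question=question)


if __name__ == "__main__":
    pieces = split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=2)
    print(pieces)
    print(build_stuff_prompt(pieces[:3], "What letters appear?"))
```

The overlap means a sentence that straddles a boundary still appears whole in at least one chunk, at the cost of storing some text twice.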