KnowledgeBasePDFAssistant

The provided code is a FastAPI application that serves as a knowledge base using PDF documents.

Dependencies

The project relies on the following libraries and tools:

chromadb: A database client for storing and querying vector embeddings.
fitz (PyMuPDF): To extract text from PDF files.
uvicorn: An ASGI server to run the FastAPI application.
fastapi: A modern web framework for building APIs.
langchain.text_splitter: To split large texts into smaller chunks.
openai: To interact with OpenAI's API, specifically Azure's version.
pydantic: For data validation and settings management through Python dataclasses.
sentence_transformers: To generate embeddings for text chunks.

Running the Application

Use the following command to run the FastAPI application:

uvicorn main:app --host 0.0.0.0 --port 8000

Perform testing

import requests

url = "http://0.0.0.0:8000/query"

data = {"query": "What is Data engineering ?"}

response = requests.post(url, json=data, headers = {"Content-Type": "application/json"})

response.json().get("response")

Flow Diagram

+-------------------+
| Start             |
+-------------------+
         |
         v
+-------------------+
| Import Libraries  |
+-------------------+
         |
         v
+-------------------+
| Initialize App    |
| & Global Variables|
+-------------------+
         |
         v
+-------------------+
| Define Functions: |
| - extract_text    |
| - preprocess_text |
| - index_pdf_text  |
| - query_knowledge |
+-------------------+
         |
         v
+-------------------+
| Define Pydantic   |
| Model: QueryRequest|
+-------------------+
         |
         v
+-------------------+
| Define API        |
| Endpoint: /query  |
+-------------------+
         |
         v
+-------------------+
| Main Block:       |
| - Index PDFs      |
| - Run Server      |
+-------------------+
         |
         v
+-------------------+
| End               |
+-------------------+

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

KnowledgeBasePDFAssistant

Dependencies

Running the Application

Perform testing

Flow Diagram

Files

README.md

Latest commit

History

README.md

File metadata and controls

KnowledgeBasePDFAssistant

Dependencies

Running the Application

Perform testing

Flow Diagram