This application converts PDF documents into a knowledge graph stored in Neo4j. It uses an LLM (large language model), either OpenAI's GPT or Diffbot, to extract nodes, relationships, and properties from the text of each PDF, then organizes them into a structured knowledge graph using the LangChain framework. Files can be uploaded from the local machine or from an S3 bucket, and the LLM used to create the knowledge graph can then be chosen.
GEMINI_ENABLED = False
GCP_LOG_METRICS_ENABLED = False
For the frontend, set BACKEND_API_URL to your local backend URL in your env file before running docker-compose:
BACKEND_API_URL="http://localhost:8000"
- Run Docker Compose to build and start all components:
docker-compose up --build
- Alternatively, you can run specific directories separately:
- For the frontend:
cd frontend
yarn
yarn run dev
- For the backend:
cd backend
python -m venv envName
source envName/bin/activate
pip install -r requirements.txt
uvicorn score:app --reload
- To deploy the app and packages on Google Cloud Platform, run the following commands with Google Cloud Run:
# Frontend deploy
gcloud run deploy
Source location: current directory > Frontend
Region: 32 [us-central1]
Allow unauthenticated requests: Yes
# Backend deploy
gcloud run deploy --set-env-vars "OPENAI_API_KEY=" --set-env-vars "DIFFBOT_API_KEY=" --set-env-vars "NEO4J_URI=" --set-env-vars "NEO4J_PASSWORD=" --set-env-vars "NEO4J_USERNAME="
Source location: current directory > Backend
Region: 32 [us-central1]
Allow unauthenticated requests: Yes
- PDF Upload: Users can upload PDF documents using the Drop Zone.
- S3 Bucket Integration: Users can also specify PDF documents stored in an S3 bucket for processing.
- Knowledge Graph Generation: The application uses an OpenAI or Diffbot LLM to extract relevant information from the PDFs and construct a knowledge graph.
- Neo4j Integration: The extracted nodes and relationships are stored in a Neo4j database for easy visualization and querying.
- Grid View of source node files with: Name, Type, Size, Nodes, Relations, Duration, Status, Source, Model
Create a .env file and update the following environment variables.
OPENAI_API_KEY = ""
DIFFBOT_API_KEY = ""
NEO4J_URI = ""
NEO4J_USERNAME = ""
NEO4J_PASSWORD = ""
AWS_ACCESS_KEY_ID = ""
AWS_SECRET_ACCESS_KEY = ""
EMBEDDING_MODEL = ""
IS_EMBEDDING = "TRUE"
KNN_MIN_SCORE = ""
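As an illustration of how the backend could read these values, here is a minimal sketch that validates the variables at startup. The function name `load_settings`, the choice of which keys are mandatory, and the assumption that KNN_MIN_SCORE is numeric are all hypothetical and not taken from the project code.

```python
import os

# Key names come from the .env listing above; treating only the Neo4j
# connection values as mandatory is an assumption made for this sketch.
REQUIRED_KEYS = ("NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD")


def load_settings(env=None):
    """Return a dict of backend settings, raising if a required key is empty."""
    env = os.environ if env is None else env
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise ValueError("Missing environment variables: " + ", ".join(missing))

    settings = {key: env[key] for key in REQUIRED_KEYS}
    # IS_EMBEDDING is written as the string "TRUE" in the listing,
    # so it is compared as text rather than parsed as a boolean literal.
    settings["IS_EMBEDDING"] = env.get("IS_EMBEDDING", "").strip().upper() == "TRUE"
    # KNN_MIN_SCORE is assumed to be a float; None when it is left empty.
    raw_score = env.get("KNN_MIN_SCORE", "").strip()
    settings["KNN_MIN_SCORE"] = float(raw_score) if raw_score else None
    return settings
```

Calling `load_settings()` with no argument reads from the process environment; passing a plain dict makes the function easy to exercise without touching real credentials.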
Create a .env file in the frontend root folder and update the following environment variables.
BACKEND_API_URL=""
BLOOM_URL=""
REACT_APP_SOURCES=""
LLM_MODELS=""
ENV=""
TIME_PER_CHUNK=
Extracts nodes, relationships, and properties from a PDF file using LLM models.
Args:
uri: URI of the graph to extract
userName: Username to use for graph creation (if None, the username from the config file is used)
password: Password to use for graph creation (if None, the password from the config file is used)
file: File object containing the PDF file path to be used
model: Type of model to use ('Gemini Pro' or 'Diffbot')
Returns:
JSON response to the API with fileName, nodeCount, relationshipCount, processingTime,
status, and model as attributes.
![neoooo](https://private-user-images.githubusercontent.com/118245454/309273153-01e731df-b565-4f4f-b577-c47e39dd1748.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkyNjQxNjgsIm5iZiI6MTczOTI2Mzg2OCwicGF0aCI6Ii8xMTgyNDU0NTQvMzA5MjczMTUzLTAxZTczMWRmLWI1NjUtNGY0Zi1iNTc3LWM0N2UzOWRkMTc0OC5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjUwMjExJTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI1MDIxMVQwODUxMDhaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT01MjYwMjZlMDE4MjMwNGE3Y2M4MWY1MGM1YjlmMmNmNTM2YTdhNGQzMWFjOWU5MTliMzExMGEzNjI4NmI2ZTZkJlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCJ9.i1xujKWLpisKVv-5tguLORR3u1RiVNFtVIHvFL9xn5c)
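To make the response shape described above concrete, the following sketch assembles a payload with the six listed attributes. The helper name `build_extract_response`, the default status value, and the rounding of the processing time are assumptions for illustration; only the attribute names come from the description.

```python
import json


def build_extract_response(file_name, node_count, relationship_count,
                           processing_time, model, status="Completed"):
    """Assemble a response dict using the attribute names from the docstring."""
    return {
        "fileName": file_name,
        "nodeCount": node_count,
        "relationshipCount": relationship_count,
        # Seconds, rounded to two decimals (the unit and rounding are assumptions).
        "processingTime": round(processing_time, 2),
        "status": status,
        "model": model,
    }


if __name__ == "__main__":
    # Hypothetical values, used only to show the serialized form.
    payload = build_extract_response("report.pdf", 42, 17, 12.3456, "Diffbot")
    print(json.dumps(payload, indent=2))
```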
Creates a source node in Neo4jGraph and sets properties.
Args:
uri: URI of Graph Service to connect to
userName: Username to connect to Graph Service with (default: None)
password: Password to connect to Graph Service with (default: None)
file: File object with information about file to be added
Returns:
Success or Failure message of node creation
![neo_workspace](https://private-user-images.githubusercontent.com/118245454/309272338-f2eb11cd-718c-453e-bec9-11410ec6e45d.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkyNjQxNjgsIm5iZiI6MTczOTI2Mzg2OCwicGF0aCI6Ii8xMTgyNDU0NTQvMzA5MjcyMzM4LWYyZWIxMWNkLTcxOGMtNDUzZS1iZWM5LTExNDEwZWM2ZTQ1ZC5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjUwMjExJTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI1MDIxMVQwODUxMDhaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT1hNmZlM2VmMWJmYzk2NDE0NmQ0OTZjOWEzNzYxZGY3MGZlMzAzYjVlNmYxN2JiNDYzN2I2YTA1YmZiOTI4MWI1JlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCJ9.AGQF7Yp7jpw7_YBf4tpjf0utYIvfUrtHFbvSXbC8CTg)
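One way to picture the source-node step is as a parameterized Cypher statement. The sketch below only builds the query text and its parameters. The `Document` label, the property names, and the initial status "New" are hypothetical, since the description above does not state them.

```python
def build_source_node_query(file_name, file_size, file_type):
    """Return a (query, params) pair that creates or updates a source node.

    MERGE keeps the operation idempotent: uploading the same file name
    twice updates the existing node instead of creating a duplicate.
    """
    query = (
        "MERGE (d:Document {fileName: $fileName}) "
        "SET d.fileSize = $fileSize, d.fileType = $fileType, d.status = $status "
        "RETURN d"
    )
    params = {
        "fileName": file_name,
        "fileSize": file_size,
        "fileType": file_type,
        "status": "New",  # assumed initial status for this sketch
    }
    return query, params
```

With the official Neo4j Python driver, the pair could be passed to `session.run(query, params)`. Using parameters rather than string formatting avoids Cypher injection through file names.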
Returns a list of file sources in the database by querying the graph and
sorting the list by the last updated date.
![get_source](https://private-user-images.githubusercontent.com/118245454/309273465-1d8c7a86-6f10-4916-a4c1-8fdd9f312bcc.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkyNjQxNjgsIm5iZiI6MTczOTI2Mzg2OCwicGF0aCI6Ii8xMTgyNDU0NTQvMzA5MjczNDY1LTFkOGM3YTg2LTZmMTAtNDkxNi1hNGMxLThmZGQ5ZjMxMmJjYy5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjUwMjExJTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI1MDIxMVQwODUxMDhaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT0zZjgxNWE3ZjI0ZjU5NDYxMTRmZTVmNzgzMGEwYzFjYTNkYTA1ZDhhYmQ3YjhkYTFjZWNmN2RlYmNlNWQyNGY2JlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCJ9.uaUkvZ6l28Dus3NpaEh-iPea4zPjqrr49ZdN4oE49So)
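The sorting step can be sketched in a few lines. This example assumes each source is a dict with a hypothetical `updatedAt` datetime field and that the newest entry should come first; the description above does not specify the field name or the sort direction.

```python
from datetime import datetime


def sort_sources(sources):
    """Return sources ordered by last updated date, newest first (assumed order)."""
    return sorted(sources, key=lambda source: source["updatedAt"], reverse=True)


if __name__ == "__main__":
    # Hypothetical records standing in for rows returned from the graph.
    rows = [
        {"fileName": "old.pdf", "updatedAt": datetime(2024, 1, 5)},
        {"fileName": "new.pdf", "updatedAt": datetime(2024, 3, 1)},
    ]
    for row in sort_sources(rows):
        print(row["fileName"], row["updatedAt"].isoformat())
```

In practice the ordering could also be done in the graph query itself with an `ORDER BY` clause, which avoids sorting in application code.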
![chunking](https://private-user-images.githubusercontent.com/118245454/309275406-4d61479c-e5e9-415e-954e-3edf6a773e72.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkyNjQxNjgsIm5iZiI6MTczOTI2Mzg2OCwicGF0aCI6Ii8xMTgyNDU0NTQvMzA5Mjc1NDA2LTRkNjE0NzljLWU1ZTktNDE1ZS05NTRlLTNlZGY2YTc3M2U3Mi5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjUwMjExJTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI1MDIxMVQwODUxMDhaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT03ZTAzOTA4YzM5NDdhYzAxZjk4OTNhYWYzMzQwYTA4ZTI5ZGM3ZWI4MWIzYjg0Y2JiMzM1MGFhNmQyNDUwYzc1JlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCJ9.GO-4fKkeLHJz09_mjGg_SiCnjcjf8Y4_R6tJa2DwRkU)
KGB.mp4
The public Google Cloud Run URL. Workspace URL