- Clone this repo!
- Install dependencies:

  ```bash
  npm install
  ```

- Create a `.env` file containing your `OPENAI_API_KEY` (see the sample below).
- Run the development server:

  ```bash
  npm run dev
  ```

Open http://localhost:3000 with your browser to see the result.
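A minimal `.env` might look like this. `OPENAI_API_KEY` comes from the step above; since the app can also talk to Pinecone, credentials for it will likely be needed too, but the Pinecone variable names below are my assumption and may differ from what the repo expects:

```bash
OPENAI_API_KEY='Your key here'
# Assumed names — check the repo's config before relying on these:
PINECONE_API_KEY='Your Pinecone key'
PINECONE_ENVIRONMENT='Your Pinecone environment'
```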
This app is my experiment with LangChain.js: creating embeddings for a GPT/LLM over a specific knowledge area and application.
- Write Next.js code for uploading a PDF file.
- LangChain.js uses the RecursiveCharacterTextSplitter to process the file into chunks.
- The processed chunks and the PDF are saved to local storage under "/folder".
- The user can choose to save embeddings in the Pinecone data store or in HNSWLib local storage.
- Write Next.js code to query the Pinecone data store or the HNSWLib store, and also OpenAI.
- Here's an example Next.js component for uploading a PDF file:

```jsx
import { useState } from "react";

function UploadPDF() {
  const [file, setFile] = useState(null);

  // Keep track of the file the user picked
  const handleFileChange = (event) => {
    setFile(event.target.files[0]);
  };

  // POST the file as multipart/form-data to an API route
  const handleSubmit = async (event) => {
    event.preventDefault();
    const formData = new FormData();
    formData.append("pdf", file);
    const response = await fetch("/api/upload-pdf", {
      method: "POST",
      body: formData,
    });
    const data = await response.json();
    console.log(data);
  };

  return (
    <form onSubmit={handleSubmit}>
      <input type="file" accept=".pdf" onChange={handleFileChange} />
      <button type="submit">Upload PDF</button>
    </form>
  );
}

export default UploadPDF;
```
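The component above posts to `/api/upload-pdf`, which isn't shown here. Below is a minimal sketch of what that API route could look like with the pages router, assuming the `formidable` package parses the multipart body; the package choice and file paths are assumptions, not this repo's actual code:

```js
// pages/api/upload-pdf.js — a sketch, not necessarily this repo's route
import formidable from "formidable";
import fs from "fs";

// Next.js must not parse the body itself, or formidable gets an empty stream
export const config = { api: { bodyParser: false } };

export default async function handler(req, res) {
  if (req.method !== "POST") {
    return res.status(405).json({ error: "Method not allowed" });
  }
  const form = formidable({});
  form.parse(req, (err, fields, files) => {
    if (err) return res.status(500).json({ error: err.message });
    // "pdf" matches the field name used in the FormData above
    const uploaded = files.pdf;
    const file = Array.isArray(uploaded) ? uploaded[0] : uploaded;
    // Copy the temp file into a local folder (path is illustrative)
    fs.copyFileSync(file.filepath, "uploads/" + file.originalFilename);
    res.status(200).json({ name: file.originalFilename });
  });
}
```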
- Here's an example using LangChain.js to process the PDF file into chunks with the RecursiveCharacterTextSplitter; the `pdf-parse` package is assumed for the text-extraction step:

```js
const { RecursiveCharacterTextSplitter } = require("langchain/text_splitter");
const pdfParse = require("pdf-parse");
const fs = require("fs");

async function splitPdf(path) {
  // Extract raw text from the PDF
  const buffer = fs.readFileSync(path);
  const { text } = await pdfParse(buffer);

  // Split on paragraph/sentence boundaries, falling back to characters
  const splitter = new RecursiveCharacterTextSplitter({
    chunkSize: 1000,
    chunkOverlap: 200,
  });
  const chunks = await splitter.splitText(text); // string[]
  console.log(chunks);
  return chunks;
}
```
- To save the processed chunks and the PDF file to a local folder, you can use the `fs` module in Node.js:

```js
const fs = require("fs");

// Persist both the original PDF and the text chunks alongside it
fs.writeFileSync("path/to/local/folder/pdf.pdf", pdf);
fs.writeFileSync("path/to/local/folder/chunks.json", JSON.stringify(chunks));
```
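Both vector stores below operate on embedding vectors, not raw text, so the chunks must be embedded first. One common approach (an assumption — the app may do this differently) is OpenAI's embeddings endpoint:

```js
const OpenAI = require("openai");
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

// Turn each text chunk into an embedding vector
const response = await openai.embeddings.create({
  model: "text-embedding-ada-002",
  input: chunks, // the API accepts an array of strings
});
const vectors = response.data.map((item) => item.embedding);
```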
- To save the data in the Pinecone data store or HNSWLib local storage, you can use their respective clients. Here's a sketch for Pinecone using the official `@pinecone-database/pinecone` package; the index name is illustrative and the index is assumed to already exist:

```js
const { Pinecone } = require("@pinecone-database/pinecone");

const pinecone = new Pinecone({ apiKey: process.env.PINECONE_API_KEY });
const index = pinecone.index("my-index");

// `vectors` are the embeddings computed above — Pinecone stores vectors, not text
await index.upsert(
  vectors.map((values, i) => ({ id: String(i), values }))
);
```
And here's a sketch for HNSWLib using the `hnswlib-node` package (paths are illustrative):

```js
const { HierarchicalNSW } = require("hnswlib-node");

// The index stores vectors, so size it by the embedding dimension
const dim = vectors[0].length;
const index = new HierarchicalNSW("cosine", dim);
index.initIndex(vectors.length);

vectors.forEach((v, i) => index.addPoint(v, i));
index.writeIndexSync("path/to/local/folder/index.bin");
```
- To query the Pinecone data store or HNSWLib local storage, you can use the same clients. Here's a Pinecone sketch; `queryVector` is the embedding of the query text, computed the same way as the chunk embeddings above:

```js
const { Pinecone } = require("@pinecone-database/pinecone");

const pinecone = new Pinecone({ apiKey: process.env.PINECONE_API_KEY });
const index = pinecone.index("my-index");

// Find the 10 stored vectors nearest to the query embedding
const results = await index.query({ vector: queryVector, topK: 10 });
console.log(results.matches);
```
And here's the matching sketch for `hnswlib-node`:

```js
const { HierarchicalNSW } = require("hnswlib-node");

const index = new HierarchicalNSW("cosine", dim);
index.readIndexSync("path/to/local/folder/index.bin");

// Returns the labels and distances of the 10 nearest stored vectors
const { neighbors, distances } = index.searchKnn(queryVector, 10);
console.log(neighbors, distances);
```
To query OpenAI, you can use their official `openai` package (v4-style client shown; the model name is illustrative):

```js
const OpenAI = require("openai");

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

const prompt = "What is the meaning of life?";
const completion = await openai.completions.create({
  model: "gpt-3.5-turbo-instruct", // a completions-capable model
  prompt,
  max_tokens: 10,
  n: 1,
  stop: "\n",
});
console.log(completion.choices[0].text);
```
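Since the spec asks to query the vector store and OpenAI together, here is a minimal retrieval-augmented sketch of how the pieces might combine. This is my assumption, reusing the `openai` and `index` clients from the snippets above; storing the chunk text as Pinecone metadata is also an assumption:

```js
// 1. Embed the question, 2. fetch nearest chunks, 3. ask OpenAI with that context
async function answer(question) {
  const { data } = await openai.embeddings.create({
    model: "text-embedding-ada-002",
    input: question,
  });
  const { matches } = await index.query({
    vector: data[0].embedding,
    topK: 3,
    includeMetadata: true, // assumes chunk text was stored as metadata
  });
  const context = matches.map((m) => m.metadata.text).join("\n");
  const completion = await openai.completions.create({
    model: "gpt-3.5-turbo-instruct",
    prompt: `Context:\n${context}\n\nQuestion: ${question}\nAnswer:`,
    max_tokens: 256,
  });
  return completion.choices[0].text;
}
```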
## Flow
- Chat Mode: Normal chat with the bot. You can select the bot type with an initial prompt. This is the default mode. Free and easy.
- Web Loader Mode: Load your reference web page. There is a text field to enter a web URL. Submitting a prompt with the URL filled in means this web page is used as a reference -> get its embedding from OpenAI -> immediately get a response from OpenAI -> show a button asking whether to save this embedding in Pinecone. It does not save by default.
- Pdf Loader Mode: Same as Web Loader, except that loading a PDF is more memory-intensive and takes longer to embed. Because it is costly, the PDF embedding is saved in Pinecone by default.
- Agent Mode: Pick and choose the tools for the agent. Tools: [Pinecone Store, Calculator, Browser] (see the sketch below).
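For Agent Mode, one plausible wiring uses LangChain.js's agent API. This is a sketch, not the app's actual code: the agent type is an assumption, and only the Calculator tool is shown since the Pinecone Store and Browser tools need app-specific setup:

```js
const { initializeAgentExecutorWithOptions } = require("langchain/agents");
const { Calculator } = require("langchain/tools/calculator");
const { OpenAI } = require("langchain/llms/openai");

const model = new OpenAI({ temperature: 0 });
// The Pinecone Store and Browser tools would be added to this
// array once configured.
const executor = await initializeAgentExecutorWithOptions(
  [new Calculator()],
  model,
  { agentType: "zero-shot-react-description" }
);

const result = await executor.call({ input: "What is 13 * 47?" });
console.log(result.output);
```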