Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Replicate API with Llama-2-13b-chat LLM and add document context bot using langchain. #131

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

sayamsamal
Copy link

@sayamsamal sayamsamal commented Sep 4, 2023

Add Replicate API with Llama-2-13b-chat LLM and Add document context bot using langchain and OpenAIEmbeddings and OpenAI ChatGpt.

This PR implements two major implementations:

1. Implementing a chatbot which takes any document as an input context which then specializes the bot in the regards to that document.

This on-the-fly training is crucial to a chatbot, since most of the times we want it to be specialized in an area. Think help-bots in commercial websites, support bots in e-commerce websites. These entities deal with a large amount of data and training a model on such large dataset would be exponentially costly. This feature of producing context on the go, will help them deal with model trainings in a macro level. Since this also supports multiple documents as input, we can thus have multiple sources as input thus increasing the scope of the chatbot with ease.

This model uses OpenAI ChatGPT - gpt-3.5-turbo and OpenAI Embeddings - text-embedding-ada-002. And it is inspired from the work at https://github.com/PromtEngineer/localGPT/.

This also adds the langchain library which provides us easy to understand and simple to use helper functions to ease our work.

Supported file formats: .txt, .md, .py, .pdf, .csv, .xls, .xlsx, .docx, .doc

How To:

  1. Create a folder textbase/utils/SOURCE_DOCUMENTS
  2. Upload your relevant files to the above created folder.
  3. Run the following command: python textbase/utils/ingest.py
    #TODO: Implement workflow to add document directly from the client side.
  4. Run the bot example at examples/document-bot/main.py
  5. Ask your queries and watch the bot give you contextually aware answers.

2. Implementing a Llama-2-13b-chat chatbot using Replicate, which provides us APIs to open-source LLMs.

Replicate provides us a free access for a limited period of time. And furthur allows us to access open-source LLMs on the cloud with a pay-as-you go model. While, I have implemented only the Llama-2-13b-chat LLM, replicate hosts many more models which can be explored over here.

Screenshots


Below, is my resume which was uploaded to the bot.

sayamsamal-resume

And here is the chatbot providing contextually accurate answers.

answers-1

answers-2

Code improvements

  • Added ingest.py utility which allows you to convert all the documents in a folder into embeddings and furthur into a chromadb vector store.
  • Implemented a clean UI which can be seen in the video submission. (not uploaded since this repo doesn't accept frontend submissions)

Developer checklist

  • I’ve manually tested that code works locally on desktop and mobile browsers.
  • I’ve reviewed my code.
  • I’ve removed all my personal credentials (API keys etc.) from the code.

This commit adds a new chatbot which can read any file as input and
then use the content of the file as context to answer further queries.

This commit is inspired https://github.com/PromtEngineer/localGPT/
@vercel
Copy link

vercel bot commented Sep 4, 2023

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
textbase ✅ Ready (Inspect) Visit Preview 💬 Add feedback Sep 4, 2023 6:03am

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant