- Large Language Models (LLMs): Utilize quantized versions of LLMs like llama2, open-llama, or falcon that can run on Google Colab.
- Fine-tuning/Prompt Engineering:
  - Weather Service:
    - Identify the location from user queries (e.g., "weather in Chennai").
    - Respond with "Service: Weather, Location: Chennai".
    - If the location is missing, prompt the user for it and then respond with "Location: Chennai".
    - Support stateful interaction with multiple query-response exchanges.
  - News Service:
    - Extract the location from queries about news (e.g., "news for India").
    - Respond with "Location: India" or "Location: USA" depending on the location found.
    - If the location is missing, prompt the user for it and then respond with "Service: News, Location: India".
- Generalization: Design the system to work with any service and identify the relevant service and location from user prompts. Specifically:
  - Develop a generic response format "Service: <service>, Location: <location>".
  - Identify weather-related prompts (e.g., "weather in Bangalore", "climate in Bangalore") and respond with "Service: Weather, Location: Bangalore".
  - Identify news-related prompts (e.g., "news for India") and respond with "Service: News, Location: India".
  - Handle missing location information with follow-up prompts.
Desired Outcome:
- A system that can dynamically identify services and locations from user prompts and provide appropriate responses.
- The system should be generalizable to work with any service and location.
- The system should be lightweight and run efficiently on Google Colab.
- We are using the GPTQ-quantized version of the openchat_3.5 model, published by "TheBloke".
- GPTQ is a post-training quantization (PTQ) method for 4-bit quantization.
- The original openchat_3.5 model is itself a fine-tuned Mistral model.
- 🔥 The first 7B model to achieve comparable results with ChatGPT (March)! 🔥
- 🤖 An open-source model scoring 7.81 on MT-Bench, outperforming 70B models 🤖
- The original openchat_3.5 model can run on a consumer GPU with 24 GB of RAM, whereas the quantized version consumes only ~6 GB of GPU VRAM.
- This is a 4-bit quantized, 7-billion-parameter model with a sequence length of 4096.
| Model | # Params | Average | MT-Bench | AGIEval | BBH MC | TruthfulQA | MMLU | HumanEval | BBH CoT | GSM8K |
|---|---|---|---|---|---|---|---|---|---|---|
| OpenChat-3.5 | 7B | 61.6 | 7.81 | 47.4 | 47.6 | 59.1 | 64.3 | 55.5 | 63.5 | 77.3 |
| ChatGPT (March)* | ? | 61.5 | 7.94 | 47.1 | 47.6 | 57.7 | 67.3 | 48.1 | 70.1 | 74.9 |
| Mistral | 7B | - | 6.84 | 38.0 | 39.0 | - | 60.1 | 30.5 | - | 52.2 |
| Open-source SOTA** | 13B-70B | 61.4 | 7.71 | 41.7 | 49.7 | 62.3 | 63.7 | 73.2 | 41.4 | 82.3 |
| | | | WizardLM 70B | Orca 13B | Orca 13B | Platypus2 70B | WizardLM 70B | WizardCoder 34B | Flan-T5 11B | MetaMath 70B |
- System Requirements
  - ~6 GB of system RAM
  - ~6 GB of GPU VRAM

> Important: A GPU with 6 GB of VRAM is mandatory for running the inference. It works fine on Google Colab with a T4 GPU enabled.
- Software Requirements
  - python-version: 3.10
  - CUDA-version: 11.8
- python packages:
```bash
pip install accelerate==0.25.0
pip install auto-gptq --extra-index-url "https://huggingface.github.io/autogptq-index/whl/cu118/"
pip install bitsandbytes==0.41.3.post2
pip install einops==0.7.0
pip install langchain==0.0.349
pip install optimum==1.15.0
pip install tiktoken==0.5.2
pip install "torch @ https://download.pytorch.org/whl/cu118/torch-2.1.0%2Bcu118-cp310-cp310-linux_x86_64.whl#sha256=a81b554184492005543ddc32e96469f9369d778dedd195d73bda9bed407d6589"
pip install "torchaudio @ https://download.pytorch.org/whl/cu118/torchaudio-2.1.0%2Bcu118-cp310-cp310-linux_x86_64.whl#sha256=cdfd0a129406155eee595f408cafbb92589652da4090d1d2040f5453d4cae71f"
pip install "torchvision @ https://download.pytorch.org/whl/cu118/torchvision-0.16.0%2Bcu118-cp310-cp310-linux_x86_64.whl#sha256=033712f65d45afe806676c4129dfe601ad1321d9e092df62b15847c02d4061dc"
pip install transformers==4.35.2
```
- 🛠️ Tools used:
  - transformers: for importing and using LLM models from 🤗 Hugging Face.
  - auto-gptq: for working with quantized models.
  - langchain: 🦜🔗 for advanced prompting.
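As a minimal sketch of how these tools fit together (the Hugging Face repo id `TheBloke/openchat_3.5-GPTQ` and the generation settings below are assumptions, not taken from the repo's code):

```python
# Sketch: load the GPTQ-quantized openchat_3.5 model and expose it as a text-generation
# pipeline; langchain can then wrap this pipeline for prompting.
from langchain.llms import HuggingFacePipeline
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_id = "TheBloke/openchat_3.5-GPTQ"  # assumed repo id; adjust to the exact model used

tokenizer = AutoTokenizer.from_pretrained(model_id)
# With optimum + auto-gptq installed, transformers loads the 4-bit GPTQ weights directly.
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

generator = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=256,
    do_sample=False,         # deterministic decoding helps keep the JSON output stable
    return_full_text=False,  # return only the newly generated tokens, not the prompt
)

llm = HuggingFacePipeline(pipeline=generator)  # optional langchain wrapper
```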
- 💬 Conversation-Flow Chart:
  - This section describes the process of handling user input and generating structured output.
I] User Input and Prompt:
- When the bot receives user input, it is combined with a specific prompt.
- This prompt instructs the LLM to generate a structured output.
- For example, if the user input is:
  USER: what is the weather in Chennai?
- Prompt template (OpenChat):
  GPT4 Correct User: {prompt}<|end_of_turn|>GPT4 Correct Assistant:
- The {prompt} along with the user input will be:
  GPT4 Correct User: As an Named Entity Recognition and Intent Classification Expert, your task is to analyze questions like the following '###user_input': ###user_input: what is the weather in Chennai? You are required to perform two main tasks based on the '###user_input': #Task 1: Classify the '###user_input' into one of the predefined intents ['Weather', 'News']. IMPORTANT: If no clear match to given intents is found, categorize the intent as "Other". #Task 2: Extract any geographical or location-related entities present in the '###user_input'. IMPORTANT: If no specific location is mentioned, label the location as "Other". ###INSTRUCTIONS: Do NOT add any clarifying information. Output MUST follow the schema below. The output should be formatted as a JSON instance that conforms to the JSON schema below. As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]} the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted. Here is the output schema: ''' {"properties": {"service_intent": {"title": "Service Intent", "description": "This field stores the intent classified from the user input.", "type": "string"}, "location": {"title": "Location", "description": "This field stores the location value extracted from the user input.", "type": "string"}}, "required": ["service_intent", "location"]} ''' <|end_of_turn|>GPT4 Correct Assistant:
- This prompt is generated by combining 🦜🔗langchain's format instructions with some custom instructions (see the sketch below).
- We can dynamically pass the required services; here it is ['Weather', 'News'].
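A minimal sketch of how such a prompt could be assembled with 🦜🔗langchain's `PydanticOutputParser` (field names follow the schema above; the exact template wording and helper names in the repo may differ):

```python
# Sketch: build the structured-output prompt with a dynamic list of services.
from langchain.output_parsers import PydanticOutputParser
from langchain.prompts import PromptTemplate
from langchain.pydantic_v1 import BaseModel, Field


class ServiceQuery(BaseModel):
    service_intent: str = Field(description="This field stores the intent classified from the user input.")
    location: str = Field(description="This field stores the location value extracted from the user input.")


parser = PydanticOutputParser(pydantic_object=ServiceQuery)

services = ["Weather", "News"]  # any list of services can be passed here dynamically

template = (
    "GPT4 Correct User: As a Named Entity Recognition and Intent Classification Expert, "
    "analyze the '###user_input'.\n"
    "###user_input: {user_input}\n"
    "#Task 1: Classify the '###user_input' into one of the predefined intents {services}. "
    "If no clear match is found, categorize the intent as \"Other\".\n"
    "#Task 2: Extract any location-related entity. If none is mentioned, label it \"Other\".\n"
    "{format_instructions}\n"
    "<|end_of_turn|>GPT4 Correct Assistant:"
)

prompt = PromptTemplate(
    template=template,
    input_variables=["user_input"],
    partial_variables={
        "services": str(services),
        "format_instructions": parser.get_format_instructions(),
    },
)

print(prompt.format(user_input="what is the weather in Chennai?"))
```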
II] LLM Output Generation: 🤖
- The combined user input and prompt are passed to the LLM for processing.
- The LLM attempts to generate structured output, aiming for JSON format.
III] Output Validation:
- Checking for structured output format:
  - If the output is not in JSON format, an iterative process begins.
  - The user input is passed again to the LLM with the same prompt.
  - This loop continues for a pre-defined number of iterations (N).
  - This iterative process helps filter out rare glitches in the LLM output; a sketch of the loop follows this list.
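A rough sketch of that retry loop, with a placeholder `call_llm()` standing in for the real model call (the names and the retry budget `N` are illustrative, not taken from the repo):

```python
# Sketch: retry until the model returns parseable JSON, up to N attempts.
import json

N = 3  # pre-defined number of iterations


def call_llm(prompt: str) -> str:
    """Placeholder for the actual LLM call; returns the model's raw text output."""
    return '{"service_intent": "Weather", "location": "Chennai"}'


def get_structured_output(prompt: str) -> dict | None:
    for _ in range(N):
        raw = call_llm(prompt)
        try:
            return json.loads(raw)   # well-formed JSON: stop retrying
        except json.JSONDecodeError:
            continue                 # rare glitch: ask the LLM again with the same prompt
    return None                      # still malformed after N attempts
```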
- Checking for missing key values:
  - After the LLM generates well-structured JSON data, an additional validation step is performed. It ensures that the essential keys, "service_intent" and "location", contain valid non-null values (see the sketch below).
  - Service key: the value of the "service_intent" key is checked for null or emptiness.
  - Location key: similarly, the value of the "location" key is checked for null or emptiness.
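A small sketch of that check; treating "Other" as missing is an assumption based on the fallback label used in the prompt:

```python
# Sketch: flag essential keys whose values are null, empty, or the fallback "Other".
def find_missing_keys(result: dict) -> list[str]:
    missing = []
    for key in ("service_intent", "location"):
        value = (result.get(key) or "").strip()
        if not value or value.lower() == "other":
            missing.append(key)
    return missing


print(find_missing_keys({"service_intent": "Weather", "location": ""}))  # ['location']
```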
- Follow-up Questioning:
  If either key (service or location) lacks a valid value:
  - Service
    - The bot initiates a follow-up questioning flow specifically designed to elicit the missing service information from the user.
    - This may involve asking the user directly what service they are seeking information about.
  - Location
    - A similar follow-up questioning flow is initiated if the "location" key lacks a valid value.
    - The bot prompts the user for the desired location information.
- Resuming the Process:
  - Once the user provides the missing information, the original process resumes.
  - The combined user input (including service and location information) is again passed to the LLM with the specific prompt for structured output generation.
  - The validation and follow-up questioning steps repeat as needed until all essential key-value pairs are obtained and a valid structured output is generated; a sketch of this loop is given below.
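Putting the pieces together, the follow-up/resume loop could look roughly like this. The three helpers at the top are stand-ins for the earlier sketches and for repo internals (in particular, `build_prompt()` is a hypothetical prompt builder, not a function from the repo):

```python
# Sketch: ask for whichever field is missing, fold the answer back into the user
# input, and re-run the structured-output step until both keys are filled.

def build_prompt(user_input: str) -> str:               # hypothetical prompt builder
    return user_input

def get_structured_output(prompt: str) -> dict | None:  # stub; see the retry-loop sketch above
    return {"service_intent": "Weather", "location": "Chennai"}

def find_missing_keys(result: dict) -> list[str]:       # stub; see the key-check sketch above
    return [k for k in ("service_intent", "location") if not (result.get(k) or "").strip()]

FOLLOW_UPS = {
    "service_intent": "Which service are you looking for (e.g. Weather or News)? ",
    "location": "Which location should I look that up for? ",
}

def run_turn(user_input: str) -> dict | None:
    while True:
        result = get_structured_output(build_prompt(user_input))
        if result is None:
            return None                        # gave up after N malformed outputs
        missing = find_missing_keys(result)
        if not missing:
            return result                      # both keys filled: done
        for key in missing:
            answer = input(FOLLOW_UPS[key])    # follow-up question to the user
            user_input = f"{user_input} {answer}"  # resume with the enriched input

print(run_turn("what is the weather in Chennai?"))
```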
- 🔥 This iterative approach helps ensure that the bot generates a complete and accurate JSON response, even if the user initially forgets to provide all the necessary information.
- 🔥 We also pass the previous chat exchange along with the user input in order to steer the LLM toward better-quality results (a sketch follows).
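A minimal sketch of folding the previous exchange into OpenChat's multi-turn template (the repo may store and inject history differently):

```python
# Sketch: prepend the last user/assistant exchange so the model sees recent context.
def with_history(prompt: str, last_user: str, last_assistant: str) -> str:
    return (
        f"GPT4 Correct User: {last_user}<|end_of_turn|>"
        f"GPT4 Correct Assistant: {last_assistant}<|end_of_turn|>"
        f"GPT4 Correct User: {prompt}<|end_of_turn|>GPT4 Correct Assistant:"
    )


print(with_history(
    "what about the news?",
    "what is the weather in Chennai?",
    '{"service_intent": "Weather", "location": "Chennai"}',
))
```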
- ⚡ User interactions ⚡
  - Direct query:
  - Conversation with follow-up questioning when service information is missing:
  - Conversation with follow-up questioning when location information is missing:
  - Conversation with follow-up questioning when both location & service information are missing:
- clone repo: `git clone https://github.com/Ribin-Baby/RAG-json-responderV1.git`
- change directory: `cd ./RAG-json-responderV1`
- install requirements: `pip install -r requirements.txt`
- 🔥 Run
  > Note: a GPU with 6 GB of VRAM and CUDA 11.8 installed is needed. It is better to run this on Google Colab with a T4 GPU.
  `python bot.py --s '["News", "Weather"]'`
- We are not limited to ["News", "Weather"] as services; any services we need can be passed dynamically, e.g. ["Game", "Law"] or ["Economics", "Law", "Weather"], as sketched below.
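As an illustration only (how `bot.py` actually parses the flag may differ), the `--s` flag could be read like this:

```python
# Sketch: read a dynamic JSON list of services from the --s command-line flag.
import argparse
import json

arg_parser = argparse.ArgumentParser()
arg_parser.add_argument("--s", type=str, default='["News", "Weather"]',
                        help="JSON list of services the bot should recognise")
args = arg_parser.parse_args()

services = json.loads(args.s)  # e.g. ["Game", "Law"] or ["Economics", "Law", "Weather"]
print(services)
```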
- Open the `bot_notebook.ipynb` file in the Google Colab environment, change the runtime to GPU, and run it cell-by-cell.
- It may be necessary to restart Colab after installing the packages. To do that, run the cell below.
import IPython
IPython.Application.instance().kernel.do_shutdown(True)
- Upon running the last cell, you will get an interactive UI to chat with the bot.